From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4b9f83e5-c7b5-4a08-b5b0-411921e00b5e@redhat.com>
Date: Thu, 7 Mar 2024 22:38:35 +0100
Subject: Re: [PATCH 0/5] Remove some races around folio_test_hugetlb
To: Matthew Wilcox
Cc: linux-mm@kvack.org, Oscar Salvador
References: <20240301214712.2853147-1-willy@infradead.org> <52599fd8-76dc-4d8f-b9f2-78146fc7a518@redhat.com>
From: David Hildenbrand
Organization: Red Hat

On 07.03.24 22:14, Matthew Wilcox wrote:
> On Thu, Mar 07, 2024 at 10:20:15AM +0100, David Hildenbrand wrote:
>>>>> IOW:
>>>>>
>>>>> word  page0              page1
>>>>> 0     flags              flags
>>>>> 1     lru.next           head
>>>>> 2     lru.prev           entire_mapcount + gap
>>>>> 3     mapping            nr_pages_mapped + gap / hugetlb_id
>>>>> 4     index              pincount + nr_pages
>>>>> 5     private            unused
>>>>> 6     mapcount+refcount  mapcount+refcount(0)
>>>>> 7     memcg_data         -
>>>>>
>>>>> or on 32-bit
>>>>>
>>>>> word  page0              page1
>>>>> 0     flags              flags
>>>>> 1     lru.next           head
>>>>> 2     lru.prev           entire_mapcount
>>>>> 3     mapping            nr_pages_mapped / hugetlb_id
>>>>> 4     index              pincount
>>>>> 5     private            unused
>>>>> 6     mapcount           mapcount
>>>>> 7     refcount           refcount
>>>>> 8     memcg_data         -
>>>>> 9+    virtual? last_cpupid? whatever
>>>
>>> How about this layout?
>>>
>>> @@ -350,8 +350,13 @@ struct folio {
>>>                  unsigned long _head_1;
>>>                  unsigned long _folio_avail;
>>>          /* public: */
>>> -                atomic_t _entire_mapcount;
>>> -                atomic_t _nr_pages_mapped;
>>> +                union {
>>> +                        unsigned long _hugetlb_id;
>>> +                        struct {
>>> +                                atomic_t _entire_mapcount;
>>> +                                atomic_t _nr_pages_mapped;
>>> +                        };
>>> +                };
>>>                  atomic_t _pincount;
>>>  #ifdef CONFIG_64BIT
>>>                  unsigned int _folio_nr_pages;
>>>
>>> That keeps _folio_avail as, well, available.  It puts _hugetlb_id in
>>> the same bits as ->mapping.  It continues to leave ->private unused
>>> on 64-bit.  I think this does everything we want?
>>
>> _entire_mapcount is (still) used for hugetlb folios.
>
> Oh, duh, of course it is.  I thought we used page[0].mapcount for them,
> but we don't and shouldn't.  I suppose we could use a magic value for
> page[0].mapcount to indicate hugetlb, but that'd make page_mapcount()
> more complex.
>
>> With the total mapcount in place, I was thinking about renaming it to
>> "_pmd_mapcount" and stop using it for hugetlb folios, just like we'd not be
>> using _nr_pages_mapped for hugetlb folios.
>>
>> [I also thought about moving the _pmd_mapcount to another subpage, where
>> we'd also have a _pud_mapcount in the future; but again, stuff for the
>> future]
>>
>> Until then, wouldn't _hugetlb_id be problematic here?
>> [I could move _entire_mapcount/_pmd_mapcount later I guess]
>
> New idea then, how about simply:
>
>                  unsigned long _flags_1;
>                  unsigned long _head_1;
> -                unsigned long _folio_avail;
> +                unsigned long _hugetlb_id;
>
> We have to check the various other users of struct page to see what we
> might conflict with.

Well, compared to the current version there is no change for me: the
space I wanted to use for the total_mapcount is gone and I'll have to
start being creative :)

Free space in subpage1: "The LORD gave and the LORD has taken away"

We seem to be running into space issues in subpage1 again (also, more
relevant now that we will support order-1 folios ...). This kind of
wastage+over-complicated layout (again :( ) just to handle hugetlb
oddities for free pages -- refcount 0 -- feels very wrong.

Stupid idea: Do we really have to identify (possibly free) hugetlb
folios that way from lockless PFN walkers without a folio reference? I
mean, with a folio reference held it's all working as expected.

Couldn't we just make hugetlb store them in some kind of xarray that we
can walk (using RCU?), indexed by the start PFN we want to test? So if
we find the current hugetlb flag to be set on the lockless path, simply
check that xarray. It's all super racy either way: we can get a
free+split+reuse anytime.

Or is that completely flawed?

-- 
Cheers,

David / dhildenb
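
The "check an index keyed by start PFN" idea above can be modelled in
userspace as a rough sketch. This is not kernel code and not a proposed
patch: the sorted array below merely stands in for the xarray, there is
no RCU or any other synchronization, and every name in it
(hugetlb_registry, registry_insert(), registry_remove(),
registry_contains(), pfn_is_hugetlb(), MAX_HUGETLB_FOLIOS) is
hypothetical. It only shows the shape of the check: the flag seen on the
lockless path is treated as a hint, and the index keyed by start PFN
gives the answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capacity; a real xarray would grow dynamically. */
#define MAX_HUGETLB_FOLIOS 64

/* Userspace stand-in for an xarray indexed by folio start PFN. */
struct hugetlb_registry {
	unsigned long start_pfn[MAX_HUGETLB_FOLIOS]; /* sorted ascending */
	size_t count;
};

/* Binary search: index of the first entry that is >= pfn. */
static size_t registry_lower_bound(const struct hugetlb_registry *r,
				   unsigned long pfn)
{
	size_t lo = 0, hi = r->count;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (r->start_pfn[mid] < pfn)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Register a hugetlb folio by start PFN. Returns false if full. */
static bool registry_insert(struct hugetlb_registry *r, unsigned long pfn)
{
	size_t pos, i;

	assert(r != NULL);
	pos = registry_lower_bound(r, pfn);
	if (pos < r->count && r->start_pfn[pos] == pfn)
		return true; /* already registered */
	if (r->count == MAX_HUGETLB_FOLIOS)
		return false;
	for (i = r->count; i > pos; i--)
		r->start_pfn[i] = r->start_pfn[i - 1];
	r->start_pfn[pos] = pfn;
	r->count++;
	return true;
}

/* Drop a folio from the registry, e.g. when it is freed or split. */
static void registry_remove(struct hugetlb_registry *r, unsigned long pfn)
{
	size_t pos = registry_lower_bound(r, pfn), i;

	if (pos == r->count || r->start_pfn[pos] != pfn)
		return;
	for (i = pos; i + 1 < r->count; i++)
		r->start_pfn[i] = r->start_pfn[i + 1];
	r->count--;
}

static bool registry_contains(const struct hugetlb_registry *r,
			      unsigned long pfn)
{
	size_t pos = registry_lower_bound(r, pfn);

	return pos < r->count && r->start_pfn[pos] == pfn;
}

/*
 * Lockless-walker check: flag_hint models "the hugetlb flag looked set";
 * only a registry hit for the start PFN confirms it.
 */
static bool pfn_is_hugetlb(const struct hugetlb_registry *r, bool flag_hint,
			   unsigned long start_pfn)
{
	return flag_hint && registry_contains(r, start_pfn);
}
```

In this model a walker would call pfn_is_hugetlb() only after seeing the
flag, so the common non-hugetlb case never touches the index. As noted
above, the result can still go stale immediately after the lookup; the
sketch does not try to address that.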