From: David Hildenbrand
Organization: Red Hat
To: James Houghton, Peter Xu
Cc: Mike Kravetz, Muchun Song, David Rientjes, Axel Rasmussen,
 Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi,
 "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
 Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 21/46] hugetlb: use struct hugetlb_pte for walk_hugetlb_range
Date: Wed, 18 Jan 2023 19:21:43 +0100
Message-ID: <941f0f8f-a2c2-0021-0773-6cfaa81aabd7@redhat.com>
References: <06423461-c543-56fe-cc63-cabda6871104@redhat.com>
 <6548b3b3-30c9-8f64-7d28-8a434e0a0b80@redhat.com>

>>> Once the last piece is unmapped (or simpler: once the complete subtree of
>>> page tables is gone), we decrement refcount+mapcount. Might require some
>>> brain power to do this tracking, but I wouldn't call it impossible right
>>> from the start.
>>>
>>> Would such a design violate other design aspects that are important?
>
> This is actually how mapcount was treated in HGM RFC v1 (though not
> refcount); it is doable for both [2].
>
> One caveat here: if a page is unmapped in small pieces, it is
> difficult to know if the page is legitimately completely unmapped (we
> would have to check all the PTEs in the page table). In RFC v1, I
> sidestepped this caveat by saying that "page_mapcount() is incremented
> if the hstate-level PTE is present". A single unmap on the whole
> hugepage will clear the hstate-level PTE, thus decrementing the
> mapcount.
>
> On a related note, there still exists an (albeit minor) API difference
> vs. THPs: a piece of a page that is legitimately unmapped can still
> have a positive page_mapcount().
>
> Given that this approach allows us to retain the hugetlb vmemmap
> optimization (and it wouldn't require a horrible amount of
> complexity), I prefer this approach over the THP-like approach.

If we can store (directly/indirectly) metadata in the highest pgtable
that HGM-maps a hugetlb page, I guess what would be reasonable:

* hugetlb page pointer
* mapped size

Whenever mapping/unmapping sub-parts, we'd have to update that
information.

Once "mapped size" drops to 0, we know that the hugetlb page was
completely unmapped and we can drop the refcount+mapcount, clear
metadata (including hugetlb page pointer) [+ remove the page tables?].

Similarly, once "mapped size" corresponds to the hugetlb size, we can
immediately spot that everything is mapped.

Again, just a high-level idea.

-- 
Thanks,

David / dhildenb
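
To make the "hugetlb page pointer + mapped size" idea above a bit more concrete, here is a rough userspace model in C. It is only a sketch: struct toy_hpage, struct hgm_meta and the hgm_*() helpers are hypothetical names, not existing kernel structures or functions, and it assumes that refcount+mapcount are taken once when the first piece is mapped and dropped once when "mapped size" reaches 0. Locking and the page table handling itself are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a hugetlb page; NOT the kernel's struct page/folio. */
struct toy_hpage {
	size_t size;     /* hugetlb page size in bytes */
	int refcount;
	int mapcount;
};

/*
 * Hypothetical metadata stored (directly/indirectly) with the highest
 * page table that HGM-maps the hugetlb page.
 */
struct hgm_meta {
	struct toy_hpage *hpage;  /* NULL while nothing is mapped */
	size_t mapped_size;       /* bytes currently mapped via this table */
};

/* Account for mapping a sub-part of 'len' bytes. */
static void hgm_map_piece(struct hgm_meta *meta, struct toy_hpage *hpage,
			  size_t len)
{
	if (!meta->hpage) {
		/* Assumption: first piece takes refcount+mapcount once. */
		meta->hpage = hpage;
		hpage->refcount++;
		hpage->mapcount++;
	}
	assert(meta->hpage == hpage);
	assert(len <= hpage->size - meta->mapped_size);
	meta->mapped_size += len;
}

/*
 * Account for unmapping a sub-part of 'len' bytes. Returns true if this
 * was the last piece, i.e. the hugetlb page is now completely unmapped
 * and the caller could go on to remove the page tables.
 */
static bool hgm_unmap_piece(struct hgm_meta *meta, size_t len)
{
	struct toy_hpage *hpage = meta->hpage;

	assert(hpage && len <= meta->mapped_size);
	meta->mapped_size -= len;
	if (meta->mapped_size)
		return false;

	/* "mapped size" hit 0: drop refcount+mapcount, clear metadata. */
	hpage->refcount--;
	hpage->mapcount--;
	meta->hpage = NULL;
	return true;
}

/* Everything is mapped once "mapped size" equals the hugetlb size. */
static bool hgm_fully_mapped(const struct hgm_meta *meta)
{
	return meta->hpage && meta->mapped_size == meta->hpage->size;
}
```

With this model, deciding whether a page unmapped in small pieces is completely unmapped becomes a check of one counter instead of a scan over all PTEs in the page table; the cost is that every map/unmap of a sub-part has to update the metadata.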