From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 8 Jul 2025 07:30:06 +0300
Subject: Re: [v1 resend 03/12] mm/thp: zone_device awareness in THP handling code
From: Mika Penttilä <mpenttil@redhat.com>
To: Balbir Singh, linux-mm@kvack.org
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org, Karol Herbst,
 Lyude Paul, Danilo Krummrich, David Airlie, Simona Vetter,
 "Jérôme Glisse", Shuah Khan, David Hildenbrand, Barry Song,
 Baolin Wang, Ryan Roberts, Matthew Wilcox, Peter Xu, Zi Yan,
 Kefeng Wang, Jane Chu, Alistair Popple, Donet Tom
References: <20250703233511.2028395-1-balbirs@nvidia.com>
 <20250703233511.2028395-4-balbirs@nvidia.com>
Content-Type: text/plain; charset=UTF-8
On 7/8/25 07:20, Balbir Singh wrote:
> On 7/7/25 13:49, Mika Penttilä wrote:
>> On 7/4/25 02:35, Balbir Singh wrote:
>>> Make THP handling code in the mm subsystem for THP pages
>>> aware of zone device pages. Although the code is
>>> designed to be generic when it comes to handling splitting
>>> of pages, the code is designed to work for THP page sizes
>>> corresponding to HPAGE_PMD_NR.
>>>
>>> Modify page_vma_mapped_walk() to return true when a zone
>>> device huge entry is present, enabling try_to_migrate()
>>> and other code migration paths to appropriately process the
>>> entry
>>>
>>> pmd_pfn() does not work well with zone device entries, use
>>> pfn_pmd_entry_to_swap() for checking and comparison as for
>>> zone device entries.
>>>
>>> try_to_map_to_unused_zeropage() does not apply to zone device
>>> entries, zone device entries are ignored in the call.
>>>
>>> Cc: Karol Herbst
>>> Cc: Lyude Paul
>>> Cc: Danilo Krummrich
>>> Cc: David Airlie
>>> Cc: Simona Vetter
>>> Cc: "Jérôme Glisse"
>>> Cc: Shuah Khan
>>> Cc: David Hildenbrand
>>> Cc: Barry Song
>>> Cc: Baolin Wang
>>> Cc: Ryan Roberts
>>> Cc: Matthew Wilcox
>>> Cc: Peter Xu
>>> Cc: Zi Yan
>>> Cc: Kefeng Wang
>>> Cc: Jane Chu
>>> Cc: Alistair Popple
>>> Cc: Donet Tom
>>>
>>> Signed-off-by: Balbir Singh
>>> ---
>>>  mm/huge_memory.c     | 153 +++++++++++++++++++++++++++++++------------
>>>  mm/migrate.c         |   2 +
>>>  mm/page_vma_mapped.c |  10 +++
>>>  mm/pgtable-generic.c |   6 ++
>>>  mm/rmap.c            |  19 +++++-
>>>  5 files changed, 146 insertions(+), 44 deletions(-)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index ce130225a8e5..e6e390d0308f 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -1711,7 +1711,8 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>>>  	if (unlikely(is_swap_pmd(pmd))) {
>>>  		swp_entry_t entry = pmd_to_swp_entry(pmd);
>>>
>>> -		VM_BUG_ON(!is_pmd_migration_entry(pmd));
>>> +		VM_BUG_ON(!is_pmd_migration_entry(pmd) &&
>>> +				!is_device_private_entry(entry));
>>>  		if (!is_readable_migration_entry(entry)) {
>>>  			entry = make_readable_migration_entry(
>>>  							swp_offset(entry));
>>> @@ -2222,10 +2223,17 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>>  	} else if (thp_migration_supported()) {
>>>  		swp_entry_t entry;
>>>
>>> -		VM_BUG_ON(!is_pmd_migration_entry(orig_pmd));
>>>  		entry = pmd_to_swp_entry(orig_pmd);
>>>  		folio = pfn_swap_entry_folio(entry);
>>>  		flush_needed = 0;
>>> +
>>> +		VM_BUG_ON(!is_pmd_migration_entry(*pmd) &&
>>> +				!folio_is_device_private(folio));
>>> +
>>> +		if (folio_is_device_private(folio)) {
>>> +			folio_remove_rmap_pmd(folio, folio_page(folio, 0), vma);
>>> +			WARN_ON_ONCE(folio_mapcount(folio) < 0);
>>> +		}
>>>  	} else
>>>  		WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
>>>
>>> @@ -2247,6 +2255,15 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>>  		folio_mark_accessed(folio);
>>>  	}
>>>
>>> +	/*
>>> +	 * Do a folio put on zone device private pages after
>>> +	 * changes to mm_counter, because the folio_put() will
>>> +	 * clean folio->mapping and the folio_test_anon() check
>>> +	 * will not be usable.
>>> +	 */
>>> +	if (folio_is_device_private(folio))
>>> +		folio_put(folio);
>>> +
>>>  	spin_unlock(ptl);
>>>  	if (flush_needed)
>>>  		tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE);
>>> @@ -2375,7 +2392,8 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>>  		struct folio *folio = pfn_swap_entry_folio(entry);
>>>  		pmd_t newpmd;
>>>
>>> -		VM_BUG_ON(!is_pmd_migration_entry(*pmd));
>>> +		VM_BUG_ON(!is_pmd_migration_entry(*pmd) &&
>>> +				!folio_is_device_private(folio));
>>>  		if (is_writable_migration_entry(entry)) {
>>>  			/*
>>>  			 * A protection check is difficult so
>>> @@ -2388,9 +2406,11 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>>  			newpmd = swp_entry_to_pmd(entry);
>>>  			if (pmd_swp_soft_dirty(*pmd))
>>>  				newpmd = pmd_swp_mksoft_dirty(newpmd);
>>> -		} else {
>>> +		} else if (is_writable_device_private_entry(entry)) {
>>> +			newpmd = swp_entry_to_pmd(entry);
>>> +			entry = make_device_exclusive_entry(swp_offset(entry));
>>> +		} else
>>>  			newpmd = *pmd;
>>> -		}
>>>
>>>  		if (uffd_wp)
>>>  			newpmd = pmd_swp_mkuffd_wp(newpmd);
>>> @@ -2842,16 +2862,20 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>>>  	struct page *page;
>>>  	pgtable_t pgtable;
>>>  	pmd_t old_pmd, _pmd;
>>> -	bool young, write, soft_dirty, pmd_migration = false, uffd_wp = false;
>>> -	bool anon_exclusive = false, dirty = false;
>>> +	bool young, write, soft_dirty, uffd_wp = false;
>>> +	bool anon_exclusive = false, dirty = false, present = false;
>>>  	unsigned long addr;
>>>  	pte_t *pte;
>>>  	int i;
>>> +	swp_entry_t swp_entry;
>>>
>>>  	VM_BUG_ON(haddr & ~HPAGE_PMD_MASK);
>>>  	VM_BUG_ON_VMA(vma->vm_start > haddr, vma);
>>>  	VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma);
>>> -	VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd));
>>> +
>>> +	VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd)
>>> +			&& !(is_swap_pmd(*pmd) &&
>>> +			is_device_private_entry(pmd_to_swp_entry(*pmd))));
>>>
>>>  	count_vm_event(THP_SPLIT_PMD);
>>>
>>> @@ -2899,20 +2923,25 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>>>  		return __split_huge_zero_page_pmd(vma, haddr, pmd);
>>>  	}
>>>
>>> -	pmd_migration = is_pmd_migration_entry(*pmd);
>>> -	if (unlikely(pmd_migration)) {
>>> -		swp_entry_t entry;
>>>
>>> +	present = pmd_present(*pmd);
>>> +	if (unlikely(!present)) {
>>> +		swp_entry = pmd_to_swp_entry(*pmd);
>>>  		old_pmd = *pmd;
>>> -		entry = pmd_to_swp_entry(old_pmd);
>>> -		page = pfn_swap_entry_to_page(entry);
>>> -		write = is_writable_migration_entry(entry);
>>> +
>>> +		folio = pfn_swap_entry_folio(swp_entry);
>>> +		VM_BUG_ON(!is_migration_entry(swp_entry) &&
>>> +				!is_device_private_entry(swp_entry));
>>> +		page = pfn_swap_entry_to_page(swp_entry);
>>> +		write = is_writable_migration_entry(swp_entry);
>>> +
>>>  		if (PageAnon(page))
>>> -			anon_exclusive = is_readable_exclusive_migration_entry(entry);
>>> -		young = is_migration_entry_young(entry);
>>> -		dirty = is_migration_entry_dirty(entry);
>>> +			anon_exclusive =
>>> +				is_readable_exclusive_migration_entry(swp_entry);
>>>  		soft_dirty = pmd_swp_soft_dirty(old_pmd);
>>>  		uffd_wp = pmd_swp_uffd_wp(old_pmd);
>>> +		young = is_migration_entry_young(swp_entry);
>>> +		dirty = is_migration_entry_dirty(swp_entry);
>>>  	} else {
>> This is where folio_try_share_anon_rmap_pmd() is skipped for device private pages, to which I referred in
>> https://lore.kernel.org/linux-mm/f1e26e18-83db-4c0e-b8d8-0af8ffa8a206@redhat.com/
>>
> Does it matter for device private pages/folios? It does not affect the freeze value.

I think ClearPageAnonExclusive is needed.

> Balbir Singh
>

--Mika