From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sun, 3 May 2026 00:38:18 +0000
From: Wei Yang <richard.weiyang@gmail.com>
To: "David Hildenbrand (Arm)"
Cc: Wei Yang <richard.weiyang@gmail.com>, akpm@linux-foundation.org,
	ljs@kernel.org, ziy@nvidia.com, baolin.wang@linux.alibaba.com,
	Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
	dev.jain@arm.com, baohua@kernel.org, lance.yang@linux.dev,
	riel@surriel.com, vbabka@kernel.org, harry@kernel.org,
	jannh@google.com, rppt@kernel.org, surenb@google.com,
	mhocko@suse.com, shuah@kernel.org, linux-mm@kvack.org, Gavin Guo
Subject: Re: [PATCH 1/2] mm/huge_memory: return true if split_huge_pmd_locked() split PMD to migration entry
Message-ID: <20260503003818.t35q5roc7osx6se2@master>
Reply-To: Wei Yang <richard.weiyang@gmail.com>
References: <20260415010839.20124-1-richard.weiyang@gmail.com>
 <20260415010839.20124-2-richard.weiyang@gmail.com>
 <79e164a2-47ce-4a02-82f5-164515760b6d@kernel.org>
 <20260426091957.a227zxgkqapibtud@master>
 <20260429024913.iepoi7cit3xnwca2@master>
 <413feed4-6aab-43d9-b7e5-a9386fa79f4b@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <413feed4-6aab-43d9-b7e5-a9386fa79f4b@kernel.org>
User-Agent: NeoMutt/20170113 (1.7.2)

On Wed, Apr 29, 2026 at 08:55:07AM +0200, David Hildenbrand (Arm) wrote:
>On 4/29/26 04:49, Wei Yang wrote:
>> On Tue, Apr 28, 2026 at 10:24:42AM +0200, David Hildenbrand (Arm) wrote:
>>> On 4/26/26 11:19, Wei Yang wrote:
>>>>
>>>> Will adjust related places.
>>>>
>>>> Got it.
>>>>
>>>> Here is my understanding.
>>>>
>>>> We get here when page_vma_mapped_walk() touches a pmd entry, with
>>>> three cases:
>>>>
>>>> * pmd_trans_huge()
>>>> * pmd_is_migration_entry()
>>>> * pmd_is_device_private_entry()
>>>>
>>>> For the first two cases, we grab pmd_lock() and then check that the
>>>> condition is still valid before returning. But for case 3, after
>>>> grabbing pmd_lock(), it returns directly.
>>>>
>>>> This may give another thread a chance to split the
>>>> pmd_is_device_private_entry() entry to pte-mapped, IIUC. For this
>>>> case, we should restart the walk here.
>>>
>>> So what you are saying is that we should re-validate in
>>> page_vma_mapped_walk() that we indeed still have a device-private
>>> entry after grabbing the lock?
>>>
>>> That's what we do in map_pte() through the pmd_same() check.
>>>
>>> Likely we should apply the same model here!
>>
>> Below is my proposed change:
>>
>> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>> index a4d52fdb3056..6e915d35ae54 100644
>> --- a/mm/page_vma_mapped.c
>> +++ b/mm/page_vma_mapped.c
>> @@ -273,17 +273,21 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>>
>>  		if (softleaf_is_device_private(entry)) {
>>  			pvmw->ptl = pmd_lock(mm, pvmw->pmd);
>> -			return true;
>> +			if (pmd_same(pmde, pmdp_get_lockless(pvmw->pmd)))
>> +				return true;
>
>As we have a softleaf entry, I assume we wouldn't expect to get any other
>bits (access/dirty) set until we grab the lock. Verifying
>softleaf_is_device_private() again would be cleaner, though.
>

Got it.

>But really, I do wonder if we should just have a "goto retry" back to the
>"pmde = pmdp_get_lockless(pvmw->pmd);" instead?
>

Sounds reasonable. See below.

>
>And now I wonder why we don't have a check_pmd() handling in there? :/
>
>Should we check for the pfn here?

Thanks for pointing that out. I think you are right.

After re-reading the code, more questions came to my mind. I am afraid we
need more cleanup for page_vma_mapped_walk(). Below are my findings based
on my current understanding:

1. thp_migration_supported() seems unnecessary

Reaching this code means pmd_is_migration_entry() returned true, which
means CONFIG_ARCH_ENABLE_THP_MIGRATION is set; otherwise
softleaf_from_pmd() would return softleaf_mk_none(), which is not a
migration softleaf. CONFIG_ARCH_ENABLE_THP_MIGRATION being set in turn
means CONFIG_TRANSPARENT_HUGEPAGE is set, so thp_migration_supported()
must return true.

2. If the migration entry changes under us, we may need to handle it on
the pte level

In the pmd_is_migration_entry() -> !pmd_present() branch, we have:

	if (!softleaf_is_migration(entry) ||
	    !check_pmd(softleaf_to_pfn(entry), pvmw))
		return not_found(pvmw);
	return true;

But I think we need to do this:

	if (softleaf_is_migration(entry)) {
		if (!check_pmd(softleaf_to_pfn(entry), pvmw))
			return not_found(pvmw);
		return true;
	}

Per my understanding, if the pmd_is_migration_entry() entry changes under
us, we need to handle it on the pte level, just like the pmd_trans_huge()
case. Breaking the loop and returning false seems inconsistent.

3. Add a proper check for the device private entry

For a device private entry, we currently just grab the lock and return.
Following the handling of pmd_trans_huge() and pmd_is_migration_entry(),
we should:

* re-validate that it is still a device private entry after pmd_lock()
* check PVMW_MIGRATION
* check_pmd()

4. Consolidate pmd entry handling

Per my understanding, there are 4 cases for pmd entry handling:

* pmd_trans_huge()
* pmd_is_migration_entry()
* pmd_is_device_private_entry()
* !pmd_present()

We currently handle them with mixed state checks, which complicates the
logic. And the first three share similar logic (if my above analysis is
correct):

* grab pmd_lock()
* re-validate pmde
* check PVMW_MIGRATION
* check_pmd()

Here I would like to take a bolder step: consolidate the handling for
these three cases. Below is what it would look like.
	pmde = pmdp_get_lockless(pvmw->pmd);

	if (pmd_trans_huge(pmde) || pmd_is_valid_softleaf(pmde)) {
		unsigned long pfn;
		bool is_migration;
		bool for_migration;

		pvmw->ptl = pmd_lock(mm, pvmw->pmd);
		if (pmd_same(pmde, pmdp_get_lockless(pvmw->pmd))) {
			is_migration = pmd_is_migration_entry(pmde);
			for_migration = !!(pvmw->flags & PVMW_MIGRATION);

			if (is_migration != for_migration)
				return not_found(pvmw);

			if (pmd_trans_huge(pmde))
				pfn = pmd_pfn(pmde);
			else
				pfn = softleaf_to_pfn(softleaf_from_pmd(pmde));
			if (!check_pmd(pfn, pvmw))
				return not_found(pvmw);
			return true;
		}
		/* THP pmd was split under us: handle on pte level */
		spin_unlock(pvmw->ptl);
		pvmw->ptl = NULL;
	} else if (!pmd_present(pmde)) {
		if ((pvmw->flags & PVMW_SYNC) &&
		    thp_vma_suitable_order(vma, pvmw->address, PMD_ORDER) &&
		    (pvmw->nr_pages >= HPAGE_PMD_NR))
			sync_with_folio_pmd_zap(mm, pvmw->pmd);

		step_forward(pvmw, PMD_SIZE);
		continue;
	}

5. Use "goto retry"

As you mentioned above, instead of "handle on pte level", go back to
pmdp_get_lockless() to retry. This looks more reasonable to me.

>
>> +			/* THP pmd was split under us: handle on pte level */
>> +			spin_unlock(pvmw->ptl);
>> +			pvmw->ptl = NULL;
>
>
>
>> +		} else {
>> +			if ((pvmw->flags & PVMW_SYNC) &&
>> +			    thp_vma_suitable_order(vma, pvmw->address,
>> +						   PMD_ORDER) &&
>> +			    (pvmw->nr_pages >= HPAGE_PMD_NR))
>> +				sync_with_folio_pmd_zap(mm, pvmw->pmd);
>> +
>> +			step_forward(pvmw, PMD_SIZE);
>> +			continue;
>>  		}
>> -
>> -		if ((pvmw->flags & PVMW_SYNC) &&
>> -		    thp_vma_suitable_order(vma, pvmw->address,
>> -					   PMD_ORDER) &&
>> -		    (pvmw->nr_pages >= HPAGE_PMD_NR))
>> -			sync_with_folio_pmd_zap(mm, pvmw->pmd);
>> -
>> -		step_forward(pvmw, PMD_SIZE);
>> -		continue;
>>  	}
>>  	if (!map_pte(pvmw, &pmde, &ptl)) {
>>  		if (!pvmw->pte)
>>
>> After this, we could simplify the logic in try_to_migrate_one() as:
>>
>> @@ -2471,14 +2471,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>>  			 * so we can detect this scenario and properly
>>  			 * abort the walk.
>>  			 */
>> -			if (split_huge_pmd_locked(vma, pvmw.address,
>> -						  pvmw.pmd, true)) {
>> -				page_vma_mapped_walk_done(&pvmw);
>> -				break;
>> -			}
>> -			flags &= ~TTU_SPLIT_HUGE_PMD;
>> -			page_vma_mapped_walk_restart(&pvmw);
>> -			continue;
>> +			ret = split_huge_pmd_locked(vma, pvmw.address,
>> +						    pvmw.pmd, true);
>> +			page_vma_mapped_walk_done(&pvmw);
>> +			break;
>>  		}
>
>Right. But just to be clear: Let's split the page_vma_mapped_walk()
>validation (which looks like a bugfix to me) from the other optimization.
>

Sure, maybe we can split the page_vma_mapped_walk() cleanup out into a
separate patch set for easier review?

>--
>Cheers,
>
>David

--
Wei Yang
Help you, Help me
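The lock, re-validate, retry model this thread converges on (take a
lockless snapshot, grab the lock, then confirm via a pmd_same()-style
comparison that the entry did not change underneath us) can be sketched
in plain userspace C. All names below (fake_walk, lock_and_revalidate,
split_racer) are hypothetical stand-ins for illustration, not the kernel
implementation:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model of the lock-then-revalidate pattern. */
typedef uint64_t fake_pmd_t;

struct fake_walk {
	pthread_mutex_t ptl;	/* stands in for the pmd lock */
	fake_pmd_t pmd;		/* the entry we may race on */
};

/* Models pmd_same(): did the entry change between snapshot and lock? */
static bool fake_pmd_same(fake_pmd_t a, fake_pmd_t b)
{
	return a == b;
}

/*
 * Take the lock and re-check the pre-lock snapshot against the current
 * entry. Returns true with the lock held if the snapshot is still
 * valid; returns false (lock dropped) if the caller must go back and
 * re-read the entry -- the "goto retry" shape.
 *
 * The optional racer callback simulates a concurrent thread changing
 * the entry between the lockless read and the lock acquisition.
 */
static bool lock_and_revalidate(struct fake_walk *w, fake_pmd_t snapshot,
				void (*racer)(struct fake_walk *))
{
	if (racer)
		racer(w);
	pthread_mutex_lock(&w->ptl);
	if (!fake_pmd_same(snapshot, w->pmd)) {
		pthread_mutex_unlock(&w->ptl);
		return false;	/* entry changed under us: retry */
	}
	return true;		/* still valid; lock is held */
}

/* Simulates another thread splitting/clearing the entry. */
static void split_racer(struct fake_walk *w)
{
	pthread_mutex_lock(&w->ptl);
	w->pmd = 0;
	pthread_mutex_unlock(&w->ptl);
}
```

When lock_and_revalidate() returns false, the caller loops back to
re-read the entry rather than proceeding with a stale snapshot, which is
exactly why a "goto retry" back to the lockless read is cleaner than
special-casing each entry type after the lock.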