From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 11 May 2026 14:23:24 +0530
Subject: Re: [PATCH v3 2/9] mm/rmap: refactor hugetlb pte clearing in
 try_to_unmap_one
From: Dev Jain
To: "David Hildenbrand (Arm)", akpm@linux-foundation.org, ljs@kernel.org,
 hughd@google.com, chrisl@kernel.org, kasong@tencent.com
Cc: riel@surriel.com, liam@infradead.org, vbabka@kernel.org, harry@kernel.org,
 jannh@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 qi.zheng@linux.dev, shakeel.butt@linux.dev, baohua@kernel.org,
 axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
 rppt@kernel.org, surenb@google.com, mhocko@suse.com,
 baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com, nphamcs@gmail.com,
 bhe@redhat.com, youngjun.park@lge.com, pfalcato@suse.de,
 ryan.roberts@arm.com, anshuman.khandual@arm.com
References: <20260506094504.2588857-1-dev.jain@arm.com>
 <20260506094504.2588857-3-dev.jain@arm.com>
 <5a4c3c3d-66c8-4ef6-bb6a-2ec0e32694a1@kernel.org>
In-Reply-To: <5a4c3c3d-66c8-4ef6-bb6a-2ec0e32694a1@kernel.org>

On 11/05/26 12:40 pm, David Hildenbrand (Arm) wrote:
> On 5/6/26 11:44, Dev Jain wrote:
>> Simplify the code by refactoring the folio_test_hugetlb() branch into
>> a new function.
>>
>> While at it, convert BUG helpers to WARN helpers.
>>
>> Signed-off-by: Dev Jain
>> ---
>>  mm/rmap.c | 117 ++++++++++++++++++++++++++++++++----------------------
>>  1 file changed, 69 insertions(+), 48 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index a5f067a09de0f..a98acdea0530a 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1978,6 +1978,68 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
>>  			FPB_RESPECT_WRITE | FPB_RESPECT_SOFT_DIRTY);
>>  }
>>
>> +/* Returns false if unmap needs to be aborted */
>> +static inline bool unmap_hugetlb_folio(struct vm_area_struct *vma,
>
> I'm wondering whether we should make it clearer that this belongs to the
> try_to_unmap family by calling it
>
> ttu_hugetlb_folio

Yes, I had suggested a ttu_ prefix somewhere else in the first version;
Lorenzo didn't like it (or probably he didn't like that specific use of
ttu):

https://lore.kernel.org/all/a8b06f36-98e1-435c-881f-67242bc4304a@lucifer.local/

I don't know of a better name than "commit_ttu_lazyfree_folio" for that
case, but for the hugetlb case, I like ttu_hugetlb_folio.

>
>> +		struct folio *folio, struct page_vma_mapped_walk *pvmw,
>> +		struct page *page, enum ttu_flags flags, pte_t *pteval,
>> +		struct mmu_notifier_range *range, bool *exit_walk)
>> +{
>> +	/*
>> +	 * The try_to_unmap() is only passed a hugetlb page
>> +	 * in the case where the hugetlb page is poisoned.
>> +	 */
>> +	VM_WARN_ON_PAGE(!PageHWPoison(page), page);
>
> IIRC, we will never actually get a tail page here.
>
> Can we avoid passing a page by checking instead whether the hugetlb folio is
> marked as having a poisoned page?
>
> See the folio_test_set_hwpoison() in hugetlb_update_hwpoison().
>
> So you can simply use folio_test_hwpoison() here instead.

Okay, I will confirm and do this.
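Something like this, I assume (untested; just my reading of your
suggestion, using the existing folio-level helpers):

	/* hugetlb hwpoison is tracked on the folio, not the subpage */
	VM_WARN_ON_FOLIO(!folio_test_hwpoison(folio), folio);

i.e. drop the page parameter entirely and test the folio-level flag.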
>
>> +	/*
>> +	 * huge_pmd_unshare may unmap an entire PMD page.
>> +	 * There is no way of knowing exactly which PMDs may
>> +	 * be cached for this mm, so we must flush them all.
>> +	 * start/end were already adjusted above to cover this
>> +	 * range.
>> +	 */
>> +	flush_cache_range(vma, range->start, range->end);
>> +
>> +	/*
>> +	 * To call huge_pmd_unshare, i_mmap_rwsem must be
>> +	 * held in write mode. Caller needs to explicitly
>> +	 * do this outside rmap routines.
>> +	 *
>> +	 * We also must hold hugetlb vma_lock in write mode.
>> +	 * Lock order dictates acquiring vma_lock BEFORE
>> +	 * i_mmap_rwsem. We can only try lock here and fail
>> +	 * if unsuccessful.
>> +	 */
>> +	if (!folio_test_anon(folio)) {
>> +		struct mmu_gather tlb;
>> +
>> +		VM_WARN_ON(!(flags & TTU_RMAP_LOCKED));
>> +		if (!hugetlb_vma_trylock_write(vma)) {
>> +			*exit_walk = true;
>> +			return false;
>> +		}
>> +
>> +		tlb_gather_mmu_vma(&tlb, vma);
>> +		if (huge_pmd_unshare(&tlb, vma, pvmw->address, pvmw->pte)) {
>> +			hugetlb_vma_unlock_write(vma);
>> +			huge_pmd_unshare_flush(&tlb, vma);
>> +			tlb_finish_mmu(&tlb);
>> +			/*
>> +			 * The PMD table was unmapped,
>> +			 * consequently unmapping the folio.
>> +			 */
>> +			*exit_walk = true;
>> +			return true;
>> +		}
>> +		hugetlb_vma_unlock_write(vma);
>> +		tlb_finish_mmu(&tlb);
>> +	}
>> +	*pteval = huge_ptep_clear_flush(vma, pvmw->address, pvmw->pte);
>> +	if (pte_dirty(*pteval))
>> +		folio_mark_dirty(folio);
>> +
>> +	*exit_walk = false;
>> +	return true;
>
> Can we instead introduce some enum that tells the caller how to proceed?
>
> I assume we have
>
> (a) Abort walk (ret = false + page_vma_mapped_walk_done())
>
> (b) Walk done (ret = true + page_vma_mapped_walk_done())
>
> (c) Continue walk (call page_vma_mapped_walk())
>
> enum ttu_walk_result {
> 	TTU_WALK_CONTINUE,
> 	TTU_WALK_ABORT,
> 	TTU_WALK_DONE
> }

I had replied to such a suggestion here:

https://lore.kernel.org/all/caa7c455-7472-48eb-a5dc-145e587d67ba@arm.com/

Probably we don't have any other solution : )

>
>> +}
>> +
>>  /*
>>   * @arg: enum ttu_flags will be passed to this argument
>>   */
>> @@ -2115,56 +2177,15 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>  				PageAnonExclusive(subpage);
>>
>>  		if (folio_test_hugetlb(folio)) {
>> -			bool anon = folio_test_anon(folio);
>> -
>> -			/*
>> -			 * The try_to_unmap() is only passed a hugetlb page
>> -			 * in the case where the hugetlb page is poisoned.
>> -			 */
>> -			VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
>> -			/*
>> -			 * huge_pmd_unshare may unmap an entire PMD page.
>> -			 * There is no way of knowing exactly which PMDs may
>> -			 * be cached for this mm, so we must flush them all.
>> -			 * start/end were already adjusted above to cover this
>> -			 * range.
>> -			 */
>> -			flush_cache_range(vma, range.start, range.end);
>> +			bool exit_walk;
>>
>> -			/*
>> -			 * To call huge_pmd_unshare, i_mmap_rwsem must be
>> -			 * held in write mode. Caller needs to explicitly
>> -			 * do this outside rmap routines.
>> -			 *
>> -			 * We also must hold hugetlb vma_lock in write mode.
>> -			 * Lock order dictates acquiring vma_lock BEFORE
>> -			 * i_mmap_rwsem. We can only try lock here and fail
>> -			 * if unsuccessful.
>> -			 */
>> -			if (!anon) {
>> -				struct mmu_gather tlb;
>> -
>> -				VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
>> -				if (!hugetlb_vma_trylock_write(vma))
>> -					goto walk_abort;
>> -
>> -				tlb_gather_mmu_vma(&tlb, vma);
>> -				if (huge_pmd_unshare(&tlb, vma, address, pvmw.pte)) {
>> -					hugetlb_vma_unlock_write(vma);
>> -					huge_pmd_unshare_flush(&tlb, vma);
>> -					tlb_finish_mmu(&tlb);
>> -					/*
>> -					 * The PMD table was unmapped,
>> -					 * consequently unmapping the folio.
>> -					 */
>> -					goto walk_done;
>> -				}
>> -				hugetlb_vma_unlock_write(vma);
>> -				tlb_finish_mmu(&tlb);
>> +			ret = unmap_hugetlb_folio(vma, folio, &pvmw, subpage,
>> +						  flags, &pteval, &range,
>> +						  &exit_walk);
>> +			if (exit_walk) {
>> +				page_vma_mapped_walk_done(&pvmw);
>> +				break;
>
> In the old walk_abort case you wouldn't set ret = false?

ret will be set appropriately in unmap_hugetlb_folio.

>
> When returning the enum you could simply do something like
>
> switch (ret) {
> case TTU_WALK_ABORT:
> 	goto walk_abort;
> case TTU_WALK_DONE:
> 	goto walk_done;
> default:
> 	break;
> }
>
> While I like this patch, can we please just move all the hugetlb shite into this
> helper function?
>
> Essentially, get rid of hugetlb special casing in the remainder of the function.
>
> That also makes the function name clearer (right now it's only doing a part of
> hugetlb folio unmapping).

Okay, I can try that. That would mean splitting the pvmw walk for hugetlb
and non-hugetlb, but I suspect there would be very little code duplication.

>
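P.S. For concreteness, here is roughly the shape I have in mind for the
enum variant (completely untested sketch; it assumes your suggested names
ttu_hugetlb_folio and ttu_walk_result, folds in the folio_test_hwpoison()
change, and trims the comments from the patch above):

enum ttu_walk_result {
	TTU_WALK_CONTINUE,	/* continue walk: call page_vma_mapped_walk() */
	TTU_WALK_ABORT,		/* ret = false + page_vma_mapped_walk_done() */
	TTU_WALK_DONE,		/* ret = true + page_vma_mapped_walk_done() */
};

static enum ttu_walk_result ttu_hugetlb_folio(struct vm_area_struct *vma,
		struct folio *folio, struct page_vma_mapped_walk *pvmw,
		enum ttu_flags flags, pte_t *pteval,
		struct mmu_notifier_range *range)
{
	/* try_to_unmap() only sees hugetlb folios carrying a poisoned page */
	VM_WARN_ON_FOLIO(!folio_test_hwpoison(folio), folio);

	/* huge_pmd_unshare() may unmap an entire PMD page: flush it all */
	flush_cache_range(vma, range->start, range->end);

	if (!folio_test_anon(folio)) {
		struct mmu_gather tlb;

		VM_WARN_ON(!(flags & TTU_RMAP_LOCKED));
		/* vma_lock must be taken before i_mmap_rwsem: trylock only */
		if (!hugetlb_vma_trylock_write(vma))
			return TTU_WALK_ABORT;

		tlb_gather_mmu_vma(&tlb, vma);
		if (huge_pmd_unshare(&tlb, vma, pvmw->address, pvmw->pte)) {
			hugetlb_vma_unlock_write(vma);
			huge_pmd_unshare_flush(&tlb, vma);
			tlb_finish_mmu(&tlb);
			/* the PMD table was unmapped, and with it the folio */
			return TTU_WALK_DONE;
		}
		hugetlb_vma_unlock_write(vma);
		tlb_finish_mmu(&tlb);
	}

	*pteval = huge_ptep_clear_flush(vma, pvmw->address, pvmw->pte);
	if (pte_dirty(*pteval))
		folio_mark_dirty(folio);

	return TTU_WALK_CONTINUE;
}

with the caller in try_to_unmap_one() reduced to your switch:

		if (folio_test_hugetlb(folio)) {
			switch (ttu_hugetlb_folio(vma, folio, &pvmw, flags,
						  &pteval, &range)) {
			case TTU_WALK_ABORT:
				goto walk_abort;
			case TTU_WALK_DONE:
				goto walk_done;
			default:
				break;
			}
		}

Switching directly on the call also avoids reusing the bool ret for the
enum value.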