Subject: Re: [PATCH v4 07/13] mm: khugepaged: collapse_pte_mapped_thp() use pte_offset_map_rw_nolock()
From: Muchun Song
Date: Tue, 24 Sep 2024 17:03:16 +0800
To: Qi Zheng
Cc: david@redhat.com, hughd@google.com, willy@infradead.org, vbabka@kernel.org,
 akpm@linux-foundation.org, rppt@kernel.org, vishal.moola@gmail.com,
 peterx@redhat.com, ryan.roberts@arm.com, christophe.leroy2@cs-soprasteria.com,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org
References: <07d975c50fe09c246e087303b39998430b1a66bd.1727148662.git.zhengqi.arch@bytedance.com>
 <79699B24-0D99-4051-91F3-5695D32D62AC@linux.dev>
 <2343da2e-f91f-4861-bb22-28f77db98c52@bytedance.com>
 <1D1872F1-7280-4F43-8213-A720C56B0646@linux.dev>

> On Sep 24, 2024, at 16:57, Qi Zheng wrote:
> 
> On 2024/9/24 16:52, Muchun Song wrote:
>>> On Sep 24, 2024, at 15:29, Qi Zheng wrote:
>>> 
>>> On 2024/9/24 15:14, Muchun Song wrote:
>>>>> On Sep 24, 2024, at 14:11, Qi Zheng wrote:
>>>>> 
>>>>> In collapse_pte_mapped_thp(), we may modify the pte and pmd entry after
>>>>> acquiring the ptl, so convert it to using pte_offset_map_rw_nolock(). In
>>>>> this case, the pte_same() check is not performed after the PTL is held,
>>>>> so we should get pgt_pmd and do a pmd_same() check once the PTL is held.
>>>>> 
>>>>> Signed-off-by: Qi Zheng
>>>>> ---
>>>>> mm/khugepaged.c | 14 +++++++++++---
>>>>> 1 file changed, 11 insertions(+), 3 deletions(-)
>>>>> 
>>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>>> index 6498721d4783a..8ab79c13d077f 100644
>>>>> --- a/mm/khugepaged.c
>>>>> +++ b/mm/khugepaged.c
>>>>> @@ -1605,7 +1605,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
>>>>>         if (userfaultfd_armed(vma) && !(vma->vm_flags & VM_SHARED))
>>>>>                 pml = pmd_lock(mm, pmd);
>>>>> 
>>>>> -       start_pte = pte_offset_map_nolock(mm, pmd, haddr, &ptl);
>>>>> +       start_pte = pte_offset_map_rw_nolock(mm, pmd, haddr, &pgt_pmd, &ptl);
>>>>>         if (!start_pte)         /* mmap_lock + page lock should prevent this */
>>>>>                 goto abort;
>>>>>         if (!pml)
>>>>> @@ -1613,6 +1613,9 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
>>>>>         else if (ptl != pml)
>>>>>                 spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
>>>>> 
>>>>> +       if (unlikely(!pmd_same(pgt_pmd, pmdp_get_lockless(pmd))))
>>>>> +               goto abort;
>>>>> +
>>>>>         /* step 2: clear page table and adjust rmap */
>>>>>         for (i = 0, addr = haddr, pte = start_pte;
>>>>>              i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE, pte++) {
>>>>> @@ -1645,7 +1648,6 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
>>>>>                 nr_ptes++;
>>>>>         }
>>>>> 
>>>>> -       pte_unmap(start_pte);
>>>>>         if (!pml)
>>>>>                 spin_unlock(ptl);
>>>>> 
>>>>> @@ -1658,13 +1660,19 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
>>>>>         /* step 4: remove empty page table */
>>>>>         if (!pml) {
>>>>>                 pml = pmd_lock(mm, pmd);
>>>>> -               if (ptl != pml)
>>>>> +               if (ptl != pml) {
>>>>>                         spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
>>>>> +                       if (unlikely(!pmd_same(pgt_pmd, pmdp_get_lockless(pmd)))) {
>>>>> +                               spin_unlock(pml);
>>>>> +                               goto abort;
>>>> 
>>>> This drops the folio reference and the mm counter twice: once at the abort label and once in step 3.
>>> 
>>> My bad, we should set nr_ptes to 0 and call flush_tlb_mm() here, right?
>> 
>> Or add a new label "out" just below "abort", then go to out.
> 
> In that case, we also need to call flush_tlb_mm() first, like the
> following:
> 
> if (unlikely(!pmd_same(pgt_pmd, pmdp_get_lockless(pmd)))) {
>         spin_unlock(pml);
>         flush_tlb_mm(mm);
>         goto out;
> }

Fine.

> 
>>> 
>>>>> +                       }
>>>>> +               }
>>>>>         }
>>>>>         pgt_pmd = pmdp_collapse_flush(vma, haddr, pmd);
>>>>>         pmdp_get_lockless_sync();
>>>>>         if (ptl != pml)
>>>>>                 spin_unlock(ptl);
>>>>> +       pte_unmap(start_pte);
>>>>>         spin_unlock(pml);
>>>> 
>>>> Why not?
>>>> 
>>>> pte_unmap_unlock(start_pte, ptl);
>>>> if (pml != ptl)
>>>>         spin_unlock(pml);
>>> 
>>> Both are fine, will do.
>>> 
>>> Thanks,
>>> Qi
>>> 
>>>>> 
>>>>>         mmu_notifier_invalidate_range_end(&range);
>>>>> -- 
>>>>> 2.20.1
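
For reference, below is a rough sketch of the error-path arrangement being
discussed; it is not the actual collapse_pte_mapped_thp(). The "out" label is
the hypothetical addition proposed in this thread, and the body of the "abort"
block is paraphrased from mm/khugepaged.c, so it may not match the tree
exactly. The point is that step 3 has already dropped the folio references and
the mm counter for the PTEs cleared in step 2, so a pmd_same() failure detected
in step 4 must do its own TLB flush and then bypass the nr_ptes undo that
"abort" performs:

        /*
         * Sketch only: "out" is the label proposed in this thread, not
         * something already in the tree.
         */
        /* step 4: remove empty page table */
        if (!pml) {
                pml = pmd_lock(mm, pmd);
                if (ptl != pml) {
                        spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
                        if (unlikely(!pmd_same(pgt_pmd, pmdp_get_lockless(pmd)))) {
                                spin_unlock(pml);
                                flush_tlb_mm(mm);   /* PTEs were cleared in step 2 */
                                goto out;           /* skip the nr_ptes undo at "abort" */
                        }
                }
        }
        /* ... collapse the empty page table and unlock ... */

abort:
        if (nr_ptes) {
                /* failure before step 3 ran: undo step 2's accounting */
                flush_tlb_mm(mm);
                folio_ref_sub(folio, nr_ptes);
                add_mm_counter(mm, mm_counter_file(folio), -nr_ptes);
        }
out:
        if (start_pte)
                pte_unmap_unlock(start_pte, ptl);
        /* ... remaining unlock and mmu_notifier cleanup ... */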