From: Muchun Song
Date: Thu, 29 Aug 2024 16:10:22 +0800
Subject: Re: [PATCH v2 07/14] mm: khugepaged: collapse_pte_mapped_thp() use pte_offset_map_rw_nolock()
To: Qi Zheng, david@redhat.com, hughd@google.com, willy@infradead.org, vbabka@kernel.org, akpm@linux-foundation.org, rppt@kernel.org, vishal.moola@gmail.com, peterx@redhat.com, ryan.roberts@arm.com, christophe.leroy2@cs-soprasteria.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org
Message-ID: <24be821f-a95f-47f1-879a-c392a79072cc@linux.dev>

On 2024/8/22 15:13, Qi Zheng wrote:
> In collapse_pte_mapped_thp(), we may modify the pte and pmd entry after
> acquiring the ptl,
> so convert it to using pte_offset_map_rw_nolock(). At
> this time, the write lock of mmap_lock is not held, and the pte_same()
> check is not performed after the PTL is held. So we should get pgt_pmd and do
> pmd_same() check after the ptl is held.
>
> For the case where the ptl is released first and then the pml is acquired,
> the PTE page may have been freed, so we must do pmd_same() check before
> reacquiring the ptl.
>
> Signed-off-by: Qi Zheng
> ---
>  mm/khugepaged.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 53bfa7f4b7f82..15d3f7f3c65f2 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1604,7 +1604,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
>  	if (userfaultfd_armed(vma) && !(vma->vm_flags & VM_SHARED))
>  		pml = pmd_lock(mm, pmd);
>
> -	start_pte = pte_offset_map_nolock(mm, pmd, haddr, &ptl);
> +	start_pte = pte_offset_map_rw_nolock(mm, pmd, haddr, &pgt_pmd, &ptl);
>  	if (!start_pte) /* mmap_lock + page lock should prevent this */
>  		goto abort;
>  	if (!pml)
> @@ -1612,6 +1612,9 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
>  	else if (ptl != pml)
>  		spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
>
> +	if (unlikely(!pmd_same(pgt_pmd, pmdp_get_lockless(pmd))))
> +		goto abort;
> +
>  	/* step 2: clear page table and adjust rmap */
>  	for (i = 0, addr = haddr, pte = start_pte;
>  	     i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE, pte++) {
> @@ -1657,6 +1660,16 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
>  	/* step 4: remove empty page table */
>  	if (!pml) {
>  		pml = pmd_lock(mm, pmd);
> +		/*
> +		 * We called pte_unmap() and release the ptl before acquiring
> +		 * the pml, which means we left the RCU critical section, so the
> +		 * PTE page may have been freed, so we must do pmd_same() check
> +		 * before reacquiring the ptl.
> +		 */
> +		if (unlikely(!pmd_same(pgt_pmd, pmdp_get_lockless(pmd)))) {
> +			spin_unlock(pml);
> +			goto pmd_change;

It seems we forget to flush the TLB on this path, even though some PTE
entries have already been cleared?

> +		}
>  		if (ptl != pml)
>  			spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
>  	}
> @@ -1688,6 +1701,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
>  	pte_unmap_unlock(start_pte, ptl);
>  	if (pml && pml != ptl)
>  		spin_unlock(pml);
> +pmd_change:
>  	if (notified)
>  		mmu_notifier_invalidate_range_end(&range);
>  drop_folio:
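
The TLB concern raised inline can be illustrated with a toy user-space
model. This is only a sketch and not kernel code: every name in it
(toy_mm, toy_translate, toy_clear_pte, toy_flush_tlb, toy_bail_out) is
hypothetical, and the "TLB" is just an array that caches translations. It
assumes the early-exit path is reached after PTEs have been cleared, as in
step 2 of the quoted hunk.

```c
/* Toy user-space model of a stale TLB entry; NOT kernel code.
 * All names below are hypothetical and exist only for illustration. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PTES 8

struct toy_mm {
	unsigned long pte[TOY_NR_PTES];	/* 0 means "not present" */
	unsigned long tlb[TOY_NR_PTES];	/* cached translations, 0 means empty */
};

/* Translate via the TLB, filling it from the page table on a miss. */
static unsigned long toy_translate(struct toy_mm *mm, size_t idx)
{
	assert(idx < TOY_NR_PTES);
	if (!mm->tlb[idx])
		mm->tlb[idx] = mm->pte[idx];
	return mm->tlb[idx];
}

/* Clear one PTE without touching the TLB. */
static void toy_clear_pte(struct toy_mm *mm, size_t idx)
{
	assert(idx < TOY_NR_PTES);
	mm->pte[idx] = 0;
}

/* Drop every cached translation. */
static void toy_flush_tlb(struct toy_mm *mm)
{
	for (size_t i = 0; i < TOY_NR_PTES; i++)
		mm->tlb[i] = 0;
}

/*
 * Model of the early exit: a PTE has already been cleared when the
 * pmd_same()-style recheck fails. Returns the translation still
 * visible for entry 0 afterwards (0 means "no mapping").
 */
static unsigned long toy_bail_out(bool flush_before_exit)
{
	struct toy_mm mm = { .pte = { 0x1000 } };

	(void)toy_translate(&mm, 0);	/* warm the TLB */
	toy_clear_pte(&mm, 0);		/* PTE is gone from the table */
	if (flush_before_exit)
		toy_flush_tlb(&mm);
	return toy_translate(&mm, 0);
}
```

In this model, toy_bail_out(false) still returns the old 0x1000 translation
although the PTE was cleared, while toy_bail_out(true) returns 0. Which flush
primitive would be appropriate on the real pmd_change path is left open here.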