Date: Thu, 1 Aug 2024 09:52:30 -0400
From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, James Houghton,
 stable@vger.kernel.org, Oscar Salvador, Muchun Song, Baolin Wang
Subject: Re: [PATCH v3] mm/hugetlb: fix hugetlb vs. core-mm PT locking
References: <20240731122103.382509-1-david@redhat.com>
 <541f6c23-77ad-4d46-a8ed-fb18c9b635b3@redhat.com>
In-Reply-To: <541f6c23-77ad-4d46-a8ed-fb18c9b635b3@redhat.com>
Content-Type: text/plain; charset=utf-8
On Thu, Aug 01, 2024 at 10:50:18AM +0200, David Hildenbrand wrote:
> On 31.07.24 14:21, David Hildenbrand wrote:
> > We recently made GUP's common page table walking code also walk hugetlb
> > VMAs without most hugetlb special-casing, preparing for the future of
> > having less hugetlb-specific page table walking code in the codebase.
> > 
> > Turns out that we missed one page table locking detail: page table
> > locking for hugetlb folios that are not mapped using a single PMD/PUD.
> 
> James, Peter,
> 
> the following seems to get the job done.  Thoughts?

OK to me, so my A-b can be kept, but let me still comment; again, all
nitpicks.

> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 8e462205400d..776dc3914d9e 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -938,10 +938,40 @@ static inline bool htlb_allow_alloc_fallback(int reason)
>  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
>  					   struct mm_struct *mm, pte_t *pte)
>  {
> -	if (huge_page_size(h) == PMD_SIZE)
> +	unsigned long size = huge_page_size(h);
> +
> +	VM_WARN_ON(size == PAGE_SIZE);
> +
> +	/*
> +	 * hugetlb must use the exact same PT locks as core-mm page table
> +	 * walkers would.  When modifying a PTE table, hugetlb must take the
> +	 * PTE PT lock, when modifying a PMD table, hugetlb must take the PMD
> +	 * PT lock etc.
> +	 *
> +	 * The expectation is that any hugetlb folio smaller than a PMD is
> +	 * always mapped into a single PTE table and that any hugetlb folio
> +	 * smaller than a PUD (but at least as big as a PMD) is always mapped
> +	 * into a single PMD table.
> +	 *
> +	 * If that does not hold for an architecture, then that architecture
> +	 * must disable split PT locks such that all *_lockptr() functions
> +	 * will give us the same result: the per-MM PT lock.
> +	 *
> +	 * Note that with e.g., CONFIG_PGTABLE_LEVELS=2 where
> +	 * PGDIR_SIZE==P4D_SIZE==PUD_SIZE==PMD_SIZE, we'd use the MM PT lock
> +	 * directly with a PMD hugetlb size, whereby core-mm would call
> +	 * pmd_lockptr() instead.  However, in such configurations split PMD
> +	 * locks are disabled -- split locks don't make sense on a single
> +	 * PGDIR page table -- and the end result is the same.
> +	 */
> +	if (size >= P4D_SIZE)
> +		return &mm->page_table_lock;

I'd drop this so the mm lock fallback is done below (especially since, in
reality, the pud lock is always the mm lock for now..).  Also, this line
reads as if there can be P4D-sized huge pages, but in reality PUD is the
largest (nopxx doesn't count).  We also save some cycles in most cases if
it's removed.

> +	else if (size >= PUD_SIZE)
> +		return pud_lockptr(mm, (pud_t *) pte);
> +	else if (size >= PMD_SIZE || IS_ENABLED(CONFIG_HIGHPTE))

I thought this HIGHPTE check could also be dropped?  With HIGHPTE we
should never have lower-than-PMD huge pages, or we're in trouble.  That's
why I kept one WARN_ON() in my pseudo code, but only before trying to
take the pte lockptr.

>  		return pmd_lockptr(mm, (pmd_t *) pte);
> -	VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
> -	return &mm->page_table_lock;
> +	/* pte_alloc_huge() only applies with !CONFIG_HIGHPTE */
> +	return ptep_lockptr(mm, pte);
>  }
>  
>  #ifndef hugepages_supported
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a890a1731c14..bd219ac9c026 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2869,6 +2869,13 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  	return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
>  }
>  
> +static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
> +{
> +	BUILD_BUG_ON(IS_ENABLED(CONFIG_HIGHPTE));
> +	BUILD_BUG_ON(MAX_PTRS_PER_PTE * sizeof(pte_t) > PAGE_SIZE);
> +	return ptlock_ptr(virt_to_ptdesc(pte));
> +}

Great to know we can drop the mask..
Thanks,

> +
>  static inline bool ptlock_init(struct ptdesc *ptdesc)
>  {
>  	/*
> @@ -2893,6 +2900,10 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  {
>  	return &mm->page_table_lock;
>  }
> +static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
> +{
> +	return &mm->page_table_lock;
> +}
>  static inline void ptlock_cache_init(void) {}
>  static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
>  static inline void ptlock_free(struct ptdesc *ptdesc) {}
> -- 
> 2.45.2
> 
> 
> -- 
> Cheers,
> 
> David / dhildenb
> 

-- 
Peter Xu