From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: James Houghton, kvm@vger.kernel.org, David Hildenbrand, Dave Hansen,
    peterx@redhat.com, Christophe Leroy, Dan Williams, David Rientjes,
    Dave Jiang, "Aneesh Kumar K . V", x86@kernel.org, Hugh Dickins,
    Matthew Wilcox, Ingo Molnar, Huang Ying, Rik van Riel,
    Sean Christopherson, Nicholas Piggin, Borislav Petkov,
    "Kirill A . Shutemov", Thomas Gleixner, Vlastimil Babka,
    Oscar Salvador, linuxppc-dev@lists.ozlabs.org, Paolo Bonzini,
    Andrew Morton, Rick P Edgecombe, Mel Gorman
Subject: [PATCH v4 2/7] mm/mprotect: Push mmu notifier to PUDs
Date: Wed, 7 Aug 2024 15:48:06 -0400
Message-ID: <20240807194812.819412-3-peterx@redhat.com>
In-Reply-To: <20240807194812.819412-1-peterx@redhat.com>
References: <20240807194812.819412-1-peterx@redhat.com>

mprotect() invokes mmu notifiers at the PMD level.  It has been that way
since 2014, with commit a5338093bfb4 ("mm: move mmu notifier call from
change_protection to change_pmd_range").  At that time, the issue was
that NUMA balancing could be applied over a huge range of VM memory even
when nothing was populated.  In that case the notification can be
avoided whenever no valid pmd is detected, where a valid pmd means
either a THP or a PTE pgtable page.
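As a reminder, that lazy notification scheme (which this patch keeps,
only moved from the PMD to the PUD loop in the diff below) works roughly
like the following sketch.  The loop helpers for_each_pgtable_entry()
and entry_is_none() are hypothetical placeholders for the real pmd/pud
walk, and vma/addr/end are assumed to be in scope; the notifier calls
themselves match the ones used in mm/mprotect.c:

	/*
	 * Illustrative sketch only (hypothetical helpers, not the kernel
	 * code as-is): range.start doubles as a "notifier started" flag.
	 * It stays zero until the first populated entry is found, so a
	 * walk over a fully unpopulated range never notifies at all.
	 */
	struct mmu_notifier_range range;

	range.start = 0;

	for_each_pgtable_entry(entry, addr, end) {	/* hypothetical loop */
		if (entry_is_none(entry))		/* hypothetical check */
			continue;			/* skip holes, no notify */

		if (!range.start) {
			/* First populated entry: notify once for the range */
			mmu_notifier_range_init(&range,
						MMU_NOTIFY_PROTECTION_VMA, 0,
						vma->vm_mm, addr, end);
			mmu_notifier_invalidate_range_start(&range);
		}
		/* ... apply the protection change to this entry ... */
	}

	/* Pair the start with an end, but only if we ever started */
	if (range.start)
		mmu_notifier_invalidate_range_end(&range);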
Now, to pave the way for PUD handling, this isn't enough.  We need to
generate mmu notifications properly on PUD entries as well.  mprotect()
is currently broken on PUDs (e.g., one can already easily trigger a
kernel error with dax 1G mappings); this is the start of fixing it.  To
do that, this patch pushes such notifications up to the PUD layer.

There is a risk of regressing the problem Rik wanted to resolve before,
but I don't think it should really happen, and I still chose this
solution for a few reasons:

1) Consider a large VM that should definitely contain more than GBs of
   memory: it is highly likely that its PUDs are also none, in which
   case there is no regression at all.

2) KVM has evolved a lot over the years to get rid of rmap walks, which
   might have been the major cause of the previous soft-lockups.  At
   least the TDP MMU already avoids rmap as long as it is not nested
   (which should be the major use case, IIUC), so the TDP MMU pgtable
   walker will simply see an empty VM pgtable (e.g., EPT on x86), and
   invalidating a fully empty region should in most cases be pretty
   fast now compared to 2014.

3) KVM now has explicit code paths that give way to mmu notifiers just
   like this one, e.g., commit d02c357e5bfa ("KVM: x86/mmu: Retry fault
   before acquiring mmu_lock if mapping is changing").  That also
   avoids contention that might otherwise contribute to a soft-lockup.

4) Sticking with the PMD layer simply doesn't work once a PUD mapping
   is there...  We need one way or another to fix PUD mappings under
   mprotect().  Pushing the notifiers to the PUD level should be the
   safest approach as of now; e.g., there is still no sign of huge P4Ds
   coming on any known arch.

Cc: kvm@vger.kernel.org
Cc: Sean Christopherson
Cc: Paolo Bonzini
Cc: David Rientjes
Cc: Rik van Riel
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/mprotect.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 37cf8d249405..d423080e6509 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -363,9 +363,6 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 	unsigned long next;
 	long pages = 0;
 	unsigned long nr_huge_updates = 0;
-	struct mmu_notifier_range range;
-
-	range.start = 0;
 
 	pmd = pmd_offset(pud, addr);
 	do {
@@ -383,14 +380,6 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 		if (pmd_none(*pmd))
 			goto next;
 
-		/* invoke the mmu notifier if the pmd is populated */
-		if (!range.start) {
-			mmu_notifier_range_init(&range,
-						MMU_NOTIFY_PROTECTION_VMA, 0,
-						vma->vm_mm, addr, end);
-			mmu_notifier_invalidate_range_start(&range);
-		}
-
 		_pmd = pmdp_get_lockless(pmd);
 		if (is_swap_pmd(_pmd) || pmd_trans_huge(_pmd) || pmd_devmap(_pmd)) {
 			if ((next - addr != HPAGE_PMD_SIZE) ||
@@ -431,9 +420,6 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 		cond_resched();
 	} while (pmd++, addr = next, addr != end);
 
-	if (range.start)
-		mmu_notifier_invalidate_range_end(&range);
-
 	if (nr_huge_updates)
 		count_vm_numa_events(NUMA_HUGE_PTE_UPDATES, nr_huge_updates);
 	return pages;
@@ -443,22 +429,36 @@ static inline long change_pud_range(struct mmu_gather *tlb,
 		struct vm_area_struct *vma, p4d_t *p4d, unsigned long addr,
 		unsigned long end, pgprot_t newprot, unsigned long cp_flags)
 {
+	struct mmu_notifier_range range;
 	pud_t *pud;
 	unsigned long next;
 	long pages = 0, ret;
 
+	range.start = 0;
+
 	pud = pud_offset(p4d, addr);
 	do {
 		next = pud_addr_end(addr, end);
 		ret = change_prepare(vma, pud, pmd, addr, cp_flags);
-		if (ret)
-			return ret;
+		if (ret) {
+			pages = ret;
+			break;
+		}
 		if (pud_none_or_clear_bad(pud))
 			continue;
+		if (!range.start) {
+			mmu_notifier_range_init(&range,
+						MMU_NOTIFY_PROTECTION_VMA, 0,
+						vma->vm_mm, addr, end);
+			mmu_notifier_invalidate_range_start(&range);
+		}
 		pages += change_pmd_range(tlb, vma, pud, addr, next, newprot,
 					  cp_flags);
 	} while (pud++, addr = next, addr != end);
 
+	if (range.start)
+		mmu_notifier_invalidate_range_end(&range);
+
 	return pages;
 }
 
-- 
2.45.0
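The dax 1G breakage mentioned above can be exercised from userspace with
something like the sketch below.  This is an illustration only: the
device path /dev/dax0.0 and a devdax alignment of 1G are assumptions
about the test setup, not part of this patch, and the exact failure mode
depends on the kernel build.

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		const size_t sz = 1UL << 30;		/* one 1G PUD mapping */
		int fd = open("/dev/dax0.0", O_RDWR);	/* assumed devdax node */

		if (fd < 0) {
			perror("open");
			return 1;
		}

		void *p = mmap(NULL, sz, PROT_READ | PROT_WRITE,
			       MAP_SHARED, fd, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}

		/*
		 * Changing protections over the PUD-mapped range exercises
		 * the mprotect() PUD path that this series is fixing.
		 */
		if (mprotect(p, sz, PROT_READ))
			perror("mprotect");

		munmap(p, sz);
		close(fd);
		return 0;
	}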