From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D354D2BB09 for ; Tue, 22 Oct 2024 09:15:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729588512; cv=none; b=uXJkHQuXEmg4lAB6GHsg5GssSQkAPHSIc+K9OAPWF4kx4y0bxcL4r6AxE0A3/werHaO8nvXsHLzc4Mw6F0l5hnQIG2nfSicICTPqIHGjGngVclvq6EY1CtMy29UhbKfSe1p6ycUD1dbM9Xe8aLPotNrcn48ybGXYhX0pkySUGUI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729588512; c=relaxed/simple; bh=HZAPK3mJmTS0q5Rxj8HcfPNEVjwwJ+YQc63qBqyzdx4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=R1TDb3FHx89+6Vvn4CnAo+KhGoUD3KkE1NQ/YnqGnTYJp4mxwRNhyrV+rBwzanGQSWcBLUl9VjXo9SWFyYvI10faiCYMfq2bddrgZLu8Ad07kYHxikP0OBA+Rt3BshAUmg0qOFc7zSVdMDOkkSqphUj0ZcXT/84lpbUvR1o63Pk= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=V2z4YYF9; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="V2z4YYF9" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1729588508; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Z4sUBp/YGky6H3ltSPc3JifcYG/XqztgIGk696naMPU=; b=V2z4YYF9VzhgcQMuQciisKDIQKSE+DIvwydkBREaiS6DatTCJ911tVXoH1wmwEHOgT61j8 +riw9G5zHULTorgZw5b03zmDS7f1dgWpjws3HkNtYtHkovbsb/kh9GSMpcmm8Z2OlXRZWI gxMjjyU5k7oKHaaUPCcRIGGN92zI1q8= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-631-UM8jXB04O4eKXwBriehxYQ-1; Tue, 22 Oct 2024 05:15:05 -0400 X-MC-Unique: UM8jXB04O4eKXwBriehxYQ-1 Received: from mx-prod-int-04.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-04.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.40]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id B8B8F19560BE; Tue, 22 Oct 2024 09:15:03 +0000 (UTC) Received: from t14s.redhat.com (unknown [10.22.88.114]) by mx-prod-int-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id C6CBF1955F42; Tue, 22 Oct 2024 09:14:59 +0000 (UTC) From: David Hildenbrand To: stable@vger.kernel.org Cc: Kefeng Wang , David Hildenbrand , Leo Fu , Thomas Huth , Ryan Roberts , Christian Borntraeger , Claudio Imbrenda , Hugh Dickins , Janosch Frank , Matthew Wilcox Subject: [PATCH 6.11.y] mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw() Date: Tue, 22 Oct 2024 11:14:58 +0200 Message-ID: <20241022091458.4111085-1-david@redhat.com> In-Reply-To: <2024101835-eloquent-could-27ce@gregkh> References: <2024101835-eloquent-could-27ce@gregkh> Precedence: bulk X-Mailing-List: stable@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 3.0 on 10.30.177.40 From: Kefeng Wang Patch series "mm: don't install PMD mappings when THPs are disabled by the hw/process/vma". During testing, it was found that we can get PMD mappings in processes where THP (and more precisely, PMD mappings) are supposed to be disabled. While it works as expected for anon+shmem, the pagecache is the problematic bit. For s390 KVM this currently means that a VM backed by a file located on filesystem with large folio support can crash when KVM tries accessing the problematic page, because the readahead logic might decide to use a PMD-sized THP and faulting it into the page tables will install a PMD mapping, something that s390 KVM cannot tolerate. This might also be a problem with HW that does not support PMD mappings, but I did not try reproducing it. Fix it by respecting the ways to disable THPs when deciding whether we can install a PMD mapping. khugepaged should already be taking care of not collapsing if THPs are effectively disabled for the hw/process/vma. This patch (of 2): Add vma_thp_disabled() and thp_disabled_by_hw() helpers to be shared by shmem_allowable_huge_orders() and __thp_vma_allowable_orders(). [david@redhat.com: rename to vma_thp_disabled(), split out thp_disabled_by_hw() ] Link: https://lkml.kernel.org/r/20241011102445.934409-2-david@redhat.com Fixes: 793917d997df ("mm/readahead: Add large folio readahead") Signed-off-by: Kefeng Wang Signed-off-by: David Hildenbrand Reported-by: Leo Fu Tested-by: Thomas Huth Reviewed-by: Ryan Roberts Cc: Boqiao Fu Cc: Christian Borntraeger Cc: Claudio Imbrenda Cc: Hugh Dickins Cc: Janosch Frank Cc: Matthew Wilcox Cc: Signed-off-by: Andrew Morton (cherry picked from commit 963756aac1f011d904ddd9548ae82286d3a91f96) Signed-off-by: David Hildenbrand --- Only contextual differences in shmem_allowable_huge_orders(). Note that this patch is required to backport the fix 2b0f922323ccfa76219bcaacd35cd50aeaa13592, which can be cleanly cherry picked on top. --- include/linux/huge_mm.h | 18 ++++++++++++++++++ mm/huge_memory.c | 13 +------------ mm/shmem.c | 7 +------ 3 files changed, 20 insertions(+), 18 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index e25d9ebfdf89..6d334c211176 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -308,6 +308,24 @@ static inline void count_mthp_stat(int order, enum mthp_stat_item item) (transparent_hugepage_flags & \ (1<vm_mm->flags); +} + +static inline bool thp_disabled_by_hw(void) +{ + /* If the hardware/firmware marked hugepage support disabled. */ + return transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED); +} + unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags); unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 99b146d16a18..1536421f76d4 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -106,18 +106,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma, if (!vma->vm_mm) /* vdso */ return 0; - /* - * Explicitly disabled through madvise or prctl, or some - * architectures may disable THP for some mappings, for - * example, s390 kvm. - * */ - if ((vm_flags & VM_NOHUGEPAGE) || - test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) - return 0; - /* - * If the hardware/firmware marked hugepage support disabled. - */ - if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED)) + if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags)) return 0; /* khugepaged doesn't collapse DAX vma, but page fault is fine. */ diff --git a/mm/shmem.c b/mm/shmem.c index 5a77acf6ac6a..2e21e06565ef 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1632,12 +1632,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode, loff_t i_size; int order; - if ((vm_flags & VM_NOHUGEPAGE) || - test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) - return 0; - - /* If the hardware/firmware marked hugepage support disabled. */ - if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED)) + if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags)) return 0; /* -- 2.46.1