Date: Tue, 20 Sep 2022 12:02:08 -0700
From: Ricardo Koller
To: Oliver Upton
Cc: Catalin Marinas, Will Deacon, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu
Subject: Re: [PATCH] KVM: arm64: Limit stage2_apply_range() batch size to 1GB
Message-ID:
References: <20220920183630.3376939-1-oliver.upton@linux.dev>
In-Reply-To: <20220920183630.3376939-1-oliver.upton@linux.dev>

On Tue, Sep 20, 2022 at 06:36:29PM +0000, Oliver Upton wrote:
> Presently stage2_apply_range() works on a batch of memory addressed by a
> stage 2 root table entry for the VM. Depending on the IPA limit of the
> VM and PAGE_SIZE of the host, this could address a massive range of
> memory. Some examples:
>
>   4 level, 4K paging  -> 512 GB batch size
>
>   3 level, 64K paging -> 4TB batch size
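Just to spell out where those two numbers come from, here is a quick
userspace-only sketch (not kernel code; the helper name is invented, and the
shift arithmetic simply mirrors what pt_levels_pgdir_shift() /
ARM64_HW_PGTABLE_LEVEL_SHIFT() boil down to):

#include <stdio.h>

/*
 * Bytes covered by one entry-level descriptor:
 * shift = (PAGE_SHIFT - 3) * levels + 3.
 */
static unsigned long long entry_level_span(unsigned int page_shift,
					   unsigned int levels)
{
	unsigned int shift = (page_shift - 3) * levels + 3;

	return 1ULL << shift;
}

int main(void)
{
	/* 4 levels, 4K pages:  shift = 9 * 4 + 3 = 39 -> 512 GB */
	printf("%llu GB\n", entry_level_span(12, 4) >> 30);
	/* 3 levels, 64K pages: shift = 13 * 3 + 3 = 42 -> 4 TB */
	printf("%llu TB\n", entry_level_span(16, 3) >> 40);
	return 0;
}

Either way, the span of a single entry-level descriptor is far too large to
process in one go without giving the CPU a chance to reschedule.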
>
> Unsurprisingly, working on such a large range of memory can lead to soft
> lockups. When running dirty_log_perf_test:
>
>   ./dirty_log_perf_test -m -2 -s anonymous_thp -b 4G -v 48
>
>   watchdog: BUG: soft lockup - CPU#0 stuck for 45s! [dirty_log_perf_:16703]
>   Modules linked in: vfat fat cdc_ether usbnet mii xhci_pci xhci_hcd sha3_generic gq(O)
>   CPU: 0 PID: 16703 Comm: dirty_log_perf_ Tainted: G O 6.0.0-smp-DEV #1
>   pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>   pc : dcache_clean_inval_poc+0x24/0x38
>   lr : clean_dcache_guest_page+0x28/0x4c
>   sp : ffff800021763990
>   pmr_save: 000000e0
>   x29: ffff800021763990 x28: 0000000000000005 x27: 0000000000000de0
>   x26: 0000000000000001 x25: 00400830b13bc77f x24: ffffad4f91ead9c0
>   x23: 0000000000000000 x22: ffff8000082ad9c8 x21: 0000fffafa7bc000
>   x20: ffffad4f9066ce50 x19: 0000000000000003 x18: ffffad4f92402000
>   x17: 000000000000011b x16: 000000000000011b x15: 0000000000000124
>   x14: ffff07ff8301d280 x13: 0000000000000000 x12: 00000000ffffffff
>   x11: 0000000000010001 x10: fffffc0000000000 x9 : ffffad4f9069e580
>   x8 : 000000000000000c x7 : 0000000000000000 x6 : 000000000000003f
>   x5 : ffff07ffa2076980 x4 : 0000000000000001 x3 : 000000000000003f
>   x2 : 0000000000000040 x1 : ffff0830313bd000 x0 : ffff0830313bcc40
>   Call trace:
>    dcache_clean_inval_poc+0x24/0x38
>    stage2_unmap_walker+0x138/0x1ec
>    __kvm_pgtable_walk+0x130/0x1d4
>    __kvm_pgtable_walk+0x170/0x1d4
>    __kvm_pgtable_walk+0x170/0x1d4
>    __kvm_pgtable_walk+0x170/0x1d4
>    kvm_pgtable_stage2_unmap+0xc4/0xf8
>    kvm_arch_flush_shadow_memslot+0xa4/0x10c
>    kvm_set_memslot+0xb8/0x454
>    __kvm_set_memory_region+0x194/0x244
>    kvm_vm_ioctl_set_memory_region+0x58/0x7c
>    kvm_vm_ioctl+0x49c/0x560
>    __arm64_sys_ioctl+0x9c/0xd4
>    invoke_syscall+0x4c/0x124
>    el0_svc_common+0xc8/0x194
>    do_el0_svc+0x38/0xc0
>    el0_svc+0x2c/0xa4
>    el0t_64_sync_handler+0x84/0xf0
>    el0t_64_sync+0x1a0/0x1a4
>
> Given the various paging configurations used by KVM at stage 2 there
> isn't a sensible page table level to use as the batch size. Use 1GB as
> the batch size instead, as it is evenly divisible by all supported
> hugepage sizes across 4K, 16K, and 64K paging.
>
> Signed-off-by: Oliver Upton
> ---
>
> Applies to 6.0-rc3. Tested with 4K and 64K pages with the above
> dirty_log_perf_test command and noticed no more soft lockups. I don't
> have a 16K system to test with.
>
> Marc, we spoke about this a while ago and agreed to go for some page
> table level based batching scheme. However, I decided against that
> because it doesn't really solve the problem for non-4K kernels.
>
>  arch/arm64/include/asm/stage2_pgtable.h | 20 --------------------
>  arch/arm64/kvm/mmu.c                    |  8 +++++++-
>  2 files changed, 7 insertions(+), 21 deletions(-)
>
> diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
> index fe341a6578c3..c8dca8ae359c 100644
> --- a/arch/arm64/include/asm/stage2_pgtable.h
> +++ b/arch/arm64/include/asm/stage2_pgtable.h
> @@ -10,13 +10,6 @@
>
>  #include
>
> -/*
> - * PGDIR_SHIFT determines the size a top-level page table entry can map
> - * and depends on the number of levels in the page table. Compute the
> - * PGDIR_SHIFT for a given number of levels.
> - */
> -#define pt_levels_pgdir_shift(lvls)	ARM64_HW_PGTABLE_LEVEL_SHIFT(4 - (lvls))
> -
>  /*
>   * The hardware supports concatenation of up to 16 tables at stage2 entry
>   * level and we use the feature whenever possible, which means we resolve 4
> @@ -30,11 +23,6 @@
>  #define stage2_pgtable_levels(ipa)	ARM64_HW_PGTABLE_LEVELS((ipa) - 4)
>  #define kvm_stage2_levels(kvm)		VTCR_EL2_LVLS(kvm->arch.vtcr)
>
> -/* stage2_pgdir_shift() is the size mapped by top-level stage2 entry for the VM */
> -#define stage2_pgdir_shift(kvm)		pt_levels_pgdir_shift(kvm_stage2_levels(kvm))
> -#define stage2_pgdir_size(kvm)		(1ULL << stage2_pgdir_shift(kvm))
> -#define stage2_pgdir_mask(kvm)		~(stage2_pgdir_size(kvm) - 1)
> -
>  /*
>   * kvm_mmmu_cache_min_pages() is the number of pages required to install
>   * a stage-2 translation. We pre-allocate the entry level page table at
> @@ -42,12 +30,4 @@
>   */
>  #define kvm_mmu_cache_min_pages(kvm)	(kvm_stage2_levels(kvm) - 1)
>
> -static inline phys_addr_t
> -stage2_pgd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
> -{
> -	phys_addr_t boundary = (addr + stage2_pgdir_size(kvm)) & stage2_pgdir_mask(kvm);
> -
> -	return (boundary - 1 < end - 1) ? boundary : end;
> -}
> -
>  #endif	/* __ARM64_S2_PGTABLE_H_ */
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index c9a13e487187..d64032b9fbb6 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -31,6 +31,12 @@ static phys_addr_t hyp_idmap_vector;
>
>  static unsigned long io_map_base;
>
> +static inline phys_addr_t stage2_apply_range_next(phys_addr_t addr, phys_addr_t end)
> +{
> +	phys_addr_t boundary = addr + SZ_1G;

I think you want this to be aligned down to 1G as well. At least
stage2_pgd_addr_end() was doing so, plus it reduces the number of
operations (e.g., when splitting a 1GB huge page when the address is
unaligned).
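Something along these lines is what I have in mind (untested, only meant to
illustrate the suggestion; ALIGN_DOWN() being the helper from
<linux/align.h>):

static inline phys_addr_t stage2_apply_range_next(phys_addr_t addr, phys_addr_t end)
{
	/*
	 * Step to the next 1GB-aligned boundary rather than a full 1GB past
	 * an unaligned addr, like stage2_pgd_addr_end() used to do with
	 * stage2_pgdir_size()/stage2_pgdir_mask().
	 */
	phys_addr_t boundary = ALIGN_DOWN(addr + SZ_1G, SZ_1G);

	return (boundary - 1 < end - 1) ? boundary : end;
}

That way an unaligned start address only pays for the sub-1GB head once, and
every following iteration works on a fully aligned 1GB block.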
> +
> +	return (boundary - 1 < end - 1) ? boundary : end;
> +}
>
>  /*
>   * Release kvm_mmu_lock periodically if the memory region is large. Otherwise,
> @@ -52,7 +58,7 @@ static int stage2_apply_range(struct kvm *kvm, phys_addr_t addr,
>  		if (!pgt)
>  			return -EINVAL;
>
> -		next = stage2_pgd_addr_end(kvm, addr, end);
> +		next = stage2_apply_range_next(addr, end);
>  		ret = fn(pgt, addr, next - addr);
>  		if (ret)
>  			break;
>
> base-commit: b90cb1053190353cc30f0fef0ef1f378ccc063c5
> --
> 2.37.3.968.ga6b4b080e4-goog