Date: Wed, 14 Aug 2024 17:23:36 -0700
From: Sean Christopherson
To: Oliver Upton
Cc: Jason Gunthorpe, Peter Xu, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Oscar Salvador, Axel Rasmussen, linux-arm-kernel@lists.infradead.org,
	x86@kernel.org, Will Deacon, Gavin Shan, Paolo Bonzini, Zi Yan,
	Andrew Morton, Catalin Marinas, Ingo Molnar, Alistair Popple,
	Borislav Petkov, David Hildenbrand, Thomas Gleixner, kvm@vger.kernel.org,
	Dave Hansen, Alex Williamson, Yan Zhao, Marc Zyngier
Subject: Re: [PATCH 00/19] mm: Support huge pfnmaps
References: <20240809160909.1023470-1-peterx@redhat.com>
	<20240814123715.GB2032816@nvidia.com>
	<20240814144307.GP2032816@nvidia.com>

On Wed, Aug 14, 2024, Oliver Upton wrote:
> On Wed, Aug 14, 2024 at 04:28:00PM -0700, Oliver Upton wrote:
> > On Wed, Aug 14, 2024 at 01:54:04PM -0700, Sean Christopherson wrote:
> > > TL;DR: it's probably worth looking at mmu_stress_test (was: max_guest_memory_test)
> > > on arm64, specifically the mprotect() testcase[1], as performance is significantly
> > > worse compared to x86,
> >
> > Sharing what we discussed offline:
> >
> > Sean was using a machine w/o FEAT_FWB for this test, so the increased
> > runtime on arm64 is likely explained by the CMOs we're doing when
> > creating or invalidating a stage-2 PTE.
> >
> > Using a machine w/ FEAT_FWB would be better for making these sorts of
> > cross-architecture comparisons.  Beyond CMOs, we do have some
>
> ... some heavy barriers (e.g. DSB(ishst)) we use to ensure page table
> updates are visible to the system.  So there could still be some
> arch-specific quirks that'll show up in the test.

Nope, 'twas FWB.  On a system with FWB, ARM nicely outperforms x86 on mprotect()
when vCPUs stop on the first -EFAULT, I suspect because ARM can do broadcast TLB
invalidations and doesn't need to interrupt and wait for every vCPU to respond.

  run1 = 10.723194154s, reset = 0.000014732s, run2 = 0.013790876s, ro = 2.151261587s, rw = 10.624272116s

However, having vCPUs continue faulting while mprotect() is running turns the
tables, I suspect due to mmap_lock contention:

  run1 = 10.768003815s, reset = 0.000012051s, run2 = 0.013781921s, ro = 23.277624455s, rw = 10.649136889s

The x86 numbers, since they're out of sight now:

  -EFAULT once
  run1 = 6.873408794s, reset = 0.000165898s, run2 = 0.035537803s, ro = 6.149083106s, rw = 7.713627355s

  -EFAULT forever
  run1 = 6.923218747s, reset = 0.000167050s, run2 = 0.034676225s, ro = 14.599445790s, rw = 7.763152792s
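For context, a rough stand-in for what the "ro" and "rw" columns above are
timing.  This is NOT the actual selftest code; the sizes and flags below are
made up for illustration, and the real test does the mprotect() on memory
backing a KVM memslot while vCPU worker threads keep writing to it (which is
what drags the mmu_notifiers and stage-2 teardown into the picture):

/*
 * Hypothetical stand-in for the timed phases, not the real selftest: time
 * how long it takes to flip a large mapping read-only and back.  In the
 * real test the mapping backs a memslot, so the "ro" flip also zaps
 * stage-2 mappings via the mmu_notifiers while vCPUs are still writing.
 */
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>

static double since(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (now.tv_sec - start->tv_sec) +
	       (now.tv_nsec - start->tv_nsec) / 1e9;
}

int main(void)
{
	const size_t size = 256ul * 1024 * 1024;	/* stand-in for guest memory */
	struct timespec start;
	void *mem;

	mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
	if (mem == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* "ro" phase: guest writes now fault, KVM_RUN fails with -EFAULT. */
	clock_gettime(CLOCK_MONOTONIC, &start);
	if (mprotect(mem, size, PROT_READ))
		perror("mprotect(PROT_READ)");
	printf("ro = %.9fs\n", since(&start));

	/* "rw" phase: restore write access so the vCPUs make progress again. */
	clock_gettime(CLOCK_MONOTONIC, &start);
	if (mprotect(mem, size, PROT_READ | PROT_WRITE))
		perror("mprotect(PROT_READ | PROT_WRITE)");
	printf("rw = %.9fs\n", since(&start));

	munmap(mem, size);
	return 0;
}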
> > > and there might be bugs lurking in the mmu_notifier flows.
> >
> > Impossible! :)
> >
> > > Jumping back to mmap_lock, adding a lock, vma_lookup(), and unlock in x86's
> > > page fault path for valid VMAs does introduce a performance regression, but
> > > only ~30%, not the ~6x jump from x86 to arm64.  So that too makes it unlikely
> > > that taking mmap_lock is the main problem, though it's still good justification
> > > for avoiding mmap_lock in the page fault path.
> >
> > I'm curious how much of that 30% in a microbenchmark would translate to
> > real-world performance, since it isn't *that* egregious.

vCPU jitter is the big problem, especially if userspace is doing something odd,
and/or if the kernel is preemptible (which also triggers yield-on-contention logic
for spinlocks, ew).  E.g. the range-based retry to avoid spinning and waiting on
an unrelated MM operation was added by the ChromeOS folks[1] to resolve issues
where an MM operation got preempted and so blocked vCPU faults.

But even for cloud setups with a non-preemptible kernel, contending with unrelated
userspace VMM modifications can be problematic, e.g. it turns out even the
gfn_to_pfn_cache logic needs range-based retry[2] (though that's a rather
pathological case where userspace is spamming madvise() to the point where vCPUs
can't even make forward progress).

> > We also have other uses for getting at the VMA beyond mapping granularity
> > (MTE and the VFIO Normal-NC hint) that'd require some attention too.

Yeah, though it seems like it'd be easy enough to take mmap_lock if and only if
it's necessary, e.g. similar to how common KVM takes it only if it encounters
VM_PFNMAP'd memory.  E.g. take mmap_lock if and only if MTE is active (I assume
that's uncommon?), or if the fault is to device memory.

[1] https://lore.kernel.org/all/20210222024522.1751719-1-stevensd@google.com
[2] https://lore.kernel.org/all/f862cefff2ed3f4211b69d785670f41667703cf3.camel@infradead.org
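To make that last point concrete, a rough sketch of the "take mmap_lock only
when the fault actually needs VMA-level information" idea.  This is not actual
KVM code: the function names, parameters, and checks are invented for
illustration, while mmap_read_lock()/vma_lookup()/mmap_read_unlock() are the
real kernel primitives (and the same trio the ~30% x86 experiment above wrapped
around the fault path unconditionally):

/*
 * Sketch only: take mmap_lock and look up the VMA solely when MTE or a
 * device/VM_PFNMAP-style fault means the attributes actually matter.
 * Names below are hypothetical, not KVM's real fault-handling API.
 */
#include <linux/mm.h>
#include <linux/sched/mm.h>

static int fault_check_vma(unsigned long hva)
{
	struct vm_area_struct *vma;
	int ret = 0;

	mmap_read_lock(current->mm);
	vma = vma_lookup(current->mm, hva);
	if (!vma)
		ret = -EFAULT;
	/* else: sample MTE / device-mapping attributes from the VMA here. */
	mmap_read_unlock(current->mm);

	return ret;
}

static int handle_stage2_fault(unsigned long hva, bool mte_active,
			       bool device_fault)
{
	/* Only pay for mmap_lock when VMA information is actually required. */
	if (mte_active || device_fault)
		return fault_check_vma(hva);

	return 0;	/* common case: resolve the fault without mmap_lock */
}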