Date: Thu, 4 Apr 2024 10:10:26 -0700
From: Sean Christopherson
To: David Matlack
Cc: maobibo, Paolo Bonzini, kvm@vger.kernel.org, Marc Zyngier,
    Oliver Upton, Tianrui Zhao, Huacai Chen, Michael Ellerman,
    Anup Patel, Christian Borntraeger, Janosch Frank, Claudio Imbrenda
Subject: Re: [PATCH v2] KVM: Aggressively drop and reacquire mmu_lock
 during CLEAR_DIRTY_LOG
References: <20240402213656.3068504-1-dmatlack@google.com>

On Thu, Apr 04, 2024, David Matlack wrote:
> On Tue, Apr 2, 2024 at 6:50 PM maobibo wrote:
> > > This change eliminates dips in guest performance during live migration
> > > in a 160 vCPU VM when userspace is issuing CLEAR ioctls (tested with
> > > 1GiB and 8GiB CLEARs). Userspace could issue finer-grained CLEARs, which
> > Frequently dropping/reacquiring mmu_lock will cause the userspace
> > migration process issuing CLEAR ioctls to contend with the 160 vCPUs,
> > so migration may become slower.
> 
> In practice we have not found this to be the case. With this patch
> applied we see a significant improvement in guest workload throughput
> while userspace is issuing CLEAR ioctls without any change to the
> overall migration duration.

...

> In the case of this patch, there doesn't seem to be a trade-off. We
> see an improvement to vCPU performance without any regression in
> migration duration or other metrics.

For x86.  We need to keep in mind that not all architectures have x86's
optimization around dirty logging faults, or around faults in general.
E.g. LoongArch's (which I assume is Bibo Mao's primary interest)
kvm_map_page_fast() still acquires mmu_lock.  And if the fault can't be
handled in the fast path, KVM will actually acquire mmu_lock twice
(mmu_lock is dropped after the fast path, then reacquired after the
mmu_seq and fault-in pfn stuff).

So for x86, I think we can comfortably state that this change is a net
positive for all scenarios.  But for other architectures, that might not
be the case.  I'm not saying this isn't a good change for other
architectures, just that we don't have relevant data to really know for
sure.

Absent performance data for other architectures, which is likely going to
be difficult/slow to get, it might make sense to have this be opt-in to
start.  We could even do it with minimal #ifdeffery, e.g. something like
the below would allow x86 to do whatever locking it wants in
kvm_arch_mmu_enable_log_dirty_pt_masked() (I assume we want to give
kvm_get_dirty_log_protect() similar treatment?).

I don't love the idea of adding more arch-specific MMU behavior (going
the wrong direction), but it doesn't seem like an unreasonable approach
in this case.
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 86d267db87bb..5eb1ce83f29d 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -66,9 +66,9 @@ static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
 	if (!memslot || (offset + __fls(mask)) >= memslot->npages)
 		return;
 
-	KVM_MMU_LOCK(kvm);
+	KVM_MMU_LOCK_FOR_DIRTY_LOG_PROTECT(kvm);
 	kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot, offset, mask);
-	KVM_MMU_UNLOCK(kvm);
+	KVM_MMU_UNLOCK_FOR_DIRTY_LOG_PROTECT(kvm);
 }
 
 int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d1fd9cb5d037..74ae844e4ed0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2279,7 +2279,7 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
 		dirty_bitmap_buffer = kvm_second_dirty_bitmap(memslot);
 		memset(dirty_bitmap_buffer, 0, n);
 
-		KVM_MMU_LOCK(kvm);
+		KVM_MMU_LOCK_FOR_DIRTY_LOG_PROTECT(kvm);
 		for (i = 0; i < n / sizeof(long); i++) {
 			unsigned long mask;
 			gfn_t offset;
@@ -2295,7 +2295,7 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
 			kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot,
 								offset, mask);
 		}
-		KVM_MMU_UNLOCK(kvm);
+		KVM_MMU_UNLOCK_FOR_DIRTY_LOG_PROTECT(kvm);
 	}
 
 	if (flush)
@@ -2390,7 +2390,7 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
 	if (copy_from_user(dirty_bitmap_buffer, log->dirty_bitmap, n))
 		return -EFAULT;
 
-	KVM_MMU_LOCK(kvm);
+	KVM_MMU_LOCK_FOR_DIRTY_LOG_PROTECT(kvm);
 	for (offset = log->first_page, i = offset / BITS_PER_LONG,
 	     n = DIV_ROUND_UP(log->num_pages, BITS_PER_LONG);
 	     n--; i++, offset += BITS_PER_LONG) {
@@ -2413,7 +2413,7 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
 							offset, mask);
 		}
 	}
-	KVM_MMU_UNLOCK(kvm);
+	KVM_MMU_UNLOCK_FOR_DIRTY_LOG_PROTECT(kvm);
 
 	if (flush)
 		kvm_flush_remote_tlbs_memslot(kvm, memslot);
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index ecefc7ec51af..39d8b809c303 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -20,6 +20,11 @@
 #define KVM_MMU_UNLOCK(kvm)	spin_unlock(&(kvm)->mmu_lock)
 #endif /* KVM_HAVE_MMU_RWLOCK */
 
+#ifndef KVM_MMU_LOCK_FOR_DIRTY_LOG_PROTECT
+#define KVM_MMU_LOCK_FOR_DIRTY_LOG_PROTECT	KVM_MMU_LOCK
+#define KVM_MMU_UNLOCK_FOR_DIRTY_LOG_PROTECT	KVM_MMU_UNLOCK
+#endif
+
 kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
 		     bool *async, bool write_fault, bool *writable);
 
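Purely for illustration, the x86 side of such an opt-in might look
something like the sketch below.  To be clear, this is a guess at one
possible shape, not part of the diff above: the header placement is an
assumption and the body of the arch hook is elided.  The idea is that
x86 would stub out the generic lock/unlock and instead bound mmu_lock
hold time inside kvm_arch_mmu_enable_log_dirty_pt_masked(), which
kvm_clear_dirty_log_protect() invokes once per 64-page mask, so vCPU
faults can be serviced between chunks of a large CLEAR.

/*
 * Hypothetical x86 override, e.g. in an arch header pulled in ahead of
 * the #ifndef fallback in virt/kvm/kvm_mm.h (the placement is an
 * assumption for this sketch):
 */
#define KVM_MMU_LOCK_FOR_DIRTY_LOG_PROTECT(kvm)	do { } while (0)
#define KVM_MMU_UNLOCK_FOR_DIRTY_LOG_PROTECT(kvm)	do { } while (0)

/*
 * arch/x86/kvm/mmu/mmu.c (sketch): with the generic locking stubbed
 * out, the arch hook takes mmu_lock itself, once per invocation, i.e.
 * once per 64 pages of the dirty bitmap.
 */
void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
					     struct kvm_memory_slot *slot,
					     gfn_t gfn_offset,
					     unsigned long mask)
{
	write_lock(&kvm->mmu_lock);

	/* ... existing D-bit clearing / write-protection logic ... */

	write_unlock(&kvm->mmu_lock);
}

One wrinkle worth calling out: kvm_reset_dirty_gfn() in the dirty-ring
path uses the same macros, so an arch that stubs them out takes on the
locking for that path as well, which may or may not be what we want.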