Date: Tue, 3 Sep 2024 14:27:19 -0700
Subject: Re: [PATCH 19/22] KVM: x86/mmu: Add infrastructure to allow walking rmaps outside of mmu_lock
From: Sean Christopherson
To: James Houghton
Cc: Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Oliver Upton, Marc Zyngier, Peter Xu
References: <20240809194335.1726916-1-seanjc@google.com> <20240809194335.1726916-20-seanjc@google.com>

On Tue, Sep 03, 2024, James Houghton wrote:
> On Fri, Aug 9, 2024 at 12:44 PM Sean Christopherson wrote:
> > +/*
> > + * rmaps and PTE lists are mostly protected by mmu_lock (the shadow MMU always
> > + * operates with mmu_lock held for write), but rmaps can be walked without
> > + * holding mmu_lock so long as the caller can tolerate SPTEs in the rmap chain
> > + * being zapped/dropped _while the rmap is locked_.
> > + *
> > + * Other than the KVM_RMAP_LOCKED flag, modifications to rmap entries must be
> > + * done while holding mmu_lock for write.  This allows a task walking rmaps
> > + * without holding mmu_lock to concurrently walk the same entries as a task
> > + * that is holding mmu_lock but _not_ the rmap lock.  Neither task will modify
> > + * the rmaps, thus the walks are stable.
> > + *
> > + * As alluded to above, SPTEs in rmaps are _not_ protected by KVM_RMAP_LOCKED,
> > + * only the rmap chains themselves are protected.  E.g. holding an rmap's lock
> > + * ensures all "struct pte_list_desc" fields are stable.
> > + */
> > +#define KVM_RMAP_LOCKED	BIT(1)
> > +
> > +static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head)
> > +{
> > +	unsigned long old_val, new_val;
> > +
> > +	old_val = READ_ONCE(rmap_head->val);
> > +	if (!old_val)
> > +		return 0;
>
> I'm having trouble understanding how this bit works.  What exactly is
> stopping the rmap from being populated while we have it "locked"?

Nothing prevents the 0=>1 transition, but that's a-ok because walking rmaps for
aging only cares about existing mappings.  The key to correctness is that this
helper returns '0' when there are no rmaps, i.e. the caller is guaranteed to do
nothing and thus will never see any rmaps that come along in the future.

> aren't holding the MMU lock at all in the lockless case, and given
> this bit, it is impossible (I think?) for the MMU-write-lock-holding,
> rmap-modifying side to tell that this rmap is locked.
>
> Concretely, my immediate concern is that we will still unconditionally
> write 0 back at unlock time even if the value has changed.

The "readonly" unlocker (not added in this patch) is a nop if the rmap was empty,
i.e. wasn't actually locked.

+static void kvm_rmap_unlock_readonly(struct kvm_rmap_head *rmap_head,
+				     unsigned long old_val)
+{
+	if (!old_val)
+		return;
+
+	KVM_MMU_WARN_ON(old_val != (rmap_head->val & ~KVM_RMAP_LOCKED));
+	WRITE_ONCE(rmap_head->val, old_val);
+}

The TODO in kvm_rmap_lock() pretty much sums things up: it's safe to call the
"normal", non-readonly versions if and only if mmu_lock is held for write.

+static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head)
+{
+	/*
+	 * TODO: Plumb in @kvm and add a lockdep assertion that mmu_lock is
+	 * held for write.
+	 */
+	return __kvm_rmap_lock(rmap_head);
+}