From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pf1-f202.google.com (mail-pf1-f202.google.com [209.85.210.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 086E019D8A4 for ; Tue, 10 Sep 2024 17:42:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.202 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725990144; cv=none; b=mZcO1r+Z+kITFLjsaOS2bSDGZ3EsndDjg0xoqNCqU7KxxEBHHRUfftOMjKCvr3VNQeDkmCW5OPIYgNtzLhpglgLdyOnchxmzHU9a/d0EJ4cs6eQxBQLxEMhxtI50bVN0+4/XkwG2ZFJa6xN294Gg3U8StZVjwWth2vnZGFfuKSs= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725990144; c=relaxed/simple; bh=u25THlA+Z7bzJGAEhSU19gr82UmAL417zQhjoh7buU8=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=PaNVPH5ZlvGyknA0FkZPAQb5ar2zRQ/r60SCajMkux+aahMNioYye2UL6bD3oc+nqLK3KucNy//xc6vIo+8Vl4k5lJeTuUc5RwStRyFLWUK6RXluBdbmS/u0kuxWNPaKbjJhm76VmA/Wop/qhkM6Yyg3gdd2+qoKlYkKT9WEo88= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=irA5/AfL; arc=none smtp.client-ip=209.85.210.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="irA5/AfL" Received: by mail-pf1-f202.google.com with SMTP id d2e1a72fcca58-7174c6cbdbaso7579968b3a.2 for ; Tue, 10 Sep 2024 10:42:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1725990142; x=1726594942; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=gg6skOg0GP1/0D73Nu093tI0yLfhQmox3hveal8dqPA=; b=irA5/AfLBypu8MgsAKfmqwHlY+Othsn7MZE2kKqwSUlQNzTOJwtl0biaMgL5yaJo7K JVg0bCxikMSQ+ik5VFazU4uyEFkWr5cyYgO4Mi4xjcSqncHWdzRnbDSGo0OHEwzItKNG n3SfVsbSu/hHpdQSNswXEWSeT1wpx6BFYh9d40FbHAHQbD/7YPsJ5nfbpNgMd32bsBmg OtoYnWauRMBBYgfHGTKZalbwhxXnF/kUqkL+/qw8YdwYN/4f700e3KkdqYcl4TZJi361 HCY6TfuAKUqVHFSMaoBLsLfaWWsWnEcT5/Wr/3NS2ToIMkm6eEcJChezGLQFbU/kZ7Fi v9cg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725990142; x=1726594942; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=gg6skOg0GP1/0D73Nu093tI0yLfhQmox3hveal8dqPA=; b=ka4/CHbjAn5EGEMiSMRRTTevjSzHTu0zqk0KyiUo8UahzKSzOQy4WTtJtcpk5fip7V o/4k8DXHkXG9a4P0WtQhAgiVH7Lz5wDyyNZhDDDqiopjv+YPgrow+8VhpZ7YQbmt/sEn wsuQQnrfglCit7DYIv9FssECD92hZBcKPbAwb/w0TgqPQdybu6ylGe2lLNIBp7gv+u0G wyFMt1/a+6ppstYsrwC+b2zjJQCZJ5UY6qXJCn1z3HdmlalYIzf2QKX2EAdhHJbNYQqZ 9YaZhWH4XbzbjycYXPRfndumHHnsvAQ2D5umi+RJf/vw+zOHS1f/afvJ44Ki5FiahdAX PtGA== X-Forwarded-Encrypted: i=1; AJvYcCWTrC49D67afuwIV9QGhVyUAZc5Zd7tYi7EamTQUsrwnvRnGNHSJtck0bIICRy6NbeYN1Rc1+Y0r/kZEzU=@vger.kernel.org X-Gm-Message-State: AOJu0Yy3m7wJnZHpuTnYAYDJTovM1xdakucgfAxBsFwPyG0q4bebjJSv EoJC2bDfwvju0PyXCvek04w4qnyGhhUURgW3xaUdXmrtejPhr2mU28KCGgBXaydjCwj7YgWLB66 gdA== X-Google-Smtp-Source: AGHT+IE8OvwIv/WWjiojX1BJ9/jZfPx7CjqCmsLFPOIiZ8H3mbzDo/1m95EORYptPvhzt4+2cVR8hRJXts0= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:aa7:9182:0:b0:710:4d6c:a504 with SMTP id d2e1a72fcca58-718d5f45b47mr57116b3a.4.1725990142098; Tue, 10 Sep 2024 10:42:22 -0700 (PDT) Date: Tue, 10 Sep 2024 10:42:20 -0700 In-Reply-To: <655170f6a09ad892200cd033efe5498a26504fec.camel@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240904030751.117579-1-rick.p.edgecombe@intel.com> <20240904030751.117579-10-rick.p.edgecombe@intel.com> <6449047b-2783-46e1-b2a9-2043d192824c@redhat.com> <1bbe3a78-8746-4db9-a96c-9dc5f1190f16@redhat.com> <655170f6a09ad892200cd033efe5498a26504fec.camel@intel.com> Message-ID: Subject: Re: [PATCH 09/21] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT From: Sean Christopherson To: Rick P Edgecombe Cc: "pbonzini@redhat.com" , Yuan Yao , Kai Huang , "isaku.yamahata@gmail.com" , "linux-kernel@vger.kernel.org" , Yan Y Zhao , "kvm@vger.kernel.org" , "dmatlack@google.com" , "nik.borisov@suse.com" Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable On Tue, Sep 10, 2024, Rick P Edgecombe wrote: > On Tue, 2024-09-10 at 08:57 -0700, Sean Christopherson wrote: > > > Only if the TDX module returns BUSY per-SPTE (as suggested by 18.1.3, > > > which documents that the TDX module returns TDX_OPERAND_BUSY on a > > > CMPXCHG failure). If it returns BUSY per-VM, FROZEN_SPTE is not enoug= h > > > to prevent contention in the TDX module. > >=20 > > Looking at the TDX module code, things like (UN)BLOCK and REMOVE take a= per-VM > > lock in write mode, but ADD, AUG, and PROMOTE/DEMOTE take the lock in r= ead > > mode. >=20 > AUG does take other locks as exclusive: > https://github.com/intel/tdx-module/blob/tdx_1.5/src/vmm_dispatcher/api_c= alls/tdh_mem_page_aug.c Only a lock on the underlying physical page. guest_memfd should prevent ma= pping the same HPA into multiple GPAs, and FROZEN_SPTE should prevent two vCPUs f= rom concurrently AUGing the same GPA+HPA. > I count 5 locks in total as well. I think trying to mirror the locking in= KVM > will be an uphill battle. I don't want to mirror the locking, I want to understand and document the expectations and rules. "Throw 16 noodles and hope one sticks" is not a re= cipe for success. > > So for the operations that KVM can do in parallel, the locking should > > effectively > > be per-entry.=C2=A0 Because KVM will never throw away an entire S-EPT r= oot, zapping > > SPTEs will need to be done while holding mmu_lock for write, i.e. KVM > > shouldn't > > have problems with host tasks competing for the TDX module's VM-wide lo= ck. > >=20 > > > If we want to be a bit more optimistic, let's do something more > > > sophisticated, like only take the lock after the first busy reply. Bu= t > > > the spinlock is the easiest way to completely remove host-induced > > > TDX_OPERAND_BUSY, and only have to deal with guest-induced ones. > >=20 > > I am not convinced that's necessary or a good idea.=C2=A0 I worry that = doing so > > would > > just kick the can down the road, and potentially make the problems hard= er to > > solve, > > e.g. because we'd have to worry about regressing existing setups. > >=20 > > > > > It is still kinda bad that guests can force the VMM to loop, but = the VMM > > > > > can > > > > > always say enough is enough.=C2=A0 In other words, let's assume t= hat a limit > > > > > of > > > > > 16 is probably appropriate but we can also increase the limit and= crash > > > > > the > > > > > VM if things become ridiculous. > > > >=20 > > > > 2 :-) > > > >=20 > > > > One try that guarantees no other host task is accessing the S-EPT e= ntry, > > > > and a > > > > second try after blasting IPI to kick vCPUs to ensure no guest-side= task > > > > has > > > > locked the S-EPT entry. > > >=20 > > > Fair enough. Though in principle it is possible to race and have the > > > vCPU re-run and re-issue a TDG call before KVM re-issues the TDH call= . > >=20 > > My limit of '2' is predicated on the lock being a "host priority" lock, > > i.e. that kicking vCPUs would ensure the lock has been dropped and can= 't > > be re-acquired by the guest. >=20 > So kicking would be to try to break loose any deadlock we encountered? It= sounds > like the kind of kludge that could be hard to remove. No, the intent of the kick would be to wait for vCPUs to exit, which in tur= n guarantees that any locks held by vCPUs have been dropped. Again, this ide= a is predicated on the lock being "host priority", i.e. that vCPUs can't re-take= the lock before KVM.