Date: Wed, 4 Mar 2026 07:39:24 -0800
In-Reply-To: <20250806215133.43475-2-jthoughton@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20250806215133.43475-1-jthoughton@google.com> <20250806215133.43475-2-jthoughton@google.com>
Subject: Re: [PATCH 1/2] KVM: Add fault injection for some MMU operations
From: Sean Christopherson
To: James Houghton
Cc: Paolo Bonzini, Akinobu Mita, David Matlack, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org

On Wed, Aug 06, 2025, James Houghton wrote:
> Provide fault injection hooks for three operations:
> 1. For all architectures, retries due to invalidation notifiers.
> 2. For x86, TDP MMU cmpxchg updates for SPTEs.
> 3. For x86, TDP MMU SPTE iteration rescheduling.
>
> For all of these, fault injection can induce the uncommon cases: (1)
> that an invalidation occurred, (2) a cmpxchg failed, and (3) that the
> MMU lock is contended.

...

> @@ -689,7 +691,8 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
>  	 * operates on fresh data, e.g. if it retries
>  	 * tdp_mmu_set_spte_atomic()
>  	 */
> -	if (!try_cmpxchg64(sptep, &iter->old_spte, new_spte))
> +	if (tdp_mmu_cmpxchg_should_fail() ||
> +	    !try_cmpxchg64(sptep, &iter->old_spte, new_spte))

As discovered internally, this can cause the WARN_ON_ONCE() at the end of
kvm_tdp_mmu_zap_possible_nx_huge_page() to fire, because the flow
*guarantees* success.

Thinking about this all a bit more, while I *really* like the idea of
triggering uncommon paths in theory, I'm having strong reservations about
enabling this in upstream, as I'm worried the signal:noise ratio could be
abysmal.

For many configurations and setups, mmu_notifier invalidations and MMU
lock contention are actually quite common, i.e. in the aggregate, KVM
actually gets good coverage of those paths. Giving userspace a way to
deliberately induce retries for those cases doesn't seem like it will add
much value, while at the same time it could lead to a rash of "bugs" due
to e.g. syzkaller setting extreme retry percentages and manufacturing
scenarios like stuck tasks that can't happen in practice.

The CMPXCHG thing definitely has value, but as above even that is
error-prone to some degree.

So if we want to take this forward, I think we should limit it to CMPXCHG,
figure out a clean way for callers to prevent failure injection, and set a
fairly high bar for extending failure injection to other areas. E.g.
require, as was the case with the CMPXCHG injection, a real KVM bug that
is extremely rare in practice but relatively easy to trigger with
artificial failure.
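
To make "a clean way for callers to prevent failure injection" a bit more
concrete, below is a minimal userspace sketch in C11. It is not kernel code
and not what the patch does: every name in it (inject_every_n,
enum cmpxchg_inject, set_entry_atomic(), update_with_retry(),
update_must_succeed()) is hypothetical, and the "fail every Nth attempt"
knob is only a stand-in for whatever the real fault injection framework
would provide. The only thing taken from the quoted hunk is the shape of
the check, i.e. the injection test short-circuits the compare-and-exchange.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical knob: inject a failure on every Nth eligible attempt (0 = off). */
static unsigned int inject_every_n;
static unsigned int inject_counter;

/* Hypothetical per-call opt-out for flows that cannot tolerate a failure. */
enum cmpxchg_inject { INJECT_ALLOWED, INJECT_FORBIDDEN };

static bool should_inject_failure(enum cmpxchg_inject mode)
{
	if (mode == INJECT_FORBIDDEN || !inject_every_n)
		return false;

	return ++inject_counter % inject_every_n == 0;
}

/*
 * Model of a try_cmpxchg64()-style update.  On a real failure *old is
 * refreshed with the current value; an injected failure leaves *old
 * untouched because the check short-circuits the compare-and-exchange.
 */
static bool set_entry_atomic(_Atomic uint64_t *entry, uint64_t *old,
			     uint64_t new_val, enum cmpxchg_inject mode)
{
	if (should_inject_failure(mode))
		return false;

	return atomic_compare_exchange_strong(entry, old, new_val);
}

/* Caller that tolerates failure: retry until the update lands, count attempts. */
static unsigned int update_with_retry(_Atomic uint64_t *entry, uint64_t new_val)
{
	uint64_t old = atomic_load(entry);
	unsigned int attempts = 1;

	while (!set_entry_atomic(entry, &old, new_val, INJECT_ALLOWED))
		attempts++;

	return attempts;
}

/* Caller whose flow guarantees success: opt out of injection entirely. */
static bool update_must_succeed(_Atomic uint64_t *entry, uint64_t new_val)
{
	uint64_t old = atomic_load(entry);

	return set_entry_atomic(entry, &old, new_val, INJECT_FORBIDDEN);
}
```

In this sketch, setting inject_every_n to 1 makes update_with_retry() spin
forever while update_must_succeed() is unaffected, which is roughly the
"extreme retry percentage" scenario described above. An explicit per-call
argument is just one possible design; it keeps the opt-out visible at each
call site at the cost of touching every caller.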