From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 18 Dec 2024 18:13:53 -0800
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20241214010721.2356923-1-seanjc@google.com> <20241214010721.2356923-15-seanjc@google.com>
Subject: Re: [PATCH 14/20] KVM: selftests: Collect *all* dirty entries in each dirty_log_test iteration
From: Sean Christopherson
To: Maxim Levitsky
Cc: Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Peter Xu
Content-Type: text/plain; charset="us-ascii"

On Tue, Dec 17, 2024, Maxim Levitsky wrote:
> On Fri, 2024-12-13 at 17:07 -0800, Sean Christopherson wrote:
> > Collect all dirty entries during each iteration of dirty_log_test by
> > doing a final collection after the vCPU has been stopped.  To deal with
> > KVM's destructive approach to getting the dirty bitmaps, use a second
> > bitmap for the post-stop collection.
> >
> > Collecting all entries that were dirtied during an iteration simplifies
> > the verification logic *and* improves test coverage.
> >
> >  - If a page is written during iteration X, but not seen as dirty until
> >    X+1, the test can get a false pass if the page is also written during
> >    X+1.
> >
> >  - If a dirty page used a stale value from a previous iteration, the test
> >    would grant a false pass.
> >
> >  - If a missed dirty log occurs in the last iteration, the test would fail
> >    to detect the issue.
> >
> > E.g.
> > modifying mark_page_dirty_in_slot() to dirty an unwritten gfn:
> >
> >	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
> >		unsigned long rel_gfn = gfn - memslot->base_gfn;
> >		u32 slot = (memslot->as_id << 16) | memslot->id;
> >
> >		if (!vcpu->extra_dirty &&
> >		    gfn_to_memslot(kvm, gfn + 1) == memslot) {
> >			vcpu->extra_dirty = true;
> >			mark_page_dirty_in_slot(kvm, memslot, gfn + 1);
> >		}
> >		if (kvm->dirty_ring_size && vcpu)
> >			kvm_dirty_ring_push(vcpu, slot, rel_gfn);
> >		else if (memslot->dirty_bitmap)
> >			set_bit_le(rel_gfn, memslot->dirty_bitmap);
> >	}
> >
> > isn't detected with the current approach, even with an interval of 1ms
> > (when running nested in a VM; bare metal would be even *less* likely to
> > detect the bug due to the vCPU being able to dirty more memory).  Whereas
> > collecting all dirty entries consistently detects failures with an
> > interval of 700ms or more (the longer interval means a higher probability
> > of an actual write to the prematurely-dirtied page).
>
> While this patch might improve coverage for this particular case,
> I think that this patch will make the test to be much more deterministic,

The verification will be more deterministic, but the actual testcase itself is
just as random as it was before.

> and thus have less chance of catching various races in the kernel that can happen.
>
> In fact in my option I prefer moving this test in other direction by
> verifying dirty ring while the *vCPU runs* as well, in other words, not
> stopping the vCPU at all unless its dirty ring is full.

I don't see how letting verification be coincident with the vCPU running is at
all interesting for a dirty logging test.  Host userspace reading guest memory
while it's being written by the guest doesn't stress KVM's dirty logging in any
meaningful way.  E.g. it exercises hardware far more than anything else.

If we want to stress that boundary, then we should spin up another vCPU or host
thread to randomly read while the test is in-progress, and also to write to
bytes 4095:8 (assuming a 4KiB page size), e.g. to ensure that dueling writes to
a cacheline that trigger false sharing are handled correctly.

But letting the vCPU-under-test keep changing the memory while it's being
validated would add significant complexity, without any benefit insofar as I
can see.  As evidenced by the bug the current approach can't detect, heavily
stressing the system is meaningless if it's impossible to separate the signal
from the noise.
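
To make the "second bitmap" idea from the changelog concrete, here is a rough
userspace sketch.  It is not the selftest's code: kernel_dirty, guest_write(),
get_dirty_log() and run_iteration() are made-up stand-ins, and a single 64-bit
word plays the role of the memslot's dirty bitmap.  The only property modeled
is that fetching the dirty log is destructive, so a write that lands after the
mid-iteration collection is visible only if a final collection is done once the
vCPU has been stopped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_PAGES 64

/* Stand-in for the kernel's dirty bitmap, one bit per page (assumption). */
static uint64_t kernel_dirty;

/* Hypothetical guest write: record it and mark the page dirty. */
static void guest_write(uint64_t *written, unsigned int page)
{
	assert(page < NR_PAGES);
	*written |= 1ULL << page;
	kernel_dirty |= 1ULL << page;
}

/* Models a destructive fetch: return the dirty bits, then clear them. */
static uint64_t get_dirty_log(void)
{
	uint64_t snap = kernel_dirty;

	kernel_dirty = 0;
	return snap;
}

/*
 * Run one iteration and return the pages that were written but are missing
 * from the collected bitmaps (0 means every write was seen as dirty).
 */
static uint64_t run_iteration(bool final_collection)
{
	uint64_t written = 0, bmap_running, bmap_stopped = 0;

	kernel_dirty = 0;

	guest_write(&written, 3);
	bmap_running = get_dirty_log();	/* collected while the vCPU runs */
	guest_write(&written, 7);	/* lands after the first collection */

	/* The vCPU is stopped here, nothing can dirty memory anymore. */
	if (final_collection)
		bmap_stopped = get_dirty_log();	/* second, post-stop bitmap */

	return written & ~(bmap_running | bmap_stopped);
}
```

Calling run_iteration(true) reports no missing pages, whereas
run_iteration(false) reports page 7 as written-but-not-dirty for that
iteration, i.e. its dirty bit would only be observed in the next iteration.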