From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 78B1E1A5AF for ; Mon, 30 Oct 2023 21:52:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="KP0aBuSR" Received: from mail-pl1-x649.google.com (mail-pl1-x649.google.com [IPv6:2607:f8b0:4864:20::649]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 28BC0ED for ; Mon, 30 Oct 2023 14:52:09 -0700 (PDT) Received: by mail-pl1-x649.google.com with SMTP id d9443c01a7336-1cc56a9ece7so9844625ad.3 for ; Mon, 30 Oct 2023 14:52:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1698702728; x=1699307528; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=BzJ/Ymhb65Ar0LBBebue0mT9iTpkJ0/uT6hATBgr33I=; b=KP0aBuSRG+skxT/wl+FSpA5EVV+giIddQX8FRcsP1y5rNL/R2lJmj+HOOUPunWV6WZ TfFiIMZUdjs5h8RvyzdZCjT0ay1CgH/W/a3qxKgFRY0lRxHuBSBpDkM1o4Vx1EkZLfYe 7ot7sGHINw+mXQ/o8Ml86bLas2yjzd0u/IzK2arIiDUn2n1SAwmj10hk39RPZFGSI+/q G7Iay+I1nwjufxTMiIGpxMWfokBkRP9oR4q21o9gZDdFQg5VsO50vqbDt1RDTB9vvRBL vxNwACn7pPb/dSX4vCsiTB5SZaTUJwRrO7h+YKjYNo/2bSa31MGKRFp7trSbR7ZhSZdo 6qDQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698702728; x=1699307528; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=BzJ/Ymhb65Ar0LBBebue0mT9iTpkJ0/uT6hATBgr33I=; b=gODeCF9y4rvD4WpiMAMHYiO1pvfjp18c+bA+FeHziahJtA86UJADzNU7zNSWmEK8KQ Aso9nx1+fDqH3fAly9I3tBWIu8RKJSpAgAzPR4XHgzMu2QmrAQ5tHl57+0X7gbmHPrZR QpuoPD6GH05Ij29YPNpiMuBusdRDylB5N747DFM1e4kCzIM0QZWuRL8bk7E17ZTBU/2j aKt+95UhDQlMc0Isq2sAjjHOuaEiWo5/6P47KRtQkJLXSjbp9D8fk/EypwL6wk3pEYNx rg5fmsmoyP3ofwkVA4GsCx2OlSK84dqFyu7G4m8zuWtgouIDbGENZKSBQ52XPO0SL7UO OuVw== X-Gm-Message-State: AOJu0YyWXrs9pCPcZ5fBDIVNthGY99n3YgXBZNLAnDcuC/yaOPOr5cuB OzFs2G+HtQMtc0F9ymcablhaO9teCNs= X-Google-Smtp-Source: AGHT+IH2J9EhY1pZytvMjUtYvpcL+sIpMNtzT9vyOr4cvb4mXzpvu/rjLEjNyHYHjdQ8EJndP2Z1wMEcKoE= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a17:902:ab84:b0:1c5:7226:d5d6 with SMTP id f4-20020a170902ab8400b001c57226d5d6mr176972plr.6.1698702728708; Mon, 30 Oct 2023 14:52:08 -0700 (PDT) Date: Mon, 30 Oct 2023 21:52:07 +0000 In-Reply-To: <3E43ADC6-E817-411A-9EBF-B16142B9B478@cs.utexas.edu> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <2C868574-37F4-437D-8355-46A6D1615E51@cs.utexas.edu> <3E43ADC6-E817-411A-9EBF-B16142B9B478@cs.utexas.edu> Message-ID: Subject: Re: A question about how the KVM emulates the effect of guest MTRRs on AMD platforms From: Sean Christopherson To: Yibo Huang Cc: Yan Zhao , kvm@vger.kernel.org Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable On Mon, Oct 30, 2023, Yibo Huang wrote: > Well, I agree with Sean=E2=80=99s opinion that SVM+NPT obviates the need = to emulate > guest MTRRs for real-world guest workloads. However, from my own experie= nce, > I think KVM does emulate the effect of guest MTRRs on AMD platforms. >=20 > Here's the reason: > 2 months ago, I was trying to attach a QEMU ivshmem device to my VMs runn= ing > on Intel and AMD machines. Since ivshmem is an emulated memory-backed > device, it should be cacheable to get the best performance. > Interestingly, I found that the memory region associated with ivshmem (PC= Ie > BAR 2 region) was cacheable on Intel machines, but not cacheable on AMD > machines. > After some digging, I found that this was because of the guest MTRRs - on= AMD > machines, BIOS or guest OS (not sure who did this) set the memory region = of > ivshmem as non-cacheable in guest MTRRs (but cacheable in guest PAT). Thi= s > was supported by the fact that ivhsmem became cacheable after removing th= e > corresponding guest MTRRs (reg02) on AMD machines (using "echo -n disable= =3D2 > > /proc/mtrr=E2=80=9D) > Additionally, the reason why ivshmem was cacheable on Intel machines was = that > BIOS or guest OS didn=E2=80=99t set ivshmem as uncacheable in guest MTRRs= on Intel > machines (not sure why though). What test(s) did you run to determine whether or not the memory was truly c= acheable? KVM emulates the MTRR MSRs themselves, e.g. the guest can read and write MT= RRs, and the guest will _think_ memory has a certain memtype, but that doesn't n= ecessarily have any impact on the memtype used by the CPU. > Below is the output of =E2=80=9Ccat /proc/mtrr=E2=80=9D on my VMs running= on AMD machines. By > removing reg02, ivshmem BAR 2 region became cacheable. >=20 >=20 > So in my opinion, the above phenomenon suggests that KVM does honor guest > MTRRs on AMD platforms. Heh, this isn't opinion. Unless you're running a very specific 10-year old= kernel, or a custom KVM build, KVM simply out doesn't propagate guest MTRRs into NP= T. And unless your setup also has non-coherent DMA attached to the device, KVM= doesn't honor guest MTRRs for EPT either (AFAICT, QEMU ivshmem doesn't require VFIO= ). It's definitely possible that disabling a guest MTRR resulted in memory bec= oming cacheable, but unless there's some very, very magical code hiding, it's not= because KVM actually fully virtualizes guest MTRRs on AMD. E.g. before commit 9a3768191d95 ("KVM: x86/mmu: Zap SPTEs on MTRR update if= f guest MTRRs are honored"), which hasn't even made its way to Linus (or Paolo's) t= ree yet, KVM unnecessarily zapped all NPT entries on MTRR changes. Zapping NPT entr= ies could have cleared some weird TLB state, or perhaps even wiped out buggy KV= M NPT entries. And on AMD, hardware virtualizes gCR0.CD, i.e. puts the caches into no-fill= mode when guest CR0.CD=3D1. But Intel CPUs completely ignore guest CR0.CD, i.e.= punt it to software, and under QEMU, for all intents and purposes KVM never honors = guest CR0.CD for VMX. It's seems highly quite unlikely that something in the gue= st left CR0.CD=3D1, but it's possible. And then the guest kernel's process of togg= ling CR0.CD when doing MTRR updates would end up clearing CR0.CD and thus re-ena= ble caching. > The thing was that I could not find any KVM code related to emulating gue= st > MTRRs on AMD platforms, which was the reason why I decided to send the > initial email asking about it. >=20 > I found this in the AMD64 Architecture Programmer=E2=80=99s Manual Volume= s 1=E2=80=935 (page > 553):=20 >=20 > "Table 15-19 shows how guest and host PAT types are combined into an > effective PAT type. When interpreting this table, recall (a) that guest a= nd > host PAT types are not combined when nested paging is disabled and (b) th= at > the intent is for the VMM to use its PAT type to simulate guest MTRRs.=E2= =80=9D >=20 > Does this mean that AMD expects the VMM to emulate the effect of guest MT= RRs > by altering the host PAT types? Yes. Which is exactly what KVM did in commit 3c2e7f7de324 ("KVM: SVM: use = NPT page attributes"), which was a reverted a few months after it was introduce= d. > I am not sure if I misunderstood something. But I can reproduce the examp= le I > mentioned above if you would like to look into it. Yes, it would be helpful to confirm what's going on. =20