Date: Wed, 15 May 2019 07:30:15 -0700
From: Sean Christopherson
To: Wanpeng Li
Cc: Joao Martins, Liran Alon, Paolo Bonzini, Radim Krcmar, kvm, Boris Ostrovsky, Ankur Arora
Subject: Re: [PATCH] KVM: VMX: Nop emulation of MSR_IA32_POWER_CTL
Message-ID: <20190515143015.GB5875@linux.intel.com>
References: <20190415154526.64709-1-liran.alon@oracle.com> <20190415181702.GH24010@linux.intel.com> <4848D424-F852-4E1C-8A86-6AA1A26D2E90@oracle.com> <2dad36e7-a0e5-9670-c902-819c5200466f@oracle.com> <20190510171733.GA16852@linux.intel.com>
X-Mailing-List: kvm@vger.kernel.org

On Mon, May 13, 2019 at 05:13:29PM +0800, Wanpeng Li wrote:
> On Sat, 11 May 2019 at 01:17, Sean Christopherson wrote:
> >
> > On Fri, May 10, 2019 at 11:34:41AM +0100, Joao Martins wrote:
> > > On 5/10/19 10:54 AM, Wanpeng Li wrote:
> > > > It is weird that we can observe intel_idle driver in the guest
> > > > executes mwait eax=0x20, and the corresponding pCPU enters C3 on HSW
> > > > server, however, we can't observe this on SKX/CLX server, it just
> > > > enters maximal C1.
> > >
> > > I assume you refer to the case where you pass the host mwait substates to the
> > > guests as is, right? Or are you zeroing/filtering out the mwait cpuid leaf EDX
> > > like my patch (attached in the previous message) suggests?
> > >
> > > Interestingly, hints set to 0x20 actually corresponds to C6 on HSW (based on
> > > intel_idle driver). IIUC From the SDM (see Vol 2B, "MWAIT for Power Management"
> > > in instruction set reference M-U) the hints register, doesn't necessarily
> > > guarantee the specified C-state depicted in the hints will be used. The manual
> > > makes it sound like it is tentative, and implementation-specific condition may
> > > either ignore it or enter a different one. It appears to be only guaranteed that
> > > it won't enter a C-{sub,}state deeper than the one depicted.
> >
> > Yep, section "MWAIT EXTENSIONS FOR ADVANCED POWER MANAGEMENT" is more
> > explicit on this point:
> >
> >   At CPL=0, system software can specify desired C-state and sub C-state by
> >   using the MWAIT hints register (EAX). Processors will not go to C-state
> >   and sub C-state deeper than what is specified by the hint register.
> >
> > As for why SKX/CLX only enters C1, AFAICT SKX isn't configured to support
> > C3, e.g. skx_cstates in drivers/idle/intel_idle.c shows C1, C1E and C6.
> > A quick search brings up a variety of docs that confirm this. My guess is
> > that C1E provides better power/performance than C3 for the majority of
> > server workloads, e.g. C3 doesn't provide enough power savings to justify
> > its higher latency and TLB flush.
>
> You are right, I figure this out by referring to the SKX/CLX EDS, the
> Core C-States of these two generations just support CC0/CC1/CC1E/CC6.
> The issue here is after exposing mwait to the guest, SKX/CLX guest
> can't enter CC6, however, HSW guest can enter CC3/CC6. Both HSW and
> SKX/CLX hosts can enter CC6. We observe SKX/CLX guests execute mwait
> eax 0x20, however, we can't observe the corresponding pCPU enter CC6
> by turbostat or reading MSR_CORE_C6_RESIDENCY directly.

It's likely that the CPU is operating as expected and isn't dropping into
the deeper sleep state because of some heuristic or wake event. It might
be something as simple as the combined periodic tick interrupts of the host
and the guest occurring too frequently for the CPU to reach C6, or it could
be a much more complex scenario.

It's been several years since I've done anything close to hands-on
debugging of C-states, so I have no idea what capabilities are available to
help debug this sort of thing. Sorry I can't be more helpful.
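
For reference, the hint value 0x20 discussed in this thread can be split into its two fields. The sketch below assumes the layout given in the SDM's MWAIT description: EAX[7:4] is the target C-state field (0 means C1, 1 means C2, and so on, with 0xF meaning C0) and EAX[3:0] is the sub C-state. The function names are hypothetical and exist only for illustration. Note that this is the architectural "MWAIT C-state" numbering, which need not match the names the intel_idle tables give to the same hint (per the thread, 0x20 is labelled C6 on HSW).

```python
from typing import Tuple


def decode_mwait_hint(eax: int) -> Tuple[int, int]:
    """Split an MWAIT hint into (target C-state field, sub C-state field).

    Assumed layout (from the SDM's MWAIT entry): bits 7:4 are the target
    C-state field, bits 3:0 are the sub C-state.
    """
    cstate_field = (eax >> 4) & 0xF
    substate = eax & 0xF
    return cstate_field, substate


def describe_mwait_hint(eax: int) -> str:
    """Render a hint using the architectural MWAIT C-state numbering."""
    cstate_field, substate = decode_mwait_hint(eax)
    if cstate_field == 0xF:
        # A target field of 0xF requests C0 rather than a sleep state.
        return "C0"
    # Field value n names MWAIT C-state n + 1 (0 -> C1, 1 -> C2, ...).
    return f"C{cstate_field + 1} (sub-state {substate})"


if __name__ == "__main__":
    for hint in (0x00, 0x01, 0x20):
        print(f"hint {hint:#04x}: {describe_mwait_hint(hint)}")
```

Under these assumptions, 0x20 decodes to target field 2, i.e. MWAIT "C3" with sub-state 0. That may explain why the same hint is called C3 in one message and C6 in another: the first is the architectural number, the second the driver's name for the state. Either way, the hint is only an upper bound on how deep the processor may go.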