Date: Mon, 2 Oct 2017 10:12:49 -0700
From: Jacob Pan
To: "Rafael J. Wysocki"
Cc: "Michael S. Tsirkin", Yang Zhang, Linux Kernel Mailing List, kvm@vger.kernel.org, Wanpeng Li, Paolo Bonzini, Thomas Gleixner, rkrcmar@redhat.com, dmatlack@google.com, agraf@suse.de, Peter Zijlstra, Len Brown, Linux PM, jacob.jun.pan@linux.intel.com
Subject: Re: [PATCH RFC hack dont apply] intel_idle: support running within a VM
Message-ID: <20171002101249.69b5611a@jacob-builder>
References: <20170930005046-mutt-send-email-mst@kernel.org>

On Sat, 30 Sep 2017 01:21:43 +0200
"Rafael J. Wysocki" wrote:

> On Sat, Sep 30, 2017 at 12:01 AM, Michael S. Tsirkin wrote:
> > intel idle driver does not DTRT when running within a VM:
> > when going into a deep power state, the right thing to
> > do is to exit to hypervisor rather than to keep polling
> > within guest using mwait.
> >
> > Currently the solution is just to exit to hypervisor each time we go
> > idle - this is why kvm does not expose the mwait leaf to guests even
> > when it allows guests to do mwait.
> >
> > But that's not ideal - it seems better to use the idle driver to
> > guess when will the next interrupt arrive.
>
> The idle driver alone is not sufficient for that, though.
>

I second that. Why try to solve this problem at the vendor-specific driver
level? Perhaps just a PV idle driver that decides whether to vmexit based on
something like local per-vCPU timer expiration? I guess we can't predict
other wake events such as interrupts.

e.g.

	if (get_next_timer_interrupt() > kvm_halt_target_residency)
		vmexit
	else
		poll

Jacob
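
The check above could be written out roughly as follows. This is only a minimal sketch, not kernel code: pv_idle_select(), the idle_action enum, and the 200 us value of halt_target_residency_ns are hypothetical names and numbers picked for illustration, and the caller is assumed to pass in the time until the next local timer expiry for the vCPU.

```c
#include <stdint.h>

/* Hypothetical tunable (assumed value): the minimum expected idle time,
 * in nanoseconds, that makes exiting to the hypervisor worthwhile. */
static const uint64_t halt_target_residency_ns = 200000;

/* Hypothetical result type for the idle decision. */
enum idle_action { IDLE_POLL, IDLE_VMEXIT };

/*
 * Decide whether an idle vCPU should exit to the hypervisor or keep
 * polling in the guest.
 *
 * next_timer_ns: time until this vCPU's next local timer expires.
 * Only the timer is considered; other wake events such as device
 * interrupts are not predicted.
 */
enum idle_action pv_idle_select(uint64_t next_timer_ns)
{
	if (next_timer_ns > halt_target_residency_ns)
		return IDLE_VMEXIT;	/* long sleep expected: give the CPU back */
	return IDLE_POLL;		/* wakeup is near: stay in the guest */
}
```

With these assumed numbers, a timer due in 1 ms selects the vmexit path and a timer due in 50 us selects polling. The strict comparison mirrors the pseudocode, so a timer due exactly at the threshold still polls.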