Subject: Re: [PATCH] KVM: add halt_attempted_poll to VCPU stats
From: Christian Borntraeger
To: Paolo Bonzini, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: rkrcmar@redhat.com, David Hildenbrand, David Matlack, Jens Freimann
Date: Wed, 16 Sep 2015 12:12:43 +0200
Message-ID: <55F9409B.9020501@de.ibm.com>
In-Reply-To: <1442334477-35377-1-git-send-email-pbonzini@redhat.com>

On 15.09.2015 at 18:27, Paolo Bonzini wrote:
> This new statistic can help diagnose VCPUs that, for any reason,
> trigger bad behavior in halt_poll_ns autotuning.
>
> For example, say halt_poll_ns = 480000, and wakeups are spaced at
> 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes
> 10+20+40+80+160+320+480 = 1110 microseconds out of every
> 479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then
> consumes about 30% more CPU than it would without polling. This
> would show up as an abnormally high number of attempted polls
> compared to successful polls.
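The arithmetic in the quoted example can be checked with a short standalone sketch. This is illustrative only, not the kernel's actual grow logic: it just models the doubling-with-cap behavior the quote describes (start at 10us, double on every failed poll, cap at halt_poll_ns = 480us), with made-up function names.

```python
def poll_windows(start_us=10, cap_us=480, n=7):
    """Poll window per halt, doubling on each failed poll, capped at cap_us."""
    w, out = start_us, []
    for _ in range(n):
        out.append(min(w, cap_us))
        w *= 2
    return out

# Wakeups arrive just after each poll window expires, so every poll fails
# and the full window is wasted each time.
windows = poll_windows()
gaps_us = [479, 481, 479, 481, 479, 481, 479]

wasted = sum(windows)   # 10+20+40+80+160+320+480 = 1110 us
total = sum(gaps_us)    # 3359 us
print(wasted, total, round(100 * wasted / total))  # 1110 3359 33
```

The ratio 1110/3359 is roughly one third, matching the "about 30% more CPU" figure in the quoted text.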
> Cc: Christian Borntraeger
> Cc: David Matlack
> Signed-off-by: Paolo Bonzini

Acked-by: Christian Borntraeger

Yes, this will help to detect some bad cases, but not all.

I am looking into a case right now where auto polling goes completely nuts on my system:

guest1: 8 vcpus
guest2: 1 vcpu
iperf with 25 processes (-P25) from guest1 to guest2.

I/O interrupts on s390 are floating (pending on all CPUs), so on ALL VCPUs that go to sleep, polling will consider any pending network interrupt a successful poll. With auto polling the guest consumes up to 5 host CPUs; without auto polling, only 1. Reducing halt_poll_ns to 100000 seems to work (goes back to 1 CPU).

The proper way might be to feed back the result of the interrupt dequeue into the heuristics. I don't know yet how to handle that properly.

PS: upstream maintenance keeps me really busy at the moment :-)

Christian
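The floating-interrupt problem can be shown in isolation with a toy sketch (not kernel code; both function names are made up). With one floating interrupt pending, the current heuristic counts a "successful" poll on every sleeping VCPU, even though only one VCPU actually dequeues the interrupt; feeding the dequeue result back, as suggested above, would credit only that one VCPU:

```python
def polls_counted_successful(n_sleeping_vcpus, interrupt_pending):
    # Current heuristic: any pending interrupt ends every VCPU's poll
    # "successfully", so all sleeping VCPUs keep growing their windows.
    return n_sleeping_vcpus if interrupt_pending else 0

def polls_actually_useful(n_sleeping_vcpus, interrupt_pending):
    # Dequeue feedback: only the single VCPU that dequeues the floating
    # interrupt had a genuinely successful poll.
    return 1 if (interrupt_pending and n_sleeping_vcpus > 0) else 0

# 8 sleeping VCPUs (guest1 above), one pending network interrupt:
print(polls_counted_successful(8, True))  # 8 polls counted successful
print(polls_actually_useful(8, True))     # 1 poll was actually useful
```

The gap between the two numbers is the work the other 7 VCPUs burn polling for an interrupt they will never receive.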