From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933727AbcBYTvY (ORCPT <rfc822;w@1wt.eu>);
	Thu, 25 Feb 2016 14:51:24 -0500
Received: from cmta11.telus.net ([209.171.16.84]:43806 "EHLO cmta11.telus.net"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S933462AbcBYTvW convert rfc822-to-8bit (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Thu, 25 Feb 2016 14:51:22 -0500
X-Authority-Analysis: v=2.1 cv=dOBb47tb c=1 sm=2 tr=0
 a=zJWegnE7BH9C0Gl4FFgQyA==:117 a=zJWegnE7BH9C0Gl4FFgQyA==:17
 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10
 a=Pyq9K9CWowscuQLKlpiwfMBGOR0=:19 a=IkcTkHD0fZMA:10 a=aatUQebYAAAA:8
 a=gzrIhozmDShBhz_MIsAA:9 a=ra1qn4QCUHd7veLg:21 a=X_xXr3ubE6eKtlqD:21
 a=QEXdDO2ut3YA:10
X-Telus-Outbound-IP: 173.180.45.4
From: "Doug Smythies" <dsmythies@telus.net>
To: "'Stephane Gasparini'" <stephane.gasparini@linux.intel.com>
Cc: "'Mel Gorman'" <mgorman@techsingularity.net>,
        "'Rafael Wysocki'" <rjw@rjwysocki.net>,
        "'Ingo Molnar'" <mingo@kernel.org>,
        "'Peter Zijlstra'" <peterz@infradead.org>,
        "'Matt Fleming'" <matt@codeblueprint.co.uk>,
        "'Mike Galbraith'" <umgwanakikbuti@gmail.com>,
        "'Linux-PM'" <linux-pm@vger.kernel.org>,
        "'LKML'" <linux-kernel@vger.kernel.org>,
        "'Srinivas Pandruvada'" <srinivas.pandruvada@linux.intel.com>
References: <1455793883-14214-1-git-send-email-mgorman@techsingularity.net> <E45553FF-63B3-4E5B-92A0-B5B00353F6F8@linux.intel.com> <001501d16b33$ff61d200$fe257600$@net> <9E20D36B-1323-41AA-969F-3D2DD5021701@linux.intel.com>
In-Reply-To: <9E20D36B-1323-41AA-969F-3D2DD5021701@linux.intel.com>
Subject: RE: [PATCH 1/1] intel_pstate: Increase hold-off time before busyness is scaled
Date: Thu, 25 Feb 2016 11:51:18 -0800
Message-ID: <002601d17005$e6369820$b2a3c860$@net>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="utf-8"
Content-Transfer-Encoding: 8BIT
X-Mailer: Microsoft Office Outlook 12.0
Thread-Index: AdFvJS0BhxdFLMhNRtCl26Qp4KrADQA2DZ+g
Content-Language: en-ca
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Steph,

On 2016.02.24 08:20 Stephane Gasparini wrote:
>> On Feb 19, 2016, at 5:38 PM, Doug Smythies <dsmythies@telus.net> wrote: 
>>> On 2016.02.19 03:12 Stephane Gasparini wrote:
>>> 
>>> The issue you are reporting looks like one we improved on android by using 
>>> the average pstate instead of using the last requested pstate
>>> 
>>> We know that this is improving the ffmpeg encoding performance when using the
>>> load algorithm.
>>> 
>>> see patch attached
>>> 
>>> This patch is only applied on get_target_pstate_use_cpu_load however you can give
>>> it a try on get_target_pstate_use_performance
>> 
>> Yes, that type of patch works on the load based approach.
>
> I’m not talking about using average p-state in the scaled_busy computation.
> I’m talking about adding the output of the PID (the number of pstates to add or
> subtract) to the average pstate rather than adding this to the current p-state.

For the situation we are dealing with here, that would actually make it worse,
wouldn't it?

Let's work through a real, very low load example from the Mel V2 patch, where
the target pstate is increased when it should have been decreased:

Mel patch version 2 (12X hold-off, applied on top of the rjw 3-patch v10 set, on top of kernel 4.5-rc4):

CPU: 3
Core busy: 105
Scaled busy: 143
Old pstate: 25
New pstate: 34
mperf: 52039
aperf: 55097
tsc: 335265689
freq: 3599750 KHz
Load: 0.02%
Duration (mS): 98.293

New pstate = old pstate + (scaled_busy-setpoint) * p_gain
           = 25 + (143 - 97) * 0.2
           = 34 (as above)

Ave pstate = max_pstate * aperf / mperf
           = 34 * 55097 / 52039
           = 36

Steph average pstate method added to the above:
New pstate = ave pstate + (scaled_busy-setpoint) * p_gain
           = 36 + (143 - 97) * 0.2
           = 45 (before clamping)
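
As a quick sanity check of the arithmetic above, here is a minimal Python
sketch. It is only an illustration, not the intel_pstate driver code: the
function and variable names are made up for this example, and the setpoint
(97), p_gain (0.2) and max_pstate (34) are simply the values used in the
calculations above. The average pstate is rounded to the nearest integer, as
in the hand calculation.

```python
# Illustrative check of the hand calculations; not the intel_pstate driver code.
# All names are hypothetical; the values come from the trace sample above.
SETPOINT = 97
P_GAIN = 0.2
MAX_PSTATE = 34


def p_term(scaled_busy):
    """Proportional term: number of pstates to add or subtract."""
    return (scaled_busy - SETPOINT) * P_GAIN


def average_pstate(aperf, mperf):
    """Average pstate over the sample, from the aperf/mperf ratio."""
    return MAX_PSTATE * aperf / mperf


old_pstate = 25
scaled_busy = 143
aperf, mperf = 55097, 52039

# Mel V2 patch: proportional term added to the old (last requested) pstate.
mel_v2 = old_pstate + p_term(scaled_busy)

# Average pstate method: same proportional term, added to the average pstate.
ave = round(average_pstate(aperf, mperf))
steph = ave + p_term(scaled_busy)

print("Mel V2 new pstate:        %.1f" % mel_v2)
print("Average pstate:           %d" % ave)
print("Ave method, before clamp: %.1f" % steph)
```

Both methods add the same proportional term here; the only difference is the
starting point, and the average pstate (36) is higher than the old pstate (25).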

Now, just for completeness, here is the math without the Mel patch:
Scaled busy = Core busy * max_pstate / old pstate * sample time / duration
            = 105 * 34 / 25 * 10 / 98.293
            = 14.53
New pstate = old pstate + (scaled_busy-setpoint) * p_gain
           = 25 + (14.53 - 97) * 0.2
           = 8.5
           = 16 (clamped to the minimum)
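
The no-patch case can be checked the same way. Again, this is only a sketch
under stated assumptions, not the driver code: the names are hypothetical, the
10 ms sample time, setpoint of 97, p_gain of 0.2 and max_pstate of 34 are the
values used in the math above, and the minimum pstate of 16 is taken from the
clamped result. Only the lower clamp is modelled.

```python
# Illustrative check of the no-hold-off calculation; not the driver code.
# All names are hypothetical; the values come from the trace sample above.
SETPOINT = 97
P_GAIN = 0.2
MAX_PSTATE = 34
MIN_PSTATE = 16  # assumed from the "clamped minimum" result above


def scaled_busy(core_busy, old_pstate, sample_ms, duration_ms):
    """Core busy scaled by pstate ratio and by sample time over duration."""
    return core_busy * MAX_PSTATE / old_pstate * sample_ms / duration_ms


old_pstate = 25
busy = scaled_busy(core_busy=105, old_pstate=old_pstate,
                   sample_ms=10, duration_ms=98.293)

unclamped = old_pstate + (busy - SETPOINT) * P_GAIN
new_pstate = max(MIN_PSTATE, unclamped)  # lower clamp only

print("Scaled busy:         %.2f" % busy)
print("New pstate (raw):    %.1f" % unclamped)
print("New pstate (clamped): %d" % new_pstate)
```

The long duration relative to the sample time shrinks scaled busy by roughly a
factor of ten, which is what drives the target pstate down to the minimum.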

Regardless, I coded the average pstate method and, with limited testing,
observed little difference between it and the Mel V2 patch.

... Doug