From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: [PATCH 4/4] tools: add total/local memory bandwidth
 monitoring
Date: Tue, 6 Jan 2015 10:29:18 +0000
Message-ID: <54ABB8FE.3010801@citrix.com>
References: <1419324880-13212-1-git-send-email-chao.p.peng@linux.intel.com>	<1419324880-13212-5-git-send-email-chao.p.peng@linux.intel.com>	<20150105123942.GE24360@zion.uk.xensource.com>
	<20150106100949.GC3279@pengc-linux.bj.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20150106100949.GC3279@pengc-linux.bj.intel.com>
To: Chao Peng <chao.p.peng@linux.intel.com>, Wei Liu <wei.liu2@citrix.com>
Cc: keir@xen.org, Ian.Campbell@citrix.com, stefano.stabellini@eu.citrix.com, Ian.Jackson@eu.citrix.com, xen-devel@lists.xen.org, JBeulich@suse.com

On 06/01/15 10:09, Chao Peng wrote:
> On Mon, Jan 05, 2015 at 12:39:42PM +0000, Wei Liu wrote:
>> On Tue, Dec 23, 2014 at 04:54:39PM +0800, Chao Peng wrote:
>> [...]
>>> +static int libxl__psr_cmt_get_mem_bandwidth(libxl__gc *gc, uint32_t domid,
>>> +    xc_psr_cmt_type type, uint32_t socketid, uint32_t *bandwidth)
>>> +{
>>> +    uint64_t sample1, sample2;
>>> +    uint32_t upscaling_factor;
>>> +    int rc;
>>> +
>>> +    rc = libxl__psr_cmt_get_l3_monitoring_data(gc, domid,
>>> +                    type, socketid, &sample1);
>>> +    if (rc < 0)
>>> +        return ERROR_FAIL;
>>> +
>>> +    usleep(10000);
>>> +
>>> +    rc = libxl__psr_cmt_get_l3_monitoring_data(gc, domid,
>>> +                    type, socketid, &sample2);
>>> +    if (rc < 0)
>>> +        return ERROR_FAIL;
>>> +
>>> +    if (sample2 < sample1) {
>>> +        LOGE(ERROR, "event counter overflowed between two samplings");
>>> +        return ERROR_FAIL;
>>> +    }
>>> +
>> What's the likelihood of counter overflows? Can we handle this more
>> gracefully? Say, retry (with maximum retry cap) when counter overflows?
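For illustration, Wei's retry-with-a-cap suggestion might look roughly like
the sketch below. It is not libxl code: `sample_pair_fn`,
`get_delta_with_retry`, `MAX_OVERFLOW_RETRIES` (set arbitrarily to 3) and the
fake sampler are hypothetical names, and the callback is assumed to wrap the
two reads and the sleep from the patch.

```c
#include <stdint.h>

/* Hypothetical cap: one initial attempt plus up to this many retries. */
#define MAX_OVERFLOW_RETRIES 3

/*
 * Hypothetical callback that reads the counter twice with a short sleep in
 * between (e.g. two libxl__psr_cmt_get_l3_monitoring_data() calls around
 * usleep(10000)). Returns 0 on success, negative on error.
 */
typedef int (*sample_pair_fn)(void *ctx, uint64_t *first, uint64_t *second);

/*
 * Stores second - first in *delta and returns 0, or returns -1 if sampling
 * failed or every attempt saw the counter go backwards (an overflow).
 */
static int get_delta_with_retry(sample_pair_fn sample_pair, void *ctx,
                                uint64_t *delta)
{
    for (int attempt = 0; attempt <= MAX_OVERFLOW_RETRIES; attempt++) {
        uint64_t first, second;

        if (sample_pair(ctx, &first, &second) < 0)
            return -1;          /* hard error: retrying will not help */

        if (second >= first) {
            *delta = second - first;
            return 0;
        }
        /* second < first: the counter overflowed, so take a fresh pair. */
    }
    return -1;
}

/* Fake sampler for illustration only: overflows on the first call. */
struct fake_sampler { int calls; };

static int fake_sample_pair(void *ctx, uint64_t *first, uint64_t *second)
{
    struct fake_sampler *f = ctx;

    f->calls++;
    *first = 1000;
    *second = (f->calls == 1) ? 10 : 1500;
    return 0;
}
```

With the fake sampler, the first pair is discarded and the second attempt
yields a delta of 500. Treating a failed read as fatal but an overflow as
retryable keeps the cap from masking real errors.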
> The likelihood is very small. The hardware guarantees that the counter
> will not overflow within one second, even under maximum platform
> bandwidth, and we only sleep for 0.01 seconds here.
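If the counter cannot traverse its whole range in under a second, two reads
less than a second apart can straddle at most one wrap, which could be undone
with modular arithmetic instead of being treated as an error. A minimal
sketch, assuming the raw counter is `width` bits wide and wraps to zero (the
width is not stated in this thread); `counter_delta` is a hypothetical
helper, and it is only valid if the reads really were under a second apart.

```c
#include <stdint.h>

/*
 * Hypothetical helper: delta between two samples of a counter assumed to be
 * `width` bits wide that wraps to zero. Correct only if the counter wrapped
 * at most once between the two samples.
 */
static uint64_t counter_delta(uint64_t first, uint64_t second, unsigned width)
{
    /* Avoid the undefined 64-bit shift when width == 64. */
    uint64_t mask = (width >= 64) ? UINT64_MAX
                                  : ((UINT64_C(1) << width) - 1);

    /* Unsigned subtraction is modulo 2^64; masking reduces it mod 2^width. */
    return (second - first) & mask;
}
```

For example, a 24-bit counter read as 2^24 - 10 and then as 5 gives a delta
of 15 rather than tripping the `sample2 < sample1` check.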
>
> I'd like to adopt your suggestion and retry if that happens, but only
> once: a single retry should be enough to get past the overflow.
>
> Thanks,
> Chao

You have no possible way of guaranteeing that the actual elapsed time
between the two samples is less than 1 second.  On a very heavily loaded
system, even regular task scheduling could cause an actual elapsed time
of more than one second in that snippet of code.
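One way to at least detect this case, sketched under stated assumptions:
timestamp the sampling with clock_gettime(CLOCK_MONOTONIC), taking t1 just
before the first read and t2 just after the second, reject any pair that is a
second or more apart, and divide by the time that actually elapsed rather
than the requested 10ms. `bandwidth_from_samples` and `elapsed_ns` are
hypothetical helpers, and treating `upscaling_factor` as a bytes-per-counter-
unit multiplier is an assumption.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Nanoseconds from *t1 to *t2; both must come from the same clock. */
static int64_t elapsed_ns(const struct timespec *t1, const struct timespec *t2)
{
    return (int64_t)(t2->tv_sec - t1->tv_sec) * NSEC_PER_SEC
           + (t2->tv_nsec - t1->tv_nsec);
}

/*
 * Hypothetical helper. t1 is a CLOCK_MONOTONIC timestamp taken just before
 * the first counter read and t2 one taken just after the second, so t2 - t1
 * is an upper bound on the time between the reads. upscaling_factor is
 * assumed to turn counter units into bytes. On success returns 0 and stores
 * bytes/second. Returns -1 if the reads may be a second or more apart (the
 * no-overflow guarantee no longer applies) or the counter went backwards;
 * the caller may then retry.
 */
static int bandwidth_from_samples(uint64_t first, uint64_t second,
                                  const struct timespec *t1,
                                  const struct timespec *t2,
                                  uint32_t upscaling_factor,
                                  uint64_t *bytes_per_sec)
{
    int64_t ns = elapsed_ns(t1, t2);
    uint64_t bytes;

    if (ns <= 0 || ns >= NSEC_PER_SEC)
        return -1;
    if (second < first)
        return -1;

    bytes = (second - first) * upscaling_factor;
    /* Scale by the time that actually passed, not the requested 10ms. */
    *bytes_per_sec = (uint64_t)((double)bytes * NSEC_PER_SEC / (double)ns);
    return 0;
}
```

This cannot prevent the delay from happening; it only detects it, so the
caller can retry or report an error instead of returning a bogus figure.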

~Andrew