From: Andrew Cooper
Subject: Re: [PATCH 4/4] tools: add total/local memory bandwith monitoring
Date: Tue, 6 Jan 2015 10:29:18 +0000
Message-ID: <54ABB8FE.3010801@citrix.com>
In-Reply-To: <20150106100949.GC3279@pengc-linux.bj.intel.com>
References: <1419324880-13212-1-git-send-email-chao.p.peng@linux.intel.com>
 <1419324880-13212-5-git-send-email-chao.p.peng@linux.intel.com>
 <20150105123942.GE24360@zion.uk.xensource.com>
 <20150106100949.GC3279@pengc-linux.bj.intel.com>
To: Chao Peng, Wei Liu
Cc: keir@xen.org, Ian.Campbell@citrix.com, stefano.stabellini@eu.citrix.com,
 Ian.Jackson@eu.citrix.com, xen-devel@lists.xen.org, JBeulich@suse.com

On 06/01/15 10:09, Chao Peng wrote:
> On Mon, Jan 05, 2015 at 12:39:42PM +0000, Wei Liu wrote:
>> On Tue, Dec 23, 2014 at 04:54:39PM +0800, Chao Peng wrote:
>> [...]
>>> +static int libxl__psr_cmt_get_mem_bandwidth(libxl__gc *gc, uint32_t domid,
>>> +    xc_psr_cmt_type type, uint32_t socketid, uint32_t *bandwidth)
>>> +{
>>> +    uint64_t sample1, sample2;
>>> +    uint32_t upscaling_factor;
>>> +    int rc;
>>> +
>>> +    rc = libxl__psr_cmt_get_l3_monitoring_data(gc, domid,
>>> +                    type, socketid, &sample1);
>>> +    if (rc < 0)
>>> +        return ERROR_FAIL;
>>> +
>>> +    usleep(10000);
>>> +
>>> +    rc = libxl__psr_cmt_get_l3_monitoring_data(gc, domid,
>>> +                    type, socketid, &sample2);
>>> +    if (rc < 0)
>>> +        return ERROR_FAIL;
>>> +
>>> +    if (sample2 < sample1) {
>>> +        LOGE(ERROR, "event counter overflowed between two samplings");
>>> +        return ERROR_FAIL;
>>> +    }
>>> +
>> What's the likelihood of counter overflows? Can we handle this more
>> gracefully? Say, retry (with maximum retry cap) when counter overflows?
> The likelihood is very small here. Hardware guarantees the counter will
> not overflow in one second even under maximum platform bandwidth conditions.
> And we only sleep 0.01 second here.
>
> I'd like to adopt your suggestion to retry another time once that happens.
> But only one retry and it should correct the overflow.
>
> Thanks,
> Chao

You have no possible way of guaranteeing that the actual elapsed time
between the two samples is less than 1 second.  On a very heavily loaded
system, even regular task scheduling could cause an actual elapsed time of
more than one second in that snippet of code.

~Andrew
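For illustration, here is a minimal sketch of sampling that follows both suggestions in the thread: Wei's capped retry, and Andrew's point that the window between the two reads has to be measured rather than assumed from usleep(10000). It assumes a generic counter-read callback instead of the real libxl/libxc calls; `counter_read_fn`, `sample_counter_delta()`, `struct fake_counter`, the cap of 3 attempts and the use of CLOCK_MONOTONIC are all hypothetical choices, not the patch's code. The 1 s bound is the hardware guarantee Chao quotes above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback standing in for a counter read such as
 * libxl__psr_cmt_get_l3_monitoring_data(); returns 0 on success. */
typedef int (*counter_read_fn)(void *ctx, uint64_t *value);

#define SAMPLE_MAX_ATTEMPTS  3               /* assumed retry cap */
#define SAMPLE_MAX_WINDOW_NS 1000000000ULL   /* 1 s no-overflow bound from the thread */

static uint64_t monotonic_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Sample the counter twice, about sleep_us apart, and report the delta and
 * the *measured* window.  t_start is read before the first sample and t_end
 * after the second, so t_end - t_start is an upper bound on the real gap.
 * Returns 0 on success, -EIO if a read fails, -EAGAIN once the retry cap is
 * exhausted.
 */
static int sample_counter_delta(counter_read_fn read_fn, void *ctx,
                                long sleep_us, uint64_t *delta,
                                uint64_t *elapsed_ns)
{
    assert(read_fn && delta && elapsed_ns && sleep_us >= 0);

    for (int attempt = 0; attempt < SAMPLE_MAX_ATTEMPTS; attempt++) {
        struct timespec req = {
            .tv_sec = sleep_us / 1000000,
            .tv_nsec = (sleep_us % 1000000) * 1000,
        };
        uint64_t s1, s2, t_start, t_end;

        t_start = monotonic_ns();
        if (read_fn(ctx, &s1))
            return -EIO;
        nanosleep(&req, NULL);   /* waking early is harmless: we measure */
        if (read_fn(ctx, &s2))
            return -EIO;
        t_end = monotonic_ns();

        if (s2 < s1)
            continue;   /* counter went backwards: it wrapped, resample */
        if (t_end - t_start >= SAMPLE_MAX_WINDOW_NS)
            continue;   /* window too long: s2 >= s1 no longer rules out a wrap */

        *delta = s2 - s1;
        *elapsed_ns = t_end - t_start;
        return 0;
    }
    return -EAGAIN;
}

/* Simulated counter, for illustration only: each read adds `step` (mod 2^64),
 * except read number `wrap_on_read`, which resets the value to 0. */
struct fake_counter {
    uint64_t value;
    uint64_t step;
    int reads;
    int wrap_on_read;   /* -1: never wrap */
};

static int fake_counter_read(void *ctx, uint64_t *value)
{
    struct fake_counter *fc = ctx;

    if (fc->reads++ == fc->wrap_on_read)
        fc->value = 0;
    else
        fc->value += fc->step;
    *value = fc->value;
    return 0;
}
```

A caller would then scale `delta` by the upscaling factor and divide by `elapsed_ns`. Dividing by the measured window rather than the nominal 10 ms keeps the figure meaningful when the thread sleeps far longer than requested; because the window is slightly over-estimated, the result errs on the low side.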