From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:43853) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bAHeD-0001RP-Tl for qemu-devel@nongnu.org; Tue, 07 Jun 2016 10:06:27 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bAHe8-0002Tl-9U for qemu-devel@nongnu.org; Tue, 07 Jun 2016 10:06:24 -0400 Received: from mail-lf0-x243.google.com ([2a00:1450:4010:c07::243]:33951) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bAHe7-0002Th-St for qemu-devel@nongnu.org; Tue, 07 Jun 2016 10:06:20 -0400 Received: by mail-lf0-x243.google.com with SMTP id k192so7797466lfb.1 for ; Tue, 07 Jun 2016 07:06:19 -0700 (PDT) References: <1464138802-23503-1-git-send-email-cota@braap.org> <1464138802-23503-9-git-send-email-cota@braap.org> <5749E02A.3080909@gmail.com> <20160603172245.GA8361@flamenco> <5751BE73.2090402@gmail.com> <5751C25F.6080801@gmail.com> <20160606234014.GA4418@flamenco> From: Sergey Fedorov Message-ID: <5756D4D8.7060202@gmail.com> Date: Tue, 7 Jun 2016 17:06:16 +0300 MIME-Version: 1.0 In-Reply-To: <20160606234014.GA4418@flamenco> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Subject: Re: [Qemu-devel] [PATCH v6 08/15] qdist: add module to represent frequency distributions of data List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Emilio G. Cota" Cc: QEMU Developers , MTTCG Devel , =?UTF-8?Q?Alex_Benn=c3=a9e?= , Paolo Bonzini , Richard Henderson On 07/06/16 02:40, Emilio G. Cota wrote: > On Fri, Jun 03, 2016 at 20:46:07 +0300, Sergey Fedorov wrote: >> On 03/06/16 20:29, Sergey Fedorov wrote: >>> On 03/06/16 20:22, Emilio G. Cota wrote: >>>> On Sat, May 28, 2016 at 21:15:06 +0300, Sergey Fedorov wrote: >>>>> On 25/05/16 04:13, Emilio G. Cota wrote: >>>>> (snip) >>>>>> +double qdist_avg(const struct qdist *dist) >>>>>> +{ >>>>>> + unsigned long count; >>>>>> + size_t i; >>>>>> + double ret = 0; >>>>>> + >>>>>> + count = qdist_sample_count(dist); >>>>>> + if (!count) { >>>>>> + return NAN; >>>>>> + } >>>>>> + for (i = 0; i < dist->n; i++) { >>>>>> + struct qdist_entry *e = &dist->entries[i]; >>>>>> + >>>>>> + ret += e->x * e->count / count; >>>>> Please use Welford’s method or something like that, see >>>>> http://stackoverflow.com/a/1346890. >>>> Yes, the way the mean is computed right now, we might suffer >>>> from underflow if count is huge. But I'd rather take that, than the >>>> perf penalty of an iterative method (such as the one used >>>> in Welford's). Note that we might have huge amounts of >>>> items, e.g. one item per head bucket in qht's occupancy qdist >>>> (and 0.5M head buckets is easy to achieve). >>>> >>>> If we were to use an iterative method, we'd need to do something >>>> like: >>>> >>>> double qdist_avg(const struct qdist *dist) >>>> { >>>> size_t i, j; >>>> double ret = 0; >>>> >>>> if (!qdist_sample_count(dist)) { >>>> return NAN; >>>> } >>>> /* compute moving average to prevent under/overflow */ >>>> for (i = 0; i < dist->n; i++) { >>>> struct qdist_entry *e = &dist->entries[i]; >>>> >>>> for (j = 0; j < e->count; j++) { >>>> >>>> ret += (e->x - ret) / (i + j + 1); >>>> } >>>> } >>>> return ret; >>>> } >>>> >>>> Note that skipping the inner loop would be incorrect. >>> Ah, it's a shame. I'm wondering if there is some other algorithm that >>> could work for us? >> Maybe something like >> https://en.wikipedia.org/wiki/Kahan_summation_algorithm could help? > That algorithm is overkill for what we're doing. Pairwise summation > should suffice: > > diff --git a/util/qdist.c b/util/qdist.c > index 3343640..909bd2b 100644 > --- a/util/qdist.c > +++ b/util/qdist.c > @@ -367,20 +367,34 @@ unsigned long qdist_sample_count(const struct qdist *dist) > return count; > } > > +static double qdist_pairwise_avg(const struct qdist *dist, size_t index, > + size_t n, unsigned long count) > +{ > + if (n <= 2) { We would like to amortize the overhead of the recursion by making the cut-off sufficiently large. > + size_t i; > + double ret = 0; > + > + for (i = 0; i < n; i++) { > + struct qdist_entry *e = &dist->entries[index + i]; > + > + ret += e->x * e->count / count; > + } > + return ret; > + } else { > + size_t n2 = n / 2; > + > + return qdist_pairwise_avg(dist, index, n2, count) + > + qdist_pairwise_avg(dist, index + n2, n - n2, count); > + } > +} > + > double qdist_avg(const struct qdist *dist) > { > unsigned long count; > - size_t i; > - double ret = 0; > > count = qdist_sample_count(dist); > if (!count) { > return NAN; > } > - for (i = 0; i < dist->n; i++) { > - struct qdist_entry *e = &dist->entries[i]; > - > - ret += e->x * e->count / count; > - } > - return ret; > + return qdist_pairwise_avg(dist, 0, dist->n, count); > } Otherwise looks good. Kind regards, Sergey