From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:48695)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <serge.fdrv@gmail.com>) id 1b8suY-0000Ff-Cx
	for qemu-devel@nongnu.org; Fri, 03 Jun 2016 13:29:31 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <serge.fdrv@gmail.com>) id 1b8suU-00080g-DG
	for qemu-devel@nongnu.org; Fri, 03 Jun 2016 13:29:30 -0400
Received: from mail-lf0-x235.google.com ([2a00:1450:4010:c07::235]:34064)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <serge.fdrv@gmail.com>) id 1b8suU-00080a-5F
	for qemu-devel@nongnu.org; Fri, 03 Jun 2016 13:29:26 -0400
Received: by mail-lf0-x235.google.com with SMTP id k98so58952434lfi.1
	for <qemu-devel@nongnu.org>; Fri, 03 Jun 2016 10:29:26 -0700 (PDT)
References: <1464138802-23503-1-git-send-email-cota@braap.org>
	<1464138802-23503-9-git-send-email-cota@braap.org>
	<5749E02A.3080909@gmail.com> <20160603172245.GA8361@flamenco>
From: Sergey Fedorov <serge.fdrv@gmail.com>
Message-ID: <5751BE73.2090402@gmail.com>
Date: Fri, 3 Jun 2016 20:29:23 +0300
MIME-Version: 1.0
In-Reply-To: <20160603172245.GA8361@flamenco>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] [PATCH v6 08/15] qdist: add module to represent
 frequency distributions of data
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Emilio G. Cota" <cota@braap.org>
Cc: QEMU Developers <qemu-devel@nongnu.org>, MTTCG Devel <mttcg@listserver.greensocs.com>, =?UTF-8?Q?Alex_Benn=c3=a9e?= <alex.bennee@linaro.org>, Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <rth@twiddle.net>

On 03/06/16 20:22, Emilio G. Cota wrote:
> On Sat, May 28, 2016 at 21:15:06 +0300, Sergey Fedorov wrote:
>> On 25/05/16 04:13, Emilio G. Cota wrote:
>> (snip)
>>> +double qdist_avg(const struct qdist *dist)
>>> +{
>>> +    unsigned long count;
>>> +    size_t i;
>>> +    double ret = 0;
>>> +
>>> +    count = qdist_sample_count(dist);
>>> +    if (!count) {
>>> +        return NAN;
>>> +    }
>>> +    for (i = 0; i < dist->n; i++) {
>>> +        struct qdist_entry *e = &dist->entries[i];
>>> +
>>> +        ret += e->x * e->count / count;
>> Please use Welford’s method or something like that, see
>> http://stackoverflow.com/a/1346890.
> Yes, the way the mean is computed right now, we might suffer
> from underflow if count is huge. But I'd rather take that, than the
> perf penalty of an iterative method (such as the one used
> in Welford's). Note that we might have huge amounts of
> items, e.g. one item per head bucket in qht's occupancy qdist
> (and 0.5M head buckets is easy to achieve).
>
> If we were to use an iterative method, we'd need to do something
> like:
>
> double qdist_avg(const struct qdist *dist)
> {
>     size_t i, j;
>     double ret = 0;
>
>     if (!qdist_sample_count(dist)) {
>         return NAN;
>     }
>     /* compute moving average to prevent under/overflow */
>     for (i = 0; i < dist->n; i++) {
>         struct qdist_entry *e = &dist->entries[i];
>
>         for (j = 0; j < e->count; j++) {
>
>             ret += (e->x - ret) / (i + j + 1);
>         }
>     }
>     return ret;
> }
>
> Note that skipping the inner loop would be incorrect.

Ah, it's a shame. I'm wondering if there is some other algorithm that
could work for us?

> I measured the time it takes to execute qdist_avg(&hst.occupancy) at the
> end of booting debian jessie for ARM. The difference is
> significant:
>
> Original:  0.000002 s
> Iterative: 0.002846 s

Have you compared the results of computing the average as well?

>
> So really I think we should be OK with a potential underflow. If you want
> I can add a comment to remind our future selves of these findings.

Kind regards,
Sergey