From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:45602)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <cota@braap.org>) id 1b8soJ-0003O9-Bb
	for qemu-devel@nongnu.org; Fri, 03 Jun 2016 13:23:04 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <cota@braap.org>) id 1b8soD-0003nQ-D1
	for qemu-devel@nongnu.org; Fri, 03 Jun 2016 13:23:02 -0400
Received: from out2-smtp.messagingengine.com ([66.111.4.26]:38836)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <cota@braap.org>) id 1b8soA-0003jU-T9
	for qemu-devel@nongnu.org; Fri, 03 Jun 2016 13:22:57 -0400
Date: Fri, 3 Jun 2016 13:22:45 -0400
From: "Emilio G. Cota" <cota@braap.org>
Message-ID: <20160603172245.GA8361@flamenco>
References: <1464138802-23503-1-git-send-email-cota@braap.org>
	<1464138802-23503-9-git-send-email-cota@braap.org>
	<5749E02A.3080909@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <5749E02A.3080909@gmail.com>
Subject: Re: [Qemu-devel] [PATCH v6 08/15] qdist: add module to represent
 frequency distributions of data
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Sergey Fedorov <serge.fdrv@gmail.com>
Cc: QEMU Developers <qemu-devel@nongnu.org>, MTTCG Devel <mttcg@listserver.greensocs.com>, Alex =?iso-8859-1?Q?Benn=E9e?= <alex.bennee@linaro.org>, Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <rth@twiddle.net>

On Sat, May 28, 2016 at 21:15:06 +0300, Sergey Fedorov wrote:
> On 25/05/16 04:13, Emilio G. Cota wrote:
> (snip)
> > +double qdist_avg(const struct qdist *dist)
> > +{
> > +    unsigned long count;
> > +    size_t i;
> > +    double ret = 0;
> > +
> > +    count = qdist_sample_count(dist);
> > +    if (!count) {
> > +        return NAN;
> > +    }
> > +    for (i = 0; i < dist->n; i++) {
> > +        struct qdist_entry *e = &dist->entries[i];
> > +
> > +        ret += e->x * e->count / count;
> 
> Please use Welford’s method or something like that, see
> http://stackoverflow.com/a/1346890.

Yes, the way the mean is computed right now, we might suffer
from underflow if count is huge. But I'd rather take that, than the
perf penalty of an iterative method (such as the one used
in Welford's). Note that we might have huge amounts of
items, e.g. one item per head bucket in qht's occupancy qdist
(and 0.5M head buckets is easy to achieve).

If we were to use an iterative method, we'd need to do something
like:

double qdist_avg(const struct qdist *dist)
{
    size_t i, j;
    double ret = 0;

    if (!qdist_sample_count(dist)) {
        return NAN;
    }
    /* compute moving average to prevent under/overflow */
    for (i = 0; i < dist->n; i++) {
        struct qdist_entry *e = &dist->entries[i];

        for (j = 0; j < e->count; j++) {

            ret += (e->x - ret) / (i + j + 1);
        }
    }
    return ret;
}

Note that skipping the inner loop would be incorrect.

I measured the time it takes to execute qdist_avg(&hst.occupancy) at the
end of booting debian jessie for ARM. The difference is
significant:

Original:  0.000002 s
Iterative: 0.002846 s

So really I think we should be OK with a potential underflow. If you want
I can add a comment to remind our future selves of these findings.

		Emilio