qemu-devel.nongnu.org archive mirror
From: "Emilio G. Cota" <cota@braap.org>
To: Sergey Fedorov <serge.fdrv@gmail.com>
Cc: "QEMU Developers" <qemu-devel@nongnu.org>,
	"MTTCG Devel" <mttcg@listserver.greensocs.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Richard Henderson" <rth@twiddle.net>
Subject: Re: [Qemu-devel] [PATCH v6 08/15] qdist: add module to represent frequency distributions of data
Date: Mon, 6 Jun 2016 21:05:45 -0400
Message-ID: <20160607010545.GB4418@flamenco>
In-Reply-To: <5749E02A.3080909@gmail.com>

On Sat, May 28, 2016 at 21:15:06 +0300, Sergey Fedorov wrote:
> On 25/05/16 04:13, Emilio G. Cota wrote:
> > diff --git a/util/qdist.c b/util/qdist.c
> > new file mode 100644
> > index 0000000..3343640
> > --- /dev/null
> > +++ b/util/qdist.c
> > @@ -0,0 +1,386 @@
> (snip)
> > +
> > +void qdist_add(struct qdist *dist, double x, long count)
> > +{
> > +    struct qdist_entry *entry = NULL;
> > +
> > +    if (dist->entries) {
> > +        struct qdist_entry e;
> > +
> > +        e.x = x;
> > +        entry = bsearch(&e, dist->entries, dist->n, sizeof(e), qdist_cmp);
> > +    }
> > +
> > +    if (entry) {
> > +        entry->count += count;
> > +        return;
> > +    }
> > +
> > +    dist->entries = g_realloc(dist->entries,
> > +                              sizeof(*dist->entries) * (dist->n + 1));
> 
> Repeated doubling?

Can you please elaborate?
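For readers of the archive: "repeated doubling" most likely refers to growing the array geometrically instead of calling g_realloc() for exactly one extra element on every new x value, so that appends need only amortized O(1) reallocations. The sketch below is illustrative only: it uses plain realloc() rather than GLib, and demo_dist, demo_entry and demo_dist_append() are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct qdist_entry / struct qdist. */
struct demo_entry {
    double x;
    long count;
};

struct demo_dist {
    struct demo_entry *entries;
    size_t n;     /* number of entries in use */
    size_t size;  /* number of entries allocated */
};

/*
 * Append one entry, doubling the allocation whenever it is full.
 * Returns a pointer to the new entry, or NULL if realloc() fails
 * (in which case the old array is left untouched).
 */
static struct demo_entry *demo_dist_append(struct demo_dist *d, double x,
                                           long count)
{
    if (d->n == d->size) {
        size_t new_size = d->size ? d->size * 2 : 1;
        struct demo_entry *p = realloc(d->entries, new_size * sizeof(*p));

        if (p == NULL) {
            return NULL;
        }
        d->entries = p;
        d->size = new_size;
    }
    assert(d->n < d->size);
    d->entries[d->n].x = x;
    d->entries[d->n].count = count;
    return &d->entries[d->n++];
}
```

With doubling, adding N distinct values performs about log2(N) reallocations instead of N. The qsort() (or a sorted insert) after the append would still be needed to keep bsearch() valid.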

> > +    dist->n++;
> > +    entry = &dist->entries[dist->n - 1];
> 
> What if we combine the above two lines:
> 
>     entry = &dist->entries[dist->n++];
> 
> or just reverse them:
> 
>     entry = &dist->entries[dist->n];
>     dist->n++;

I have less trouble understanding the original.
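For what it's worth, the three spellings discussed above are interchangeable: each yields the slot at the old value of n and leaves n incremented by one. This is a standalone sketch with a hypothetical demo_entry array and slot_v1()..slot_v3() helpers, not code from the patch.

```c
#include <assert.h>
#include <stddef.h>

struct demo_entry {
    double x;
    long count;
};

/* Original form: increment first, then index with n - 1. */
static struct demo_entry *slot_v1(struct demo_entry *entries, size_t *n)
{
    (*n)++;
    return &entries[*n - 1];
}

/* Combined form: post-increment inside the subscript. */
static struct demo_entry *slot_v2(struct demo_entry *entries, size_t *n)
{
    return &entries[(*n)++];
}

/* Reversed form: take the address at n, then increment. */
static struct demo_entry *slot_v3(struct demo_entry *entries, size_t *n)
{
    struct demo_entry *e = &entries[*n];

    (*n)++;
    return e;
}
```

So the choice between them is purely one of readability.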

> > +    entry->x = x;
> > +    entry->count = count;
> > +    qsort(dist->entries, dist->n, sizeof(*entry), qdist_cmp);
> > +}
> > +
> (snip)
> > +static char *qdist_pr_internal(const struct qdist *dist)
> > +{
> > +    double min, max, step;
> > +    GString *s = g_string_new("");
> > +    size_t i;
> > +
> > +    /* if only one entry, its printout will be either full or empty */
> > +    if (dist->n == 1) {
> > +        if (dist->entries[0].count) {
> > +            g_string_append_unichar(s, qdist_blocks[QDIST_NR_BLOCK_CODES - 1]);
> > +        } else {
> > +            g_string_append_c(s, ' ');
> > +        }
> > +        goto out;
> > +    }
> > +
> > +    /* get min and max counts */
> > +    min = dist->entries[0].count;
> > +    max = min;
> > +    for (i = 0; i < dist->n; i++) {
> > +        struct qdist_entry *e = &dist->entries[i];
> > +
> > +        if (e->count < min) {
> > +            min = e->count;
> > +        }
> > +        if (e->count > max) {
> > +            max = e->count;
> > +        }
> > +    }
> > +
> > +    /* floor((count - min) * step) will give us the block index */
> > +    step = (QDIST_NR_BLOCK_CODES - 1) / (max - min);
> > +
> > +    for (i = 0; i < dist->n; i++) {
> > +        struct qdist_entry *e = &dist->entries[i];
> > +        int index;
> > +
> > +        /* make an exception with 0; instead of using block[0], print a space */
> > +        if (e->count) {
> > +            index = (int)((e->count - min) * step);
> 
> So "e->count == min" gives us one eighth block instead of just space?

Yes, only 0 can print a space.
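To make the answer concrete: with step = (NR_BLOCKS - 1) / (max - min), a non-zero count equal to min maps to index 0 (the one-eighth block), max maps to the last (full) block, and only a count of exactly 0 is printed as a space. The sketch below assumes eight block codes and returns an index instead of printing; DEMO_NR_BLOCKS and block_index() are hypothetical names. It also requires max > min, since the quoted code divides by (max - min).

```c
#include <assert.h>

#define DEMO_NR_BLOCKS 8  /* assumed: one-eighth block up to full block */

/*
 * Map a count to a block index the way the quoted loop does.
 * Returns -1 for a count of 0, which the caller prints as a space.
 * Requires max > min so that step is finite.
 */
static int block_index(long count, double min, double max)
{
    double step;

    assert(max > min);
    if (count == 0) {
        return -1;
    }
    step = (DEMO_NR_BLOCKS - 1) / (max - min);
    /* truncation toward zero == floor here, since count >= min */
    return (int)((count - min) * step);
}
```

For example, with min = 1 and max = 8 the step is 1.0, so a count of 1 gets block 0 and a count of 8 gets block 7.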

> > +            g_string_append_unichar(s, qdist_blocks[index]);
> > +        } else {
> > +            g_string_append_c(s, ' ');
> > +        }
> > +    }
> > + out:
> > +    return g_string_free(s, FALSE);
> > +}
> > +
> > +/*
> > + * Bin the distribution in @from into @n bins of consecutive, non-overlapping
> > + * intervals, copying the result to @to.
> > + *
> > + * This function is internal to qdist: only this file and test code should
> > + * ever call it.
> > + *
> > + * Note: calling this function on an already-binned qdist is a bug.
> > + *
> > + * If @n == 0 or @from->n == 1, use @from->n.
> > + */
> > +void qdist_bin__internal(struct qdist *to, const struct qdist *from, size_t n)
> > +{
> > +    double xmin, xmax;
> > +    double step;
> > +    size_t i, j, j_min;
> > +
> > +    qdist_init(to);
> > +
> > +    if (!from->entries) {
> > +        return;
> > +    }
> > +    if (!n || from->n == 1) {
> > +        n = from->n;
> > +    }
> > +
> > +    /* set equally-sized bins between @from's left and right */
> > +    xmin = qdist_xmin(from);
> > +    xmax = qdist_xmax(from);
> > +    step = (xmax - xmin) / n;
> > +
> > +    if (n == from->n) {
> > +        /* if @from's entries are equally spaced, no need to re-bin */
> > +        for (i = 0; i < from->n; i++) {
> > +            if (from->entries[i].x != xmin + i * step) {
> > +                goto rebin;
> 
> static inline function instead of goto?

It would take quite a few arguments; I think the goto is fine.
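For comparison, the reviewer's suggestion could look roughly like the hypothetical helper below. It takes the entries, their number, and the xmin and step already computed by the caller, and keeps the exact floating-point comparison of the quoted loop; the caller would test its return value instead of jumping to the rebin label. entries_equally_spaced() and demo_entry are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_entry {
    double x;
    long count;
};

/*
 * Hypothetical helper: true if entries[i].x == xmin + i * step for
 * every i, i.e. the entries already sit on the bin grid.
 * Uses exact comparison, as the quoted loop does.
 */
static bool entries_equally_spaced(const struct demo_entry *entries, size_t n,
                                   double xmin, double step)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (entries[i].x != xmin + i * step) {
            return false;
        }
    }
    return true;
}
```

Whether this reads better than the goto is a matter of taste; the helper needs four arguments here, and more if it also performed the copy.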

> > +            }
> > +        }
> > +        /* they're equally spaced, so copy the dist and bail out */
> > +        to->entries = g_malloc(sizeof(*to->entries) * from->n);
> 
> g_new()?

Changed.

> > +        to->n = from->n;
> > +        memcpy(to->entries, from->entries, sizeof(*to->entries) * to->n);
> > +        return;
> > +    }
> > +
> > + rebin:
> > +    j_min = 0;
> > +    for (i = 0; i < n; i++) {
> > +        double x;
> > +        double left, right;
> > +
> > +        left = xmin + i * step;
> > +        right = xmin + (i + 1) * step;
> > +
> > +        /* Add x, even if it might not get any counts later */
> > +        x = left;
> 
> This way we round down to the left margin of each bin like this:
> 
>     xmin [*---*---*---*---*] xmax   -- from
>           |  /|  /|  /|  /
>           | / | / | / | /
>           |/  |/  |/  |/
>           |   |   |   |
>           V   V   V   V
>          [*   *   *   *]            -- to
(snip)
>     xmin [*----*----*----*] xmax    -- from
>         \   /\   /\   /\   /
>          \ /  \ /  \ /  \ /
>           |    |    |    |
>           V    V    V    V
>          [*    *    *    *]         -- to
> 
> I'm not sure which option is more correct from a mathematical
> point of view, but with the last variant of the algorithm, binning
> multiple times would still give the same result.

There's no "right" or "wrong" way as long as we're consistent
and we print the right counts in the right bins. I think the
convention I chose is simple enough and leads to simple printing
of the labels. But yes, other alternatives would be OK here.
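Under the convention in the patch, bin i covers [xmin + i * step, xmin + (i + 1) * step), the rightmost bin also includes xmax, and each bin is labelled by its left edge. A hypothetical pair of helpers, bin_of() and bin_label(), that express that convention for a single value:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: index of the bin that captures x when
 * [xmin, xmax] is split into n equal bins. Bins are [left, right),
 * except the rightmost one, which is [left, right] so xmax is counted.
 */
static size_t bin_of(double x, double xmin, double xmax, size_t n)
{
    double step;
    size_t i;

    assert(n > 0 && xmax > xmin && x >= xmin && x <= xmax);
    step = (xmax - xmin) / n;
    i = (size_t)((x - xmin) / step);
    if (i >= n) {
        i = n - 1;  /* x == xmax belongs to the last bin */
    }
    return i;
}

/* Label of bin i: its left edge, matching "x = left" in the quoted code. */
static double bin_label(size_t i, double xmin, double xmax, size_t n)
{
    return xmin + i * ((xmax - xmin) / n);
}
```

The division is a shortcut for illustration; a value lying exactly on a bin edge may be assigned differently after floating-point rounding than it would be by explicit left/right comparisons.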

> > +        qdist_add(to, x, 0);
> > +
> > +        /*
> > +         * To avoid double-counting we capture [left, right) ranges, except for
> > +         * the rightmost bin, which captures a [left, right] range.
> > +         */
> > +        for (j = j_min; j < from->n; j++) {
> 
> Looks like we don't need to keep both 'j' and 'j_min'. We could just use
> 'j', initialize it before the outer loop, and do the inner loop with
> "while".

I prefer for over while when the for loop looks idiomatic.
As for j_min, I'd let the compiler deal with it.
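For comparison, a sketch of the rebinning pass with a single cursor, along the lines of the review suggestion. Because the source entries are sorted by x, the cursor j only moves forward across bins, so the whole pass is O(n + src_n). It works on plain arrays with hypothetical names (demo_rebin(), demo_entry) rather than struct qdist and qdist_add(), and it lets the last bin take every remaining entry so that xmax is counted even if xmin + n * step rounds slightly below it.

```c
#include <assert.h>
#include <stddef.h>

struct demo_entry {
    double x;
    long count;
};

/*
 * Hypothetical rebin: fold the sorted entries src[0..src_n) into n
 * equally sized bins written to dst[0..n). dst[i].x is the left edge
 * of bin i. Bins capture [left, right), except the last one, which
 * captures everything left over, i.e. [left, xmax].
 */
static void demo_rebin(struct demo_entry *dst, size_t n,
                       const struct demo_entry *src, size_t src_n)
{
    double xmin, xmax, step;
    size_t i, j = 0;

    assert(n > 0 && src_n > 0);
    xmin = src[0].x;
    xmax = src[src_n - 1].x;
    step = (xmax - xmin) / n;

    for (i = 0; i < n; i++) {
        double left = xmin + i * step;
        double right = xmin + (i + 1) * step;

        dst[i].x = left;   /* added even if the bin stays empty */
        dst[i].count = 0;
        while (j < src_n && (i == n - 1 || src[j].x < right)) {
            dst[i].count += src[j].count;
            j++;
        }
    }
}
```

The inner while could equally be written as a for loop that carries j across iterations of the outer loop; either way, no separate j_min is needed.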

Thanks,

		Emilio

Thread overview: 63+ messages
2016-05-25  1:13 [Qemu-devel] [PATCH v6 00/15] tb hash improvements Emilio G. Cota
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 01/15] compiler.h: add QEMU_ALIGNED() to enforce struct alignment Emilio G. Cota
2016-05-27 19:54   ` Sergey Fedorov
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 02/15] seqlock: remove optional mutex Emilio G. Cota
2016-05-27 19:55   ` Sergey Fedorov
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 03/15] seqlock: rename write_lock/unlock to write_begin/end Emilio G. Cota
2016-05-27 19:59   ` Sergey Fedorov
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 04/15] include/processor.h: define cpu_relax() Emilio G. Cota
2016-05-27 20:53   ` Sergey Fedorov
2016-05-27 21:10     ` Emilio G. Cota
2016-05-28 12:35       ` Sergey Fedorov
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 05/15] qemu-thread: add simple test-and-set spinlock Emilio G. Cota
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 06/15] exec: add tb_hash_func5, derived from xxhash Emilio G. Cota
2016-05-28 12:36   ` Sergey Fedorov
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 07/15] tb hash: hash phys_pc, pc, and flags with xxhash Emilio G. Cota
2016-05-28 12:39   ` Sergey Fedorov
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 08/15] qdist: add module to represent frequency distributions of data Emilio G. Cota
2016-05-28 18:15   ` Sergey Fedorov
2016-06-03 17:22     ` Emilio G. Cota
2016-06-03 17:29       ` Sergey Fedorov
2016-06-03 17:46         ` Sergey Fedorov
2016-06-06 23:40           ` Emilio G. Cota
2016-06-07 14:06             ` Sergey Fedorov
2016-06-07 22:53               ` Emilio G. Cota
2016-06-08 13:09                 ` Sergey Fedorov
2016-06-07  1:05     ` Emilio G. Cota [this message]
2016-06-07 15:56       ` Sergey Fedorov
2016-06-08  0:02         ` Emilio G. Cota
2016-06-08 14:10           ` Sergey Fedorov
2016-06-08 18:06             ` Emilio G. Cota
2016-06-08 18:18               ` Sergey Fedorov
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 09/15] qdist: add test program Emilio G. Cota
2016-05-28 18:56   ` Sergey Fedorov
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 10/15] qht: QEMU's fast, resizable and scalable Hash Table Emilio G. Cota
2016-05-29 19:52   ` Sergey Fedorov
2016-05-29 19:55     ` Sergey Fedorov
2016-05-31  7:46     ` Alex Bennée
2016-06-01 20:53       ` Sergey Fedorov
2016-06-03  9:18     ` Emilio G. Cota
2016-06-03 15:19       ` Sergey Fedorov
2016-06-03 11:01     ` Emilio G. Cota
2016-06-03 15:34       ` Sergey Fedorov
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 11/15] qht: add test program Emilio G. Cota
2016-05-29 20:15   ` Sergey Fedorov
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 12/15] qht: add qht-bench, a performance benchmark Emilio G. Cota
2016-05-29 20:45   ` Sergey Fedorov
2016-06-03 11:41     ` Emilio G. Cota
2016-06-03 15:41       ` Sergey Fedorov
2016-05-31 15:12   ` Alex Bennée
2016-05-31 16:44     ` Emilio G. Cota
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 13/15] qht: add test-qht-par to invoke qht-bench from 'check' target Emilio G. Cota
2016-05-29 20:53   ` Sergey Fedorov
2016-06-03 11:07     ` Emilio G. Cota
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 14/15] tb hash: track translated blocks with qht Emilio G. Cota
2016-05-29 21:09   ` Sergey Fedorov
2016-05-31  8:39   ` Alex Bennée
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 15/15] translate-all: add tb hash bucket info to 'info jit' dump Emilio G. Cota
2016-05-29 21:14   ` Sergey Fedorov
2016-06-08  6:25 ` [Qemu-devel] [PATCH v6 00/15] tb hash improvements Alex Bennée
2016-06-08 15:16   ` Emilio G. Cota
2016-06-08 15:35   ` Richard Henderson
2016-06-08 15:37     ` Sergey Fedorov
2016-06-08 16:45       ` Alex Bennée
