From: Sergey Fedorov <serge.fdrv@gmail.com>
To: "Emilio G. Cota" <cota@braap.org>
Cc: "QEMU Developers" <qemu-devel@nongnu.org>,
"MTTCG Devel" <mttcg@listserver.greensocs.com>,
"Alex Bennée" <alex.bennee@linaro.org>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Richard Henderson" <rth@twiddle.net>
Subject: Re: [Qemu-devel] [PATCH v6 08/15] qdist: add module to represent frequency distributions of data
Date: Tue, 7 Jun 2016 18:56:48 +0300
Message-ID: <5756EEC0.8090502@gmail.com>
In-Reply-To: <20160607010545.GB4418@flamenco>
On 07/06/16 04:05, Emilio G. Cota wrote:
> On Sat, May 28, 2016 at 21:15:06 +0300, Sergey Fedorov wrote:
>> On 25/05/16 04:13, Emilio G. Cota wrote:
>>> diff --git a/util/qdist.c b/util/qdist.c
>>> new file mode 100644
>>> index 0000000..3343640
>>> --- /dev/null
>>> +++ b/util/qdist.c
>>> @@ -0,0 +1,386 @@
>> (snip)
>>> +
>>> +void qdist_add(struct qdist *dist, double x, long count)
>>> +{
>>> +    struct qdist_entry *entry = NULL;
>>> +
>>> +    if (dist->entries) {
>>> +        struct qdist_entry e;
>>> +
>>> +        e.x = x;
>>> +        entry = bsearch(&e, dist->entries, dist->n, sizeof(e), qdist_cmp);
>>> +    }
>>> +
>>> +    if (entry) {
>>> +        entry->count += count;
>>> +        return;
>>> +    }
>>> +
>>> +    dist->entries = g_realloc(dist->entries,
>>> +                              sizeof(*dist->entries) * (dist->n + 1));
>> Repeated doubling?
> Can you please elaborate?
I mean a dynamic array with a growth factor of 2
[https://en.wikipedia.org/wiki/Dynamic_array].
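To illustrate, here is a minimal sketch of the growth-factor-2 idea. It is not
the qdist code: 'struct entry', 'struct entry_vec' and entry_vec_push() are
hypothetical names, and plain realloc() stands in for g_realloc(). The point is
only that the array is reallocated when it is full, not on every insertion.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct qdist_entry; not the QEMU definition. */
struct entry {
    double x;
    long count;
};

/* Hypothetical growable array: n used slots out of cap allocated ones. */
struct entry_vec {
    struct entry *entries;
    size_t n;
    size_t cap;
};

/*
 * Append one entry, doubling the capacity only when the array is full.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int entry_vec_push(struct entry_vec *v, double x, long count)
{
    if (v->n == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 1;
        struct entry *p;

        assert(new_cap > v->cap); /* guard against size_t overflow */
        p = realloc(v->entries, new_cap * sizeof(*p));
        if (!p) {
            return -1;
        }
        v->entries = p;
        v->cap = new_cap;
    }
    v->entries[v->n].x = x;
    v->entries[v->n].count = count;
    v->n++;
    return 0;
}
```

Starting from an empty vector, five pushes trigger four reallocations
(capacity 1, 2, 4, 8) rather than five, and the gap widens as the array grows.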
>
>>> +    dist->n++;
>>> +    entry = &dist->entries[dist->n - 1];
>> What if we combine the above two lines:
>>
>> entry = &dist->entries[dist->n++];
>>
>> or just reverse them:
>>
>> entry = &dist->entries[dist->n];
>> dist->n++;
> I have less trouble understanding the original.
Okay.
>
>>> +    entry->x = x;
>>> +    entry->count = count;
>>> +    qsort(dist->entries, dist->n, sizeof(*entry), qdist_cmp);
>>> +}
>>> +
>> (snip)
>>> +static char *qdist_pr_internal(const struct qdist *dist)
>>> +{
>>> +    double min, max, step;
>>> +    GString *s = g_string_new("");
>>> +    size_t i;
>>> +
>>> +    /* if only one entry, its printout will be either full or empty */
>>> +    if (dist->n == 1) {
>>> +        if (dist->entries[0].count) {
>>> +            g_string_append_unichar(s, qdist_blocks[QDIST_NR_BLOCK_CODES - 1]);
>>> +        } else {
>>> +            g_string_append_c(s, ' ');
>>> +        }
>>> +        goto out;
>>> +    }
>>> +
>>> +    /* get min and max counts */
>>> +    min = dist->entries[0].count;
>>> +    max = min;
>>> +    for (i = 0; i < dist->n; i++) {
>>> +        struct qdist_entry *e = &dist->entries[i];
>>> +
>>> +        if (e->count < min) {
>>> +            min = e->count;
>>> +        }
>>> +        if (e->count > max) {
>>> +            max = e->count;
>>> +        }
>>> +    }
>>> +
>>> +    /* floor((count - min) * step) will give us the block index */
>>> +    step = (QDIST_NR_BLOCK_CODES - 1) / (max - min);
>>> +
>>> +    for (i = 0; i < dist->n; i++) {
>>> +        struct qdist_entry *e = &dist->entries[i];
>>> +        int index;
>>> +
>>> +        /* make an exception with 0; instead of using block[0], print a space */
>>> +        if (e->count) {
>>> +            index = (int)((e->count - min) * step);
>> So "e->count == min" gives us one eighth block instead of just space?
> Yes, only 0 can print a space.
So our scale is not linear. I think some users might get confused by this.
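A small sketch of the mapping under discussion, for illustration only.
block_index() is a hypothetical helper, NR_BLOCK_CODES = 8 is an assumption
(eighth blocks), and -1 is used here to stand for "print a space". It mirrors
the quoted arithmetic: only a count of 0 yields a space, while the smallest
non-zero count maps to block 0.

```c
#include <assert.h>

/* Assumed number of block glyphs (one eighth up to a full block). */
#define NR_BLOCK_CODES 8

/*
 * Hypothetical helper mirroring the quoted index computation.
 * Returns -1 when a space would be printed (count == 0), otherwise the
 * block index in [0, NR_BLOCK_CODES - 1]. Requires max > min.
 */
static int block_index(long count, long min, long max)
{
    double step;

    assert(max > min);
    if (count == 0) {
        return -1;
    }
    step = (NR_BLOCK_CODES - 1) / (double)(max - min);
    return (int)((count - min) * step);
}
```

With min = 2 and max = 16, a count of 2 gives index 0 (the one eighth block)
and a count of 16 gives index 7 (the full block). With min = 0, a count of 0
gives a space, but a count of 1 already gives the one eighth block.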
>
>>> +            g_string_append_unichar(s, qdist_blocks[index]);
>>> +        } else {
>>> +            g_string_append_c(s, ' ');
>>> +        }
>>> +    }
>>> + out:
>>> +    return g_string_free(s, FALSE);
>>> +}
>>> +
>>> +/*
>>> + * Bin the distribution in @from into @n bins of consecutive, non-overlapping
>>> + * intervals, copying the result to @to.
>>> + *
>>> + * This function is internal to qdist: only this file and test code should
>>> + * ever call it.
>>> + *
>>> + * Note: calling this function on an already-binned qdist is a bug.
>>> + *
>>> + * If @n == 0 or @from->n == 1, use @from->n.
>>> + */
>>> +void qdist_bin__internal(struct qdist *to, const struct qdist *from, size_t n)
>>> +{
>>> +    double xmin, xmax;
>>> +    double step;
>>> +    size_t i, j, j_min;
>>> +
>>> +    qdist_init(to);
>>> +
>>> +    if (!from->entries) {
>>> +        return;
>>> +    }
>>> +    if (!n || from->n == 1) {
>>> +        n = from->n;
>>> +    }
>>> +
>>> +    /* set equally-sized bins between @from's left and right */
>>> +    xmin = qdist_xmin(from);
>>> +    xmax = qdist_xmax(from);
>>> +    step = (xmax - xmin) / n;
>>> +
>>> +    if (n == from->n) {
>>> +        /* if @from's entries are equally spaced, no need to re-bin */
>>> +        for (i = 0; i < from->n; i++) {
>>> +            if (from->entries[i].x != xmin + i * step) {
>>> +                goto rebin;
>> static inline function instead of goto?
> It would have quite a few arguments, I think the goto is fine.
Actually, it would be 'xmin', 'xmax', and 'step' in addition to 'to',
'from', and 'n'. But yes, probably goto is fine here.
>
>>> +            }
>>> +        }
>>> +        /* they're equally spaced, so copy the dist and bail out */
>>> +        to->entries = g_malloc(sizeof(*to->entries) * from->n);
>> g_new()?
> Changed.
>
>>> +        to->n = from->n;
>>> +        memcpy(to->entries, from->entries, sizeof(*to->entries) * to->n);
>>> +        return;
>>> +    }
>>> +
>>> + rebin:
By the way, here's a space before the 'rebin' label.
>>> +    j_min = 0;
>>> +    for (i = 0; i < n; i++) {
>>> +        double x;
>>> +        double left, right;
>>> +
>>> +        left = xmin + i * step;
>>> +        right = xmin + (i + 1) * step;
>>> +
>>> +        /* Add x, even if it might not get any counts later */
>>> +        x = left;
>> This way we round down to the left margin of each bin like this:
>>
>>     xmin [*---*---*---*---*] xmax  -- from
>>           |  /|  /|  /|  /
>>           | / | / | / | /
>>           |/  |/  |/  |/
>>           |   |   |   |
>>           V   V   V   V
>>          [*   *   *   *]           -- to
> (snip)
>>     xmin [*----*----*----*] xmax  -- from
>>         \   /\   /\   /\   /
>>          \ /  \ /  \ /  \ /
>>           |    |    |    |
>>           V    V    V    V
>>          [*    *    *    *]       -- to
>>
>> I'm not sure which is the more correct option from the mathematical
>> point of view; but multiple-binning with the last variant of the
>> algorithm we would still give the same result.
> There's no "right" or "wrong" way as long as we're consistent
> and we print the right counts in the right bins. I think the
> convention I chose is simple enough, and leads to simple printing
> of the labels. But yes other alternatives would be OK here.
Well, if we go ahead with my last suggestion the code would look like this:
 rebin:
    /* We do the binning using the following scheme:
     *
     *  xmin [*----*----*----*] xmax  -- from
     *      \   /\   /\   /\   /
     *       \ /  \ /  \ /  \ /
     *        |    |    |    |
     *        V    V    V    V
     *       [*    *    *    *]       -- to
     *
     */
    /* note: n must be > 1 here, otherwise this divides by zero */
    step = (xmax - xmin) / (n - 1);
    j = 0;
    for (i = 0; i < n; i++) {
        double x;
        double right;

        x = xmin + i * step;
        right = x + 0.5 * step;

        /* Add x, even if it might not get any counts later */
        qdist_add(to, x, 0);

        /* To avoid double-counting we capture [left, right) ranges */
        /* check the bound first so entries[j] is never read out of range */
        while (j < from->n && from->entries[j].x < right) {
            qdist_add(to, x, from->entries[j].count);
            j++;
        }
    }
    assert(j == from->n);
}
Actually, it's simpler than the current version.
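For anyone who wants to try the centered scheme outside of QEMU, here is a
self-contained sketch under some assumptions: 'struct entry' and bin_centered()
are hypothetical names, plain arrays replace 'struct qdist' and qdist_add(),
the input is assumed sorted by x with distinct x values, and n > 1 is required.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct qdist_entry. */
struct entry {
    double x;
    long count;
};

/*
 * Bin the sorted entries in @from into @n bins centered on @n equally
 * spaced points between the first and last x of @from.
 * @to must have room for @n entries.
 * Assumptions: n > 1, from_n > 0, and the last x is greater than the first.
 */
static void bin_centered(struct entry *to, size_t n,
                         const struct entry *from, size_t from_n)
{
    double xmin, xmax, step;
    size_t i, j = 0;

    assert(n > 1 && from_n > 0);
    xmin = from[0].x;
    xmax = from[from_n - 1].x;
    assert(xmax > xmin);
    step = (xmax - xmin) / (n - 1);

    for (i = 0; i < n; i++) {
        double x = xmin + i * step;
        double right = x + 0.5 * step;

        /* Emit the bin even if it ends up with no counts */
        to[i].x = x;
        to[i].count = 0;

        /* Each source entry is consumed exactly once: [left, right) */
        while (j < from_n && from[j].x < right) {
            to[i].count += from[j].count;
            j++;
        }
    }
    assert(j == from_n);
}
```

Binning five entries at x = 0..4 with counts 1..5 into three bins gives bins
at x = 0, 2, 4 with counts 1, 5, 9. Binning that result into three bins again
leaves it unchanged, which is the multiple-binning property mentioned above.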
Kind regards,
Sergey