From: "Emilio G. Cota" <cota@braap.org>
To: Sergey Fedorov <serge.fdrv@gmail.com>
Cc: "QEMU Developers" <qemu-devel@nongnu.org>,
"MTTCG Devel" <mttcg@listserver.greensocs.com>,
"Alex Bennée" <alex.bennee@linaro.org>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Richard Henderson" <rth@twiddle.net>
Subject: Re: [Qemu-devel] [PATCH v6 08/15] qdist: add module to represent frequency distributions of data
Date: Wed, 8 Jun 2016 14:06:45 -0400 [thread overview]
Message-ID: <20160608180645.GA14106@flamenco> (raw)
In-Reply-To: <5758273B.7090603@gmail.com>
On Wed, Jun 08, 2016 at 17:10:03 +0300, Sergey Fedorov wrote:
> On 08/06/16 03:02, Emilio G. Cota wrote:
> > - dist->entries = g_realloc(dist->entries,
> > - sizeof(*dist->entries) * (dist->n + 1));
> > + if (unlikely(dist->n == dist->size)) {
> > + dist->size = dist->size ? dist->size * 2 : 1;
>
> We could initialize 'dist->size' to 1 and allocate a 1-entry
> 'dist->entries' array in qdist_init() to avoid this ternary operation ;-)
Done. This resulted in quite a few modifications, since dist->entries == NULL
had been used as an equivalent to dist->n == 0.
> >>>> (snip)
> >> So our scale is not linear. I think some users might get confused by this.
> > That's correct. I think special-casing 0 makes sense though, since
> > it increases the signal-to-noise ratio of the histogram. For example:
> >
> > 1) 0 as ' ':
> > TB hash occupancy 31.84% avg chain occ. Histogram: [0,10)%|▆ █ ▅▁▃▁▁|[90,100]%
> > TB hash avg chain 1.015 buckets. Histogram: 1|█▁▁|3
> >
> > 2) 0 as '1/8':
> > TB hash occupancy 32.07% avg chain occ. Histogram: [0,10)%|▆▁█▁▁▅▁▃▁▁|[90,100]%
> > TB hash avg chain 1.015 buckets. Histogram: 1|▇▁▁|3
> >
> > I think in these examples most users would be less confused by 1) than by 2).
>
> I was meaning to represent all bars whose value < 1/8 as a space, not
> only whose value is pure zero. Otherwise we can see 1/8 bar where the
> actual value is negligibly differ from zero as in the second example.
I see. That would be (3):
TB hash occupancy 32.79% avg chain occ. Histogram: [0,10)%|▅ █ ▅ ▂ |[90,100]%
TB hash avg chain 1.017 buckets. Histogram: 1|█ |3
I still think (1) is the representation that gives the most information.
IMO it's valuable that "close to zero" and "zero" are represented differently,
in the same way that max and "close to max" are represented differently
as well (only max gets 8/8).
BTW, while looking into this I fixed a bug; sometimes we'd print 7/8 instead
of 8/8 for the max value, due to the ordering of FP computations
[see 1-3 in (2) above; it's 7/8 instead of 8/8]. Fixed with:
diff --git a/util/qdist.c b/util/qdist.c
index cfe09e6..7842d34 100644
--- a/util/qdist.c
+++ b/util/qdist.c
@@ -103,7 +103,7 @@ static const gunichar qdist_blocks[] = {
*/
static char *qdist_pr_internal(const struct qdist *dist)
{
- double min, max, step;
+ double min, max;
GString *s = g_string_new("");
size_t i;
@@ -131,16 +131,14 @@ static char *qdist_pr_internal(const struct qdist *dist)
}
}
- /* floor((count - min) * step) will give us the block index */
- step = (QDIST_NR_BLOCK_CODES - 1) / (max - min);
-
for (i = 0; i < dist->n; i++) {
struct qdist_entry *e = &dist->entries[i];
int index;
/* make an exception with 0; instead of using block[0], print a space */
if (e->count) {
- index = (int)((e->count - min) * step);
+ /* divide first to avoid loss of precision when e->count == max */
+ index = (e->count - min) / (max - min) * (QDIST_NR_BLOCK_CODES - 1);
g_string_append_unichar(s, qdist_blocks[index]);
} else {
g_string_append_c(s, ' ');
I also added a test to test-qdist (called "test_bin_precision") that
checks for this.
> > The behaviour isn't the same though. With this we have
> > that the two outer bins (leftmost and rightmost) are unnecessarily
> > large (since they're out of the range of the input data).
> >
> > For example, assume the data is between 0 and 100 and n=5 (i.e. step=25),
> > it makes no sense to report the first bin as [-12.5,12.5). If we
> > then truncate the unnecessary edges, we'd have [0,12.5), but
> > then the second bin is [12.5,37.5). Bins of unequal size are
> > possible (although a bit unusual) in histograms, but given
> > our Unicode-based representation, we're limited to same-width bars.
>
> That is why I noted that I'm not sure what is the most correct from
> mathematical point of view. Maybe consider the second option? I.e.
> rounding to the middle of each bin with:
>
> x = left + step / 2;
>
> which would give the picture like this:
>
>
> xmin [*---*---*---*---*] xmax -- from
> | | | | |
> \ / \ / \ / \ /
> | | | |
> V V V V
> [* * * *] -- to
This binning is equivalent to what we do right now.
The only difference is where the value is set (either at the left
of the bin, or at the center as above); this, however, isn't too
important, since this value is only used when printing
the labels, i.e. we could print [left, left+step) or
[center-step/2, center+step/2) and still get the same results.
> Anyway, you may consider if you like whether it's possible to apply some
> simplifications from my code to the final version.
OK. This is how it looks like:
diff --git a/util/qdist.c b/util/qdist.c
index 7842d34..3ca2227 100644
--- a/util/qdist.c
+++ b/util/qdist.c
@@ -163,7 +163,7 @@ void qdist_bin__internal(struct qdist *to, const struct qdist *from, size_t n)
{
double xmin, xmax;
double step;
- size_t i, j, j_min;
+ size_t i, j;
qdist_init(to);
@@ -194,7 +194,7 @@ void qdist_bin__internal(struct qdist *to, const struct qdist *from, size_t n)
}
rebin:
- j_min = 0;
+ j = 0;
for (i = 0; i < n; i++) {
double x;
double left, right;
@@ -210,19 +210,13 @@ void qdist_bin__internal(struct qdist *to, const struct qdist *from, size_t n)
* To avoid double-counting we capture [left, right) ranges, except for
* the righmost bin, which captures a [left, right] range.
*/
- for (j = j_min; j < from->n; j++) {
+ while (j < from->n &&
+ (from->entries[j].x < right ||
+ (i == n - 1 && from->entries[j].x == right))) {
struct qdist_entry *o = &from->entries[j];
- /* entries are ordered so do not check beyond right */
- if (o->x > right) {
- break;
- }
- if (o->x >= left && (o->x < right ||
- (i == n - 1 && o->x == right))) {
- qdist_add(to, x, o->count);
- /* don't check this entry again */
- j_min = j + 1;
- }
+ qdist_add(to, x, o->count);
+ j++;
}
}
}
Thanks,
Emilio
next prev parent reply other threads:[~2016-06-08 18:07 UTC|newest]
Thread overview: 63+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-25 1:13 [Qemu-devel] [PATCH v6 00/15] tb hash improvements Emilio G. Cota
2016-05-25 1:13 ` [Qemu-devel] [PATCH v6 01/15] compiler.h: add QEMU_ALIGNED() to enforce struct alignment Emilio G. Cota
2016-05-27 19:54 ` Sergey Fedorov
2016-05-25 1:13 ` [Qemu-devel] [PATCH v6 02/15] seqlock: remove optional mutex Emilio G. Cota
2016-05-27 19:55 ` Sergey Fedorov
2016-05-25 1:13 ` [Qemu-devel] [PATCH v6 03/15] seqlock: rename write_lock/unlock to write_begin/end Emilio G. Cota
2016-05-27 19:59 ` Sergey Fedorov
2016-05-25 1:13 ` [Qemu-devel] [PATCH v6 04/15] include/processor.h: define cpu_relax() Emilio G. Cota
2016-05-27 20:53 ` Sergey Fedorov
2016-05-27 21:10 ` Emilio G. Cota
2016-05-28 12:35 ` Sergey Fedorov
2016-05-25 1:13 ` [Qemu-devel] [PATCH v6 05/15] qemu-thread: add simple test-and-set spinlock Emilio G. Cota
2016-05-25 1:13 ` [Qemu-devel] [PATCH v6 06/15] exec: add tb_hash_func5, derived from xxhash Emilio G. Cota
2016-05-28 12:36 ` Sergey Fedorov
2016-05-25 1:13 ` [Qemu-devel] [PATCH v6 07/15] tb hash: hash phys_pc, pc, and flags with xxhash Emilio G. Cota
2016-05-28 12:39 ` Sergey Fedorov
2016-05-25 1:13 ` [Qemu-devel] [PATCH v6 08/15] qdist: add module to represent frequency distributions of data Emilio G. Cota
2016-05-28 18:15 ` Sergey Fedorov
2016-06-03 17:22 ` Emilio G. Cota
2016-06-03 17:29 ` Sergey Fedorov
2016-06-03 17:46 ` Sergey Fedorov
2016-06-06 23:40 ` Emilio G. Cota
2016-06-07 14:06 ` Sergey Fedorov
2016-06-07 22:53 ` Emilio G. Cota
2016-06-08 13:09 ` Sergey Fedorov
2016-06-07 1:05 ` Emilio G. Cota
2016-06-07 15:56 ` Sergey Fedorov
2016-06-08 0:02 ` Emilio G. Cota
2016-06-08 14:10 ` Sergey Fedorov
2016-06-08 18:06 ` Emilio G. Cota [this message]
2016-06-08 18:18 ` Sergey Fedorov
2016-05-25 1:13 ` [Qemu-devel] [PATCH v6 09/15] qdist: add test program Emilio G. Cota
2016-05-28 18:56 ` Sergey Fedorov
2016-05-25 1:13 ` [Qemu-devel] [PATCH v6 10/15] qht: QEMU's fast, resizable and scalable Hash Table Emilio G. Cota
2016-05-29 19:52 ` Sergey Fedorov
2016-05-29 19:55 ` Sergey Fedorov
2016-05-31 7:46 ` Alex Bennée
2016-06-01 20:53 ` Sergey Fedorov
2016-06-03 9:18 ` Emilio G. Cota
2016-06-03 15:19 ` Sergey Fedorov
2016-06-03 11:01 ` Emilio G. Cota
2016-06-03 15:34 ` Sergey Fedorov
2016-05-25 1:13 ` [Qemu-devel] [PATCH v6 11/15] qht: add test program Emilio G. Cota
2016-05-29 20:15 ` Sergey Fedorov
2016-05-25 1:13 ` [Qemu-devel] [PATCH v6 12/15] qht: add qht-bench, a performance benchmark Emilio G. Cota
2016-05-29 20:45 ` Sergey Fedorov
2016-06-03 11:41 ` Emilio G. Cota
2016-06-03 15:41 ` Sergey Fedorov
2016-05-31 15:12 ` Alex Bennée
2016-05-31 16:44 ` Emilio G. Cota
2016-05-25 1:13 ` [Qemu-devel] [PATCH v6 13/15] qht: add test-qht-par to invoke qht-bench from 'check' target Emilio G. Cota
2016-05-29 20:53 ` Sergey Fedorov
2016-06-03 11:07 ` Emilio G. Cota
2016-05-25 1:13 ` [Qemu-devel] [PATCH v6 14/15] tb hash: track translated blocks with qht Emilio G. Cota
2016-05-29 21:09 ` Sergey Fedorov
2016-05-31 8:39 ` Alex Bennée
2016-05-25 1:13 ` [Qemu-devel] [PATCH v6 15/15] translate-all: add tb hash bucket info to 'info jit' dump Emilio G. Cota
2016-05-29 21:14 ` Sergey Fedorov
2016-06-08 6:25 ` [Qemu-devel] [PATCH v6 00/15] tb hash improvements Alex Bennée
2016-06-08 15:16 ` Emilio G. Cota
2016-06-08 15:35 ` Richard Henderson
2016-06-08 15:37 ` Sergey Fedorov
2016-06-08 16:45 ` Alex Bennée
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160608180645.GA14106@flamenco \
--to=cota@braap.org \
--cc=alex.bennee@linaro.org \
--cc=mttcg@listserver.greensocs.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=rth@twiddle.net \
--cc=serge.fdrv@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).