Re: [Qemu-devel] [PATCH v6 08/15] qdist: add module to represent frequency distributions of data

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Sergey Fedorov <serge.fdrv@gmail.com>
To: "Emilio G. Cota" <cota@braap.org>
Cc: "QEMU Developers" <qemu-devel@nongnu.org>,
	"MTTCG Devel" <mttcg@listserver.greensocs.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Richard Henderson" <rth@twiddle.net>
Subject: Re: [Qemu-devel] [PATCH v6 08/15] qdist: add module to represent frequency distributions of data
Date: Wed, 8 Jun 2016 21:18:22 +0300	[thread overview]
Message-ID: <5758616E.8070202@gmail.com> (raw)
In-Reply-To: <20160608180645.GA14106@flamenco>

On 08/06/16 21:06, Emilio G. Cota wrote:
> On Wed, Jun 08, 2016 at 17:10:03 +0300, Sergey Fedorov wrote:
>> On 08/06/16 03:02, Emilio G. Cota wrote:
>>> -    dist->entries = g_realloc(dist->entries,
>>> -                              sizeof(*dist->entries) * (dist->n + 1));
>>> +    if (unlikely(dist->n == dist->size)) {
>>> +        dist->size = dist->size ? dist->size * 2 : 1;
>> We could initialize 'dist->size' to 1 and allocate a 1-entry
>> 'dist->entries' array in qdist_init() to avoid this ternary operation ;-)
> Done. This resulted in quite a few modifications, since dist->entries == NULL
> had been used as an equivalent to dist->n == 0.
>
>>>>>> (snip)
>>>> So our scale is not linear. I think some users might get confused by this.
>>> That's correct. I think special-casing 0 makes sense though, since
>>> it increases the signal-to-noise ratio of the histogram. For example:
>>>
>>> 1) 0 as ' ':
>>> TB hash occupancy   31.84% avg chain occ. Histogram: [0,10)%|▆ █  ▅▁▃▁▁|[90,100]%
>>> TB hash avg chain   1.015 buckets. Histogram: 1|█▁▁|3
>>>
>>> 2) 0 as '1/8':
>>> TB hash occupancy   32.07% avg chain occ. Histogram: [0,10)%|▆▁█▁▁▅▁▃▁▁|[90,100]%
>>> TB hash avg chain   1.015 buckets. Histogram: 1|▇▁▁|3
>>>
>>> I think in these examples most users would be less confused by 1) than by 2).
>> I was meaning to represent all bars whose value < 1/8 as a space, not
>> only whose value is pure zero. Otherwise we can see 1/8 bar where the
>> actual value is negligibly differ from zero as in the second example.
> I see. That would be (3):
>
> TB hash occupancy   32.79% avg chain occ. Histogram: [0,10)%|▅ █  ▅ ▂  |[90,100]%
> TB hash avg chain   1.017 buckets. Histogram: 1|█  |3
>
> I still think (1) is the representation that gives the most information.
> IMO it's valuable that "close to zero" and "zero" are represented differently,
> in the same way that max and "close to max" are represented differently
> as well (only max gets 8/8).

Anyhow, these histograms are only for rough estimation and to see some
nice unicode output :)

>
> BTW, while looking into this I fixed a bug; sometimes we'd print 7/8 instead
> of 8/8 for the max value, due to the ordering of FP computations
> [see 1-3 in (2) above; it's 7/8 instead of 8/8]. Fixed with:
>
> diff --git a/util/qdist.c b/util/qdist.c
> index cfe09e6..7842d34 100644
> --- a/util/qdist.c
> +++ b/util/qdist.c
> @@ -103,7 +103,7 @@ static const gunichar qdist_blocks[] = {
>   */
>  static char *qdist_pr_internal(const struct qdist *dist)
>  {
> -    double min, max, step;
> +    double min, max;
>      GString *s = g_string_new("");
>      size_t i;
>  
> @@ -131,16 +131,14 @@ static char *qdist_pr_internal(const struct qdist *dist)
>          }
>      }
>  
> -    /* floor((count - min) * step) will give us the block index */
> -    step = (QDIST_NR_BLOCK_CODES - 1) / (max - min);
> -
>      for (i = 0; i < dist->n; i++) {
>          struct qdist_entry *e = &dist->entries[i];
>          int index;
>  
>          /* make an exception with 0; instead of using block[0], print a space */
>          if (e->count) {
> -            index = (int)((e->count - min) * step);
> +            /* divide first to avoid loss of precision when e->count == max */
> +            index = (e->count - min) / (max - min) * (QDIST_NR_BLOCK_CODES - 1);
>              g_string_append_unichar(s, qdist_blocks[index]);
>          } else {
>              g_string_append_c(s, ' ');
>
> I also added a test to test-qdist (called "test_bin_precision") that
> checks for this.
>
>>> The behaviour isn't the same though. With this we have
>>> that the two outer bins (leftmost and rightmost) are unnecessarily
>>> large (since they're out of the range of the input data).
>>>
>>> For example, assume the data is between 0 and 100 and n=5 (i.e. step=25),
>>> it makes no sense to report the first bin as [-12.5,12.5). If we
>>> then truncate the unnecessary edges, we'd have [0,12.5), but
>>> then the second bin is [12.5,37.5). Bins of unequal size are
>>> possible (although a bit unusual) in histograms, but given
>>> our Unicode-based representation, we're limited to same-width bars.
>> That is why I noted that I'm not sure what is the most correct from
>> mathematical point of view. Maybe consider the second option? I.e.
>> rounding to the middle of each bin with:
>>
>>     x = left + step / 2;
>>
>> which would give the picture like this:
>>
>>
>>     xmin [*---*---*---*---*] xmax   -- from
>>           |   |   |   |   |
>>            \ / \ / \ / \ /
>>             |   |   |   |
>>             V   V   V   V
>>            [*   *   *   *]          -- to
> This binning is equivalent to what we do right now.
>
> The only difference is where the value is set (either at the left
> of the bin, or at the center as above); this, however, isn't too
> important, since this value is only used when printing
> the labels, i.e. we could print [left, left+step) or
> [center-step/2, center+step/2) and still get the same results.
>
>> Anyway, you may consider if you like whether it's possible to apply some
>> simplifications from my code to the final version.
> OK. This is how it looks like:
>
> diff --git a/util/qdist.c b/util/qdist.c
> index 7842d34..3ca2227 100644
> --- a/util/qdist.c
> +++ b/util/qdist.c
> @@ -163,7 +163,7 @@ void qdist_bin__internal(struct qdist *to, const struct qdist *from, size_t n)
>  {
>      double xmin, xmax;
>      double step;
> -    size_t i, j, j_min;
> +    size_t i, j;
>  
>      qdist_init(to);
>  
> @@ -194,7 +194,7 @@ void qdist_bin__internal(struct qdist *to, const struct qdist *from, size_t n)
>      }
>  
>   rebin:
> -    j_min = 0;
> +    j = 0;
>      for (i = 0; i < n; i++) {
>          double x;
>          double left, right;
> @@ -210,19 +210,13 @@ void qdist_bin__internal(struct qdist *to, const struct qdist *from, size_t n)
>           * To avoid double-counting we capture [left, right) ranges, except for
>           * the righmost bin, which captures a [left, right] range.
>           */
> -        for (j = j_min; j < from->n; j++) {
> +        while (j < from->n &&
> +               (from->entries[j].x < right ||
> +                (i == n - 1 && from->entries[j].x == right))) {

If "i == n - 1" then we have to put everything what's left in 'from',
right? If so, then:

while (j < from->n && (from->entries[j].x < right || i == n - 1)) {

Kind regards,
Sergey

>              struct qdist_entry *o = &from->entries[j];
>  
> -            /* entries are ordered so do not check beyond right */
> -            if (o->x > right) {
> -                break;
> -            }
> -            if (o->x >= left && (o->x < right ||
> -                                   (i == n - 1 && o->x == right))) {
> -                qdist_add(to, x, o->count);
> -                /* don't check this entry again */
> -                j_min = j + 1;
> -            }
> +            qdist_add(to, x, o->count);
> +            j++;
>          }
>      }
>  }
>
> Thanks,
>
> 		Emilio

next prev parent reply	other threads:[~2016-06-08 18:18 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-25  1:13 [Qemu-devel] [PATCH v6 00/15] tb hash improvements Emilio G. Cota
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 01/15] compiler.h: add QEMU_ALIGNED() to enforce struct alignment Emilio G. Cota
2016-05-27 19:54   ` Sergey Fedorov
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 02/15] seqlock: remove optional mutex Emilio G. Cota
2016-05-27 19:55   ` Sergey Fedorov
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 03/15] seqlock: rename write_lock/unlock to write_begin/end Emilio G. Cota
2016-05-27 19:59   ` Sergey Fedorov
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 04/15] include/processor.h: define cpu_relax() Emilio G. Cota
2016-05-27 20:53   ` Sergey Fedorov
2016-05-27 21:10     ` Emilio G. Cota
2016-05-28 12:35       ` Sergey Fedorov
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 05/15] qemu-thread: add simple test-and-set spinlock Emilio G. Cota
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 06/15] exec: add tb_hash_func5, derived from xxhash Emilio G. Cota
2016-05-28 12:36   ` Sergey Fedorov
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 07/15] tb hash: hash phys_pc, pc, and flags with xxhash Emilio G. Cota
2016-05-28 12:39   ` Sergey Fedorov
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 08/15] qdist: add module to represent frequency distributions of data Emilio G. Cota
2016-05-28 18:15   ` Sergey Fedorov
2016-06-03 17:22     ` Emilio G. Cota
2016-06-03 17:29       ` Sergey Fedorov
2016-06-03 17:46         ` Sergey Fedorov
2016-06-06 23:40           ` Emilio G. Cota
2016-06-07 14:06             ` Sergey Fedorov
2016-06-07 22:53               ` Emilio G. Cota
2016-06-08 13:09                 ` Sergey Fedorov
2016-06-07  1:05     ` Emilio G. Cota
2016-06-07 15:56       ` Sergey Fedorov
2016-06-08  0:02         ` Emilio G. Cota
2016-06-08 14:10           ` Sergey Fedorov
2016-06-08 18:06             ` Emilio G. Cota
2016-06-08 18:18               ` Sergey Fedorov [this message]
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 09/15] qdist: add test program Emilio G. Cota
2016-05-28 18:56   ` Sergey Fedorov
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 10/15] qht: QEMU's fast, resizable and scalable Hash Table Emilio G. Cota
2016-05-29 19:52   ` Sergey Fedorov
2016-05-29 19:55     ` Sergey Fedorov
2016-05-31  7:46     ` Alex Bennée
2016-06-01 20:53       ` Sergey Fedorov
2016-06-03  9:18     ` Emilio G. Cota
2016-06-03 15:19       ` Sergey Fedorov
2016-06-03 11:01     ` Emilio G. Cota
2016-06-03 15:34       ` Sergey Fedorov
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 11/15] qht: add test program Emilio G. Cota
2016-05-29 20:15   ` Sergey Fedorov
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 12/15] qht: add qht-bench, a performance benchmark Emilio G. Cota
2016-05-29 20:45   ` Sergey Fedorov
2016-06-03 11:41     ` Emilio G. Cota
2016-06-03 15:41       ` Sergey Fedorov
2016-05-31 15:12   ` Alex Bennée
2016-05-31 16:44     ` Emilio G. Cota
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 13/15] qht: add test-qht-par to invoke qht-bench from 'check' target Emilio G. Cota
2016-05-29 20:53   ` Sergey Fedorov
2016-06-03 11:07     ` Emilio G. Cota
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 14/15] tb hash: track translated blocks with qht Emilio G. Cota
2016-05-29 21:09   ` Sergey Fedorov
2016-05-31  8:39   ` Alex Bennée
2016-05-25  1:13 ` [Qemu-devel] [PATCH v6 15/15] translate-all: add tb hash bucket info to 'info jit' dump Emilio G. Cota
2016-05-29 21:14   ` Sergey Fedorov
2016-06-08  6:25 ` [Qemu-devel] [PATCH v6 00/15] tb hash improvements Alex Bennée
2016-06-08 15:16   ` Emilio G. Cota
2016-06-08 15:35   ` Richard Henderson
2016-06-08 15:37     ` Sergey Fedorov
2016-06-08 16:45       ` Alex Bennée

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5758616E.8070202@gmail.com \
    --to=serge.fdrv@gmail.com \
    --cc=alex.bennee@linaro.org \
    --cc=cota@braap.org \
    --cc=mttcg@listserver.greensocs.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).