qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Henrique Barboza <danielhb413@gmail.com>
To: Richard Henderson <richard.henderson@linaro.org>,
	David Gibson <david@gibson.dropbear.id.au>
Cc: qemu-devel@nongnu.org, groug@kaod.org, qemu-ppc@nongnu.org,
	gustavo.romero@linaro.org, clg@kaod.org
Subject: Re: [PATCH 12/19] target/ppc/pmu_book3s_helper.c: enable PMC1 counter negative EBB
Date: Sat, 14 Aug 2021 16:13:29 -0300	[thread overview]
Message-ID: <67c127c0-33b0-abb7-dcb6-2143cc56a192@gmail.com> (raw)
In-Reply-To: <d312903b-7096-1ce6-28d0-5bb3690ae0eb@linaro.org>



On 8/12/21 9:35 PM, Richard Henderson wrote:
> On 8/12/21 11:24 AM, Daniel Henrique Barboza wrote:
>> +void helper_insns_inc(CPUPPCState *env)
>> +{
>> +    env->pmu_insns_count++;
>> +}
>> +
>> +void helper_insns_dec(CPUPPCState *env)
>> +{
>> +    env->pmu_insns_count--;
>> +}
>> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
>> index 60f35360b7..c56c656694 100644
>> --- a/target/ppc/translate.c
>> +++ b/target/ppc/translate.c
>> @@ -8689,6 +8689,7 @@ static void ppc_tr_tb_start(DisasContextBase *db, CPUState *cs)
>>
>>   static void ppc_tr_insn_start(DisasContextBase *dcbase, CPUState *cs)
>>   {
>> +    gen_helper_insns_inc(cpu_env);
>>       tcg_gen_insn_start(dcbase->pc_next);
>>   }
>>
>> @@ -8755,6 +8756,8 @@ static void ppc_tr_tb_stop(DisasContextBase *dcbase, CPUState *cs)
>>           return;
>>       }
>>
>> +    gen_helper_insns_dec(cpu_env);
>> +
>>       /* Honor single stepping. */
>>       sse = ctx->singlestep_enabled & (CPU_SINGLE_STEP | GDBSTUB_SINGLE_STEP);
>>       if (unlikely(sse)) {
>>
>>
>> And then I used 'env->pmu_insns_count' to update the instruction counters. And it's
>> working, with a slightly better performance than we have with the icount()
>> version. I'm not sure why it's working now since I tried something very similar
>> before and it didn't. Figures.
> 
> Not sure why you're decrementing there.
> 
> Also, how do you want to count retired instructions? Ones that merely start?  E.g. including loads that fault?  How about illegal instructions?  Or does the instruction need to run to completion?  Which of these you choose affects where you place the increment.

Before I proceed, let me mention that the numbers I'm about to post here are from the powerpc
selftest located here:

https://github.com/torvalds/linux/blob/master/tools/testing/selftests/powerpc/pmu/count_instructions.c

This test runs an instruction loop that consists of 'addi' instructions . Before running the instructions
there's an overhead calculation with an empty loop.


So, the event I want to count is described as "The thread has completed an instruction" in the ISA.
There are no events that counts illegal instructions or that causes fault, so my guess is that
all completed functions regardless of side-effects are counted.

Putting the incrementer in ppc_tr_insn_start() like I mentioned in my previous reply provides results
similar to the icount version:

# ./count_instructions
test: count_instructions
tags: git_version:v5.13-5357-gdbe69e433722-dirty
Binding to cpu 0
main test running as pid 568
Overhead of null loop: 2370 instructions
instructions: result 1002370 running/enabled 1587314
cycles: result 1001372 running/enabled 1348532
Looped for 1000000 instructions, overhead 2370
Expected 1002370
Actual   1002370
Delta    0, 0.000000%
instructions: result 10010309 running/enabled 11603340
cycles: result 10009311 running/enabled 11362216
Looped for 10000000 instructions, overhead 2370
Expected 10002370
Actual   10010309
Delta    7939, 0.079308%
[FAIL] Test FAILED on line 119
failure: count_instructions

I'm aware that putting the increment in insn_start() is too early since the instruction didn't
even complete, so I attempted to put the increment at the end of ppc_tr_translate_insn(). I
ended up with fewer instructions counted for the same test:

# ./count_instructions
test: count_instructions
tags: git_version:v5.13-5357-gdbe69e433722-dirty
Binding to cpu 0
main test running as pid 575
Overhead of null loop: 1962 instructions
instructions: result 939462 running/enabled 1587310
cycles: result 1001372 running/enabled 1348532
Looped for 1000000 instructions, overhead 1962
Expected 1001962
Actual   939462
Delta    -62500, -6.652744%
[FAIL] Test FAILED on line 116
failure: count_instructions

Reading translator_loop() I can't explain why  we're missing this amount of functions because
translate_insn() is always called right after insns_start():


     while (true) {
         db->num_insns++;

         /* NOTE: incrementing the counter in insn_start() ppc impl gives similar result as icount*/
         ops->insn_start(db, cpu);
         tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */

        (...)

         /* NOTE: incrementing the counter in translate_insn() PPC impl misses 6% of the instructions,
         even though it's always called right after insn_start() */
         if (db->num_insns == db->max_insns && (cflags & CF_LAST_IO)) {
             /* Accept I/O on the last instruction.  */
             gen_io_start();
             ops->translate_insn(db, cpu);
         } else {
             /* we should only see CF_MEMI_ONLY for io_recompile */
             tcg_debug_assert(!(cflags & CF_MEMI_ONLY));
             ops->translate_insn(db, cpu);
         }

I've recompiled QEMU with --enable-tcg-debug to see if tcg_debug_assert() right after insn_start() was
being hit. The assert isn't being hit in the test.


I also tried to read the instructions in tb_stop() via db->num_insns. Since I'm not worried about missing
the counter_overflow a few instructions it would also be an alternative. I changed the helper to receive
a number of instructions to be added instead of single increments:

static void ppc_tr_tb_stop(DisasContextBase *dcbase, CPUState *cs)
      target_ulong nip = ctx->base.pc_next;
      int sse;
  
+    gen_helper_insns_inc(cpu_env, tcg_constant_i32(dcbase->num_insns));
+
      if (is_jmp == DISAS_NORETURN) {
          /* We have already exited the TB. */
          return;


The test gives us these numbers:

# ./count_instructions
test: count_instructions
tags: git_version:v5.13-5357-gdbe69e433722-dirty
Binding to cpu 0
main test running as pid 573
Overhead of null loop: 85 instructions
instructions: result 85 running/enabled 1587316
cycles: result 1001372 running/enabled 1348534
Looped for 1000000 instructions, overhead 85
Expected 1000085
Actual   85
Delta    -1000000, -1176470.588235%
[FAIL] Test FAILED on line 116
failure: count_instructions

This is the situation I've faced some time ago and commented on a previous email. My hypothesis
back then was that translation_loop(), which is called via gen_intermediate_code() - tb_gen_code(),
was not being executed because tb_gen_code() is always guarded by a tb_lookup() , and
assumed that the lookup was finding the already translated TB and cpu_tb_exec() with it.


Well, we just saw that incrementing the counter in both insn_start() and translate_insn() gives
a far greater number than trying to sum up all the db->num_insns in tb_stop(), and all of
them are called inside translator_loop(). This invalidates (I guess) the assumptions I made
on tb_gen_code() up above. So yeah, I need more time to understand why I'm getting these
numbers.

For now, I'm really tempted into increment the instructions at insn_start() and come back at
instruction counting specifics in a follow-up work.


> 
> It is of course extremely heavyweight to call a helper to perform a simple addition operation.

This increment would update all registers that are counting instructions and
also fire a PMU interrupt when an overflow happens. I also added the frozen
count PMU bit in 'hflags' so we can increment only if the PMU is running.


There's a good chance that most of this can be done outside the helper in a
TCG function in translate.c. That would minimize the impact of the running
PMU in the translation process. I'll see how it goes.


> 
> You may also wish to look at target/hexagon, gen_exec_counters(), which is doing the same sort of thing.  But not completely, since it loses count along dynamic exception paths (e.g. faulting load).  That can be fixed as well, via restore_state_to_opc.

I saw the hexagon code and attempted to accumulate the instructions in
a 'num_insn' ctx variable, that was incremented in insns_start(), and then
update the PMU counters in tb_stop() with it. The results were similar
to what I've described above with db->num_insns in tb_stop().



Thanks,


Daniel


> 
> 
> r~


  reply	other threads:[~2021-08-14 19:14 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-09 13:10 [PATCH 00/19] PMU-EBB support for PPC64 TCG Daniel Henrique Barboza
2021-08-09 13:10 ` [PATCH 01/19] target/ppc: add exclusive Book3S PMU reg read/write functions Daniel Henrique Barboza
2021-08-10  3:19   ` David Gibson
2021-08-10 13:06     ` Daniel Henrique Barboza
2021-08-09 13:10 ` [PATCH 02/19] target/ppc: add exclusive user read function for PMU regs Daniel Henrique Barboza
2021-08-10  3:21   ` David Gibson
2021-08-09 13:10 ` [PATCH 03/19] target/ppc: add exclusive user write " Daniel Henrique Barboza
2021-08-10  3:29   ` David Gibson
2021-08-11  0:05     ` Richard Henderson
2021-08-09 13:10 ` [PATCH 04/19] target/ppc: PMU Book3s basic insns count for pseries TCG Daniel Henrique Barboza
2021-08-10  3:39   ` David Gibson
2021-08-10 13:24     ` Daniel Henrique Barboza
2021-08-16 17:53     ` Daniel Henrique Barboza
2021-08-17  2:59       ` David Gibson
2021-08-17  9:30         ` Daniel Henrique Barboza
2021-08-18  5:48           ` David Gibson
2021-08-11  0:26   ` Richard Henderson
2021-08-09 13:10 ` [PATCH 05/19] target/ppc/pmu_book3s_helper.c: eliminate code repetition Daniel Henrique Barboza
2021-08-10  3:40   ` David Gibson
2021-08-09 13:10 ` [PATCH 06/19] target/ppc/pmu_book3s_helper: enable PMC1-PMC4 events Daniel Henrique Barboza
2021-08-10  3:42   ` David Gibson
2021-08-10 15:03     ` Daniel Henrique Barboza
2021-08-10 23:08       ` Daniel Henrique Barboza
2021-08-11 23:27         ` Daniel Henrique Barboza
2021-08-12  1:52         ` David Gibson
2021-08-11  3:32       ` David Gibson
2021-08-09 13:10 ` [PATCH 07/19] target/ppc/pmu_book3s_helper.c: icount fine tuning Daniel Henrique Barboza
2021-08-09 13:10 ` [PATCH 08/19] target/ppc/pmu_book3s_helper.c: do an actual cycles calculation Daniel Henrique Barboza
2021-08-11  0:34   ` Richard Henderson
2021-08-09 13:10 ` [PATCH 09/19] PPC64/TCG: Implement 'rfebb' instruction Daniel Henrique Barboza
2021-08-10  3:50   ` David Gibson
2021-08-10 19:32     ` Daniel Henrique Barboza
2021-08-11  0:42       ` Richard Henderson
2021-08-11  3:36       ` David Gibson
2021-08-11  0:41   ` Richard Henderson
2021-08-09 13:10 ` [PATCH 10/19] target/ppc: PMU Event-Based exception support Daniel Henrique Barboza
2021-08-10  3:55   ` David Gibson
2021-08-11  0:50   ` Richard Henderson
2021-08-09 13:10 ` [PATCH 11/19] target/ppc/excp_helper.c: POWERPC_EXCP_EBB adjustments Daniel Henrique Barboza
2021-08-09 13:10 ` [PATCH 12/19] target/ppc/pmu_book3s_helper.c: enable PMC1 counter negative EBB Daniel Henrique Barboza
2021-08-10  4:01   ` David Gibson
2021-08-10 20:26     ` Daniel Henrique Barboza
2021-08-11  3:40       ` David Gibson
2021-08-11 11:18         ` Daniel Henrique Barboza
2021-08-12  3:39           ` David Gibson
2021-08-12  4:45             ` Richard Henderson
2021-08-12  4:56               ` Richard Henderson
2021-08-12 10:17                 ` Daniel Henrique Barboza
2021-08-12 21:24                   ` Daniel Henrique Barboza
2021-08-13  0:35                     ` Richard Henderson
2021-08-14 19:13                       ` Daniel Henrique Barboza [this message]
2021-08-15 19:24                         ` Richard Henderson
2021-08-09 13:10 ` [PATCH 13/19] target/ppc/translate: PMU: handle setting of PMCs while running Daniel Henrique Barboza
2021-08-10  4:06   ` David Gibson
2021-08-10 20:44     ` Daniel Henrique Barboza
2021-08-11  3:46       ` David Gibson
2021-08-09 13:10 ` [PATCH 14/19] target/ppc/pmu_book3s_helper.c: add generic timeout helpers Daniel Henrique Barboza
2021-08-10  4:09   ` David Gibson
2021-08-09 13:10 ` [PATCH 15/19] target/ppc/pmu_book3s_helper: enable counter negative for all PMCs Daniel Henrique Barboza
2021-08-10  4:11   ` David Gibson
2021-08-10 21:02     ` Daniel Henrique Barboza
2021-08-12  1:44       ` David Gibson
2021-08-09 13:10 ` [PATCH 16/19] target/ppc/pmu_book3s_helper: adding 0xFA event Daniel Henrique Barboza
2021-08-10  4:13   ` David Gibson
2021-08-09 13:10 ` [PATCH 17/19] target/ppc/pmu_book3s_helper.c: add PMC14/PMC56 counter freeze bits Daniel Henrique Barboza
2021-08-09 13:10 ` [PATCH 18/19] target/ppc/pmu_book3s_helper.c: add PM_CMPLU_STALL mock events Daniel Henrique Barboza
2021-08-10  4:17   ` David Gibson
2021-08-10 19:48     ` Daniel Henrique Barboza
2021-08-11  3:37       ` David Gibson
2021-08-09 13:10 ` [PATCH 19/19] docs/specs: add PPC64 TCG PMU-EBB documentation Daniel Henrique Barboza

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=67c127c0-33b0-abb7-dcb6-2143cc56a192@gmail.com \
    --to=danielhb413@gmail.com \
    --cc=clg@kaod.org \
    --cc=david@gibson.dropbear.id.au \
    --cc=groug@kaod.org \
    --cc=gustavo.romero@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    --cc=richard.henderson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).