qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] 2x-3x tcg performance regression on ppc64 (maybe others)
@ 2019-02-01 21:53 Cleber Rosa
  2019-02-02  0:09 ` Richard Henderson
  2019-02-05  8:37 ` [Qemu-devel] [Qemu-ppc] " Howard Spoelstra
  0 siblings, 2 replies; 7+ messages in thread
From: Cleber Rosa @ 2019-02-01 21:53 UTC (permalink / raw)
  To: qemu-devel@nongnu.org Developers, Peter Maydell
  Cc: Alex Bennée, qemu-ppc, Richard Henderson

While working on the "boot_linux_console.py" tests, after a given
rebase, I noticed that the ppc64 specific test began timing out.  The
original timeout set to the test was 60 seconds, and it "always" had
room to spare when running either on my system, or on Travis CI.

Then, Alex mentioned that specific test timing out on his "slow"
system[1].  I did some further investigation and found out that commit
f7b78602f might have been responsible for a significant slowdown,
and maybe was also affecting his execution.

Getting straight to the point, this is the ppc64 "boot linux console"
test, running 10 times, on top my latest rebase (e8977901b, that
includes f7b78602f).

 ...
 (01/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
(63.07 s)
 (02/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
(64.00 s)
 (03/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
(62.93 s)
 (04/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
(64.01 s)
 (05/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
(63.13 s)
 (06/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
(64.65 s)
 (07/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
(63.25 s)
 (08/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
(66.77 s)
 (09/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
(65.07 s)
 (10/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
(63.55 s)
 ...

The exact branch I used for the results above can be seen at:
  - https://github.com/clebergnu/qemu/tree/regression_tcg_ppc64

Then, with f7b78602f reverted:

 (01/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
(20.54 s)
 (02/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
(21.06 s)
 (03/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
(20.81 s)
 (04/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
(21.00 s)
 (05/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
(20.37 s)
 (06/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
(20.40 s)
 (07/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
(20.36 s)
 (08/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
(20.39 s)
 (09/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
(20.39 s)
 (10/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
(20.38 s)

The exact branch I used for the results above can be seen at:
  - https://github.com/clebergnu/qemu/tree/regression_tcg_ppc64_revert

Even though there's a lot more noise involved, similar results can be
seen on Travis-CI:

  - https://travis-ci.org/clebergnu/qemu/builds/487607293#L3035
     (04/16)
/home/travis/build/clebergnu/qemu/tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries:
 PASS (22.21 s)

  - https://travis-ci.org/clebergnu/qemu/builds/487606849#L3035
     (04/16)
/home/travis/build/clebergnu/qemu/tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries:
 PASS (11.27 s)

The reasons for the extra noise include:
  1) Those tests download the kernel images
  2) They are run only once
  3) They run on shared infrastructure

Based on Travis-CI runs alone, it's hard to determine if there was any
other significant regression (or improvement?) for other targets.  I'm
posting this early, but if I come up with more relevant information,
follow up here.

Regards,
- Cleber.

---

[1] - https://www.mail-archive.com/qemu-devel@nongnu.org/msg589497.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] 2x-3x tcg performance regression on ppc64 (maybe others)
  2019-02-01 21:53 [Qemu-devel] 2x-3x tcg performance regression on ppc64 (maybe others) Cleber Rosa
@ 2019-02-02  0:09 ` Richard Henderson
  2019-02-05  8:37 ` [Qemu-devel] [Qemu-ppc] " Howard Spoelstra
  1 sibling, 0 replies; 7+ messages in thread
From: Richard Henderson @ 2019-02-02  0:09 UTC (permalink / raw)
  To: Cleber Rosa, qemu-devel@nongnu.org Developers, Peter Maydell
  Cc: Alex Bennée, qemu-ppc

On 2/1/19 1:53 PM, Cleber Rosa wrote:
> Getting straight to the point, this is the ppc64 "boot linux console"
> test, running 10 times, on top my latest rebase (e8977901b, that
> includes f7b78602f).
> 
>  (01/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
> (63.07 s)
...
> Then, with f7b78602f reverted:
> 
>  (01/10) boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: PASS
> (20.54 s)

This suggests that the cluster id added in that patch is perhaps uninitialized
or set incorrectly, so that we are spending much more time in code generation
than we should.


r~

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] 2x-3x tcg performance regression on ppc64 (maybe others)
  2019-02-01 21:53 [Qemu-devel] 2x-3x tcg performance regression on ppc64 (maybe others) Cleber Rosa
  2019-02-02  0:09 ` Richard Henderson
@ 2019-02-05  8:37 ` Howard Spoelstra
  2019-02-05 13:45   ` Peter Maydell
  1 sibling, 1 reply; 7+ messages in thread
From: Howard Spoelstra @ 2019-02-05  8:37 UTC (permalink / raw)
  To: Cleber Rosa
  Cc: qemu-devel@nongnu.org Developers, Peter Maydell, qemu-ppc,
	Alex Bennée, Richard Henderson

On Fri, Feb 1, 2019 at 11:54 PM Cleber Rosa <crosa@redhat.com> wrote:

> While working on the "boot_linux_console.py" tests, after a given
> rebase, I noticed that the ppc64 specific test began timing out.  The
> original timeout set to the test was 60 seconds, and it "always" had
> room to spare when running either on my system, or on Travis CI.
>
> Hi,

As reported on IRC on 31-01, performance loss also occurs with
qemu-system-ppc:
"I noticed a substantial performance loss (~35% in processor benchmark) in
qemu-system-ppc running Mac OS 9.2. Bisecting shows that
https://github.com/qemu/qemu/commit/f7b78602fdc6c6e4befc90159da8e93900b4bcb1
introduces the issue. Reverting brings performance back."

For a benchmark screen shot see:
https://surfdrive.surf.nl/files/index.php/s/M1o9rcNCt3QQ3I7

Best,
Howard

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] 2x-3x tcg performance regression on ppc64 (maybe others)
  2019-02-05  8:37 ` [Qemu-devel] [Qemu-ppc] " Howard Spoelstra
@ 2019-02-05 13:45   ` Peter Maydell
  2019-02-05 14:47     ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2019-02-05 13:45 UTC (permalink / raw)
  To: Howard Spoelstra
  Cc: Cleber Rosa, qemu-devel@nongnu.org Developers, qemu-ppc,
	Alex Bennée, Richard Henderson

On Tue, 5 Feb 2019 at 08:37, Howard Spoelstra <hsp.cat7@gmail.com> wrote:
>
>
>
>
> On Fri, Feb 1, 2019 at 11:54 PM Cleber Rosa <crosa@redhat.com> wrote:
>>
>> While working on the "boot_linux_console.py" tests, after a given
>> rebase, I noticed that the ppc64 specific test began timing out.  The
>> original timeout set to the test was 60 seconds, and it "always" had
>> room to spare when running either on my system, or on Travis CI.
>>
> Hi,
>
> As reported on IRC on 31-01, performance loss also occurs with qemu-system-ppc:
> "I noticed a substantial performance loss (~35% in processor benchmark) in qemu-system-ppc running Mac OS 9.2. Bisecting shows that https://github.com/qemu/qemu/commit/f7b78602fdc6c6e4befc90159da8e93900b4bcb1 introduces the issue. Reverting brings performance back."

Yeah; Mark Cave-Ayland did a bit more digging into this last night,
and I have a patch that fixes it. I'll send that in a moment when
I've confirmed that I've caught all the places we need to fold
the cluster index into the hash cflags value.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] 2x-3x tcg performance regression on ppc64 (maybe others)
  2019-02-05 13:45   ` Peter Maydell
@ 2019-02-05 14:47     ` Philippe Mathieu-Daudé
  2019-02-05 15:08       ` Peter Maydell
  0 siblings, 1 reply; 7+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-02-05 14:47 UTC (permalink / raw)
  To: Peter Maydell, Howard Spoelstra
  Cc: Alex Bennée, Richard Henderson, qemu-ppc,
	qemu-devel@nongnu.org Developers, Cleber Rosa

Hi Peter,

On 2/5/19 2:45 PM, Peter Maydell wrote:
> On Tue, 5 Feb 2019 at 08:37, Howard Spoelstra <hsp.cat7@gmail.com> wrote:
>> On Fri, Feb 1, 2019 at 11:54 PM Cleber Rosa <crosa@redhat.com> wrote:
>>>
>>> While working on the "boot_linux_console.py" tests, after a given
>>> rebase, I noticed that the ppc64 specific test began timing out.  The
>>> original timeout set to the test was 60 seconds, and it "always" had
>>> room to spare when running either on my system, or on Travis CI.
>>>
>> Hi,
>>
>> As reported on IRC on 31-01, performance loss also occurs with qemu-system-ppc:
>> "I noticed a substantial performance loss (~35% in processor benchmark) in qemu-system-ppc running Mac OS 9.2. Bisecting shows that https://github.com/qemu/qemu/commit/f7b78602fdc6c6e4befc90159da8e93900b4bcb1 introduces the issue. Reverting brings performance back."
> 
> Yeah; Mark Cave-Ayland did a bit more digging into this last night,
> and I have a patch that fixes it. I'll send that in a moment when
> I've confirmed that I've caught all the places we need to fold
> the cluster index into the hash cflags value.

I was going to send this patch and was looking for to fill the
"Reported-by" tag to finalize the commit description before sending:

-- >8 --
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index aa7b81aaf0..8dc2aab60e 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -418,7 +418,8 @@ static inline uint32_t tb_cflags(const
TranslationBlock *tb)
 static inline uint32_t curr_cflags(void)
 {
     return (parallel_cpus ? CF_PARALLEL : 0)
-         | (use_icount ? CF_USE_ICOUNT : 0);
+         | (use_icount ? CF_USE_ICOUNT : 0)
+         | CF_CLUSTER_MASK;
 }
---

I believe your patch description is way more correct/detailled than mine :P

Regards,

Phil.

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] 2x-3x tcg performance regression on ppc64 (maybe others)
  2019-02-05 14:47     ` Philippe Mathieu-Daudé
@ 2019-02-05 15:08       ` Peter Maydell
  2019-02-05 15:11         ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2019-02-05 15:08 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Howard Spoelstra, Alex Bennée, Richard Henderson, qemu-ppc,
	qemu-devel@nongnu.org Developers, Cleber Rosa

On Tue, 5 Feb 2019 at 14:47, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
> I was going to send this patch and was looking for to fill the
> "Reported-by" tag to finalize the commit description before sending:
>
> -- >8 --
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index aa7b81aaf0..8dc2aab60e 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -418,7 +418,8 @@ static inline uint32_t tb_cflags(const
> TranslationBlock *tb)
>  static inline uint32_t curr_cflags(void)
>  {
>      return (parallel_cpus ? CF_PARALLEL : 0)
> -         | (use_icount ? CF_USE_ICOUNT : 0);
> +         | (use_icount ? CF_USE_ICOUNT : 0)
> +         | CF_CLUSTER_MASK;
>  }

That's not right, because it puts all-1s in the cluster field.
That will happen to hit for TBs generated for CPUs in the
"not in a cluster" default UNASSIGNED_CLUSTER_INDEX, but it's
wrong for CPUs that are in a specifically assigned cluster
and will result in those failing to hit cached TBs when they should.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] 2x-3x tcg performance regression on ppc64 (maybe others)
  2019-02-05 15:08       ` Peter Maydell
@ 2019-02-05 15:11         ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 7+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-02-05 15:11 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Howard Spoelstra, Alex Bennée, Richard Henderson, qemu-ppc,
	qemu-devel@nongnu.org Developers, Cleber Rosa

On 2/5/19 4:08 PM, Peter Maydell wrote:
> On Tue, 5 Feb 2019 at 14:47, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
>> I was going to send this patch and was looking for to fill the
>> "Reported-by" tag to finalize the commit description before sending:
>>
>> -- >8 --
>> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
>> index aa7b81aaf0..8dc2aab60e 100644
>> --- a/include/exec/exec-all.h
>> +++ b/include/exec/exec-all.h
>> @@ -418,7 +418,8 @@ static inline uint32_t tb_cflags(const
>> TranslationBlock *tb)
>>  static inline uint32_t curr_cflags(void)
>>  {
>>      return (parallel_cpus ? CF_PARALLEL : 0)
>> -         | (use_icount ? CF_USE_ICOUNT : 0);
>> +         | (use_icount ? CF_USE_ICOUNT : 0)
>> +         | CF_CLUSTER_MASK;
>>  }
> 
> That's not right, because it puts all-1s in the cluster field.
> That will happen to hit for TBs generated for CPUs in the
> "not in a cluster" default UNASSIGNED_CLUSTER_INDEX, but it's
> wrong for CPUs that are in a specifically assigned cluster
> and will result in those failing to hit cached TBs when they should.

Oh OK now I understand.

Thanks for the explanation!

Phil.

> 
> thanks
> -- PMM
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-02-05 15:12 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-02-01 21:53 [Qemu-devel] 2x-3x tcg performance regression on ppc64 (maybe others) Cleber Rosa
2019-02-02  0:09 ` Richard Henderson
2019-02-05  8:37 ` [Qemu-devel] [Qemu-ppc] " Howard Spoelstra
2019-02-05 13:45   ` Peter Maydell
2019-02-05 14:47     ` Philippe Mathieu-Daudé
2019-02-05 15:08       ` Peter Maydell
2019-02-05 15:11         ` Philippe Mathieu-Daudé

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).