From: Aurelien Jarno <aurelien@aurel32.net>
To: Artyom Tarasenko <atar4qemu@gmail.com>
Cc: qemu-devel <qemu-devel@nongnu.org>, Dennis Luehring <dl.soluz@gmx.net>
Subject: Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
Date: Fri, 31 Jul 2015 17:43:23 +0200
Message-ID: <20150731154323.GD23508@aurel32.net>
In-Reply-To: <CACXAS8ABo2yCVL9Hk9yoyY1wvF4X1Oq4vFs87gQyRjH59Yi1Lg@mail.gmail.com>
On 2015-07-31 17:31, Artyom Tarasenko wrote:
> On Thu, Jul 30, 2015 at 5:50 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> > On 2015-07-30 10:55, Aurelien Jarno wrote:
> >> On 2015-07-30 10:16, Dennis Luehring wrote:
> >> > Am 30.07.2015 um 09:52 schrieb Aurelien Jarno:
> >> > >On 2015-07-30 05:52, Dennis Luehring wrote:
> >> > >> Am 29.07.2015 um 17:01 schrieb Aurelien Jarno:
> >> > >> >The point is that emulation has a cost, and it's quite difficult to
> >> > >> >lower it and thus improve the emulation speed.
> >> > >>
> >> > >> so it's just not strange for you to see a 1/100...200 of the native x64
> >> > >> speed under qemu/SPARC64
> >> > >> i hoped that someone will jump up and shout "it's impossible - it needs to be
> >> > >> a bug" ...sadly not
> >> > >
> >> > >Overall the ratio is more around 10, but in some specific cases where
> >> > >the TB cache is inefficient and TB can't be linked or with an
> >> > >inefficient MMU, a ratio of 100 is possible.
> >> >
> >> >
> >> > sysbench (0.4.12) --num-threads=1 --test=cpu --cpu-max-prime=2000 run
> >> > Host x64 : 1.3580s
> >> > Qemu SPARC64: 184.2532s
> >> >
> >> > sysbench shows a ratio of nearly 200
> >>
> >> Note that when you say SPARC64 here, it's actually only the kernel, you
> >> are using a 32-bit userland. And that makes a difference. Here are my
> >> results:
> >>
> >> host (x86-64)                    0.8976s
> >> sparc32 guest (sparc64 kernel)  99.6116s
> >> sparc64 guest (sparc64 kernel)   4.4908s
> >>
> >> So it looks like the 32-bit code is not QEMU friendly. I haven't looked
> >> at it yet, but I guess it might be due to dynamic jumps, so that TB
> >> can't be chained.
> >
> > This is the corresponding C code from sysbench, which is run 10000
> > times.
> >
> > | int cpu_execute_request(sb_request_t *r, int thread_id)
> > | {
> > |   unsigned long long c;
> > |   unsigned long long l,t;
> > |   unsigned long long n=0;
> > |   log_msg_t msg;
> > |   log_msg_oper_t op_msg;
> > |
> > |   (void)r; /* unused */
> > |
> > |   /* Prepare log message */
> > |   msg.type = LOG_MSG_TYPE_OPER;
> > |   msg.data = &op_msg;
> > |
> > |   /* So far we're using very simple test prime number tests in 64bit */
> > |   LOG_EVENT_START(msg, thread_id);
> > |
> > |   for(c=3; c < max_prime; c++)
> > |   {
> > |     t = sqrt(c);
> > |     for(l = 2; l <= t; l++)
> > |       if (c % l == 0)
> > |         break;
> > |     if (l > t )
> > |       n++;
> > |   }
> > |
> > |   LOG_EVENT_STOP(msg, thread_id);
> > |
> > |   return 0;
> > | }
> >
> > This is a very simple test, which is probably not a good representation
> > of CPU performance, even more so when emulated by QEMU. In addition,
> > given that it mostly uses 64-bit integers, it's kind of expected that
> > the 32-bit version is slower.
> >
> > Anyway I have extracted this code into a C file (see attached file) that
> > can more easily be compiled to 32 or 64 bit using -m32 or -m64. I observe
> > the same behavior as with sysbench, even with qemu-user (which is not
> > surprising as the above code doesn't really put pressure on the MMU).
> >
> > Running it, I get the following times:
> > x86-64 host         0.877s
> > sparc guest -m32    1m39s
> > sparc guest -m64    3.5s
> > opensparc T1 -m32   1m59s
> > opensparc T1 -m64   1m12s
> >
> > So overall QEMU is faster than some not-so-old real hardware. That said,
> > looking at it quickly, it seems that some of the FP instructions are
> > actually trapped and emulated by the kernel on the opensparc T1.
> >
> > Now coming back to the QEMU problem, the issue is that the 64-bit code
> > is using the udivx instruction to compute the modulo, while the 32-bit
> > code calls the __umoddi3 GCC helper.
>
> Actually this looks like a bug/missing feature in gcc. Why doesn't it use udivx
> instruction in "SPARC32PLUS, V8+ Required" code?
No idea.
> > It uses a lot of integer functions
> > based on CPU flags, so most of the time is spent computing them in
> > helper_compute_psr.
>
> I wonder if this can be optimized. I guess most RISC CPUs would have a
> similar problem. Unlike x86, the compilers usually optimize
> instructions on flag usage. If there is an instruction modifying flags
> in the code, the flags will be used for sure, so it probably makes
> little sense to postpone the flag computation?
Indeed. ARM and SH4 use one TCG temp per flag, so each flag can be computed
individually using setcond. The optimizer and the liveness analysis then
get rid of the unused computations. However, while this allows intra-TB
optimization, it prevents any other flag optimization. Therefore the
only way to know whether it is a good idea is to implement it and
benchmark it, using more than a single biased benchmark like the one
from sysbench.
Also note that the current implementation predates the introduction of
setcond, which is necessary to be able to compute the flags using TCG
code.
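
To make the two approaches concrete, here is a rough sketch in plain C
(not TCG, and not taken from the QEMU sources) of a 32-bit add whose
flags are either rebuilt later from the saved operands, or kept as one
variable per flag. All type, function and constant names are made up
for illustration, and the flag bit positions are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* All names below are hypothetical; this is not QEMU source code. */
enum { FLAG_C = 1, FLAG_V = 2, FLAG_Z = 4, FLAG_N = 8 };

/* Lazy scheme: an add only records its operands and its result. */
typedef struct { uint32_t src1, src2, dst; } lazy_cc_t;

static uint32_t lazy_add(lazy_cc_t *cc, uint32_t a, uint32_t b)
{
    cc->src1 = a;
    cc->src2 = b;
    cc->dst = a + b;
    return cc->dst;
}

/* Runs only when the flags are read; all four flags are rebuilt at once. */
static unsigned lazy_compute_flags(const lazy_cc_t *cc)
{
    unsigned f = 0;
    if (cc->dst < cc->src1) f |= FLAG_C;                 /* unsigned carry */
    if ((~(cc->src1 ^ cc->src2) & (cc->src1 ^ cc->dst)) >> 31)
        f |= FLAG_V;                                     /* signed overflow */
    if (cc->dst == 0) f |= FLAG_Z;
    if (cc->dst >> 31) f |= FLAG_N;
    return f;
}

/* Per-flag scheme: one variable per flag, each a simple expression,
 * so a flag that nobody reads can be dropped independently. */
typedef struct { uint32_t n, z, v, c; } split_cc_t;

static uint32_t split_add(split_cc_t *cc, uint32_t a, uint32_t b)
{
    uint32_t r = a + b;
    cc->c = r < a;                        /* setcond-like: unsigned less-than */
    cc->z = r == 0;                       /* setcond-like: equal to zero */
    cc->n = r >> 31;                      /* sign bit of the result */
    cc->v = (~(a ^ b) & (a ^ r)) >> 31;   /* same-sign inputs, other-sign result */
    return r;
}

/* Pack the split flags into the same layout as the lazy version. */
static unsigned split_pack(const split_cc_t *cc)
{
    return (cc->c ? FLAG_C : 0) | (cc->v ? FLAG_V : 0) |
           (cc->z ? FLAG_Z : 0) | (cc->n ? FLAG_N : 0);
}

/* Returns 1 when both schemes produce the same flags for a + b. */
static int flags_agree(uint32_t a, uint32_t b)
{
    lazy_cc_t lz;
    split_cc_t sp;
    lazy_add(&lz, a, b);
    split_add(&sp, a, b);
    return lazy_compute_flags(&lz) == split_pack(&sp);
}

/* A few spot checks on boundary values. */
static void check_examples(void)
{
    assert(flags_agree(1u, 2u));
    assert(flags_agree(0x7fffffffu, 1u));
    assert(flags_agree(0xffffffffu, 1u));
}
```

In the lazy version the add itself is cheap but every flag read pays for
the whole helper; in the per-flag version the add does more work up
front, and it is up to the optimizer to remove what is not used. Which
one wins depends on the workload, hence the need to benchmark.
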
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net