From: Rob Landley <rob@landley.net>
To: cotulla@yandex.ua
Cc: linux-hexagon@vger.kernel.org
Subject: Re: [DISCUSSION] Hexagon code inside kernel
Date: Fri, 22 Feb 2013 22:24:30 -0600
Message-ID: <1361593470.29465.17@driftwood>
In-Reply-To: <1163031361018389@web26d.yandex.ru> (from cotulla@yandex.ua on Sat Feb 16 06:39:49 2013)
On 02/16/2013 06:39:49 AM, cotulla@yandex.ua wrote:
> Hi,
>
> > For the qdsp6v3 the effective clock rate was 300MHz per core, so yes.
> > It might be even slower for v2, not sure. (the chip clock rate is 1.8
> > GHz, there are 6 interleaved cores, so 1.8/6 = 300 The power savings
> > are not from the clock rate, but from the tiny transistor count. The
> > performance efficiency is from keeping all of those transistors
> > constantly wiggling, which is what the interleaved pipeline does.)
>
> Hm, I thought the maximum clock rate is 595.2 Mhz?
> Or 1.8 is another clock?
> But by changing this clock rate I can get different Q6 performance.
The clever thing hexagon did was avoid any pipeline interlocks. Instead
they had as many register profiles as pipeline stages, and they
round-robined them down the pipeline. So the v2 processor ran at 600
MHz but presented to Linux as a 6-way SMP chip with each core running
at 100 MHz. This meant there were 6 clock cycles between each memory
access, so the DRAM had no trouble keeping up. There was no speculative
execution and no branch prediction: it never did wasted work, and any
pipeline stage that had nothing to do powered down completely for that
clock cycle. They
got performance out of it via massive parallelism: each instruction was
a 4-issue VLIW, and the latter two cores were 4-way SIMD vector
thingies, so if you could break your task into 6 chunks (4 graphics
processes, an audio process, and a control process) it could do some
quite heavy lifting.
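To make the arithmetic concrete, here's a rough sketch of that round-robin scheme. It is not Qualcomm code: the function name and parameters are made up for illustration, and it only models "one hardware thread gets the issue slot per chip cycle", nothing about the real pipeline.

```python
def barrel_schedule(chip_mhz, hw_threads, cycles):
    """Toy model of an interleaved pipeline (hypothetical, for illustration).

    On chip cycle c, hardware thread (c % hw_threads) gets the issue slot.
    Returns (issue count per thread, effective MHz per thread,
    chip cycles between two consecutive issues from the same thread).
    """
    issued = [0] * hw_threads
    for cycle in range(cycles):
        issued[cycle % hw_threads] += 1
    per_thread_mhz = chip_mhz / hw_threads
    gap_cycles = hw_threads
    return issued, per_thread_mhz, gap_cycles


# The v2 numbers from above: 600 MHz chip, 6 hardware threads.
issued, per_thread_mhz, gap_cycles = barrel_schedule(600, 6, cycles=600)
print(issued)          # [100, 100, 100, 100, 100, 100]
print(per_thread_mhz)  # 100.0
print(gap_cycles)      # 6
```

Each thread only touches the pipeline every 6th cycle, which is where the "6 clock cycles between each memory access" comes from.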
In the later chips, they were looking to reduce the number of pipeline
stages, which would let them clock the chip down (increasing the power
efficiency, since power consumption rises faster than linearly with
clock speed) while still allowing each thread to progress at 100 MHz.
So a 300 MHz chip is probably a 3-stage pipeline presenting as 3-way SMP.
I only did a 6 month contract there in 2010 beating bugs out of the
toolchain. I know they hired Linutronix to help clean up their code so
it had a chance of being accepted upstream, but tglx and crowd had to
sign an NDA so I dunno what they're allowed to say about it, even now
that some of the code's gone upstream.
> > Don't know v2. But v3 had a 'real' MMU
> Hm, are you sure in that?
> I had never seen any usage of it. As well as binutils registers definition
> doesn't include any suitable registers for that.
The version I saw (v2) had a software loaded TLB which a binary blob
made act like an MMU. It had too few TLB slots and kept thrashing them
when running a real OS, so they were going to add more in a future
version.
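A toy model of why too few slots hurts, for anyone who hasn't watched a TLB thrash. Everything here is an assumption for illustration: I don't know what replacement policy the blob used (this sketch uses LRU), and the slot counts and page numbers are invented, not the real hardware's.

```python
from collections import OrderedDict


def tlb_misses(page_accesses, slots):
    """Count misses for a software-refilled TLB with LRU replacement.

    Hypothetical model: 'slots' entries, each miss reloads one entry
    and evicts the least recently used page when the TLB is full.
    """
    tlb = OrderedDict()
    misses = 0
    for page in page_accesses:
        if page in tlb:
            tlb.move_to_end(page)        # hit: mark as most recently used
        else:
            misses += 1                  # miss: software has to reload the entry
            if len(tlb) >= slots:
                tlb.popitem(last=False)  # evict the least recently used page
            tlb[page] = True
    return misses


# A loop that walks 8 pages over and over, 100 times.
accesses = list(range(8)) * 100
print(tlb_misses(accesses, slots=4))  # 800: every single access misses
print(tlb_misses(accesses, slots=8))  # 8: only the first touch of each page
```

Once the working set is bigger than the TLB, a cyclic access pattern misses every time, and each miss is a trip through the refill code.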
The thing to realize about Qualcomm is that the lawyers are in charge.
The patent licensing revenue is credited to the legal department but
the R&D costs of coming up with that IP in the first place are deducted
from engineering, so in terms of _net_ revenue it looks like licensing
is more profitable than engineering even though it's just a fancy
story. Political power within the company is based on how much net
revenue you're bringing in, and with Legal mooching off engineering
like that they get to overrule them most of the time.
So they've got brilliant engineers who do brilliant things you never
hear about, and would LIKE to get them out into the real world but can
never get permission. (Hence craziness like the "Code Aurora Forum"
which is a partnership between Qualcomm and Qualcomm with some random
co-signer (Intel) there to make it SEEM like somebody else is involved,
because spinning off a wholly-owned subsidiary "Qualcomm Innovation
Center" and having that sock puppet do all your open source stuff isn't
considered enough of a firewall between Legal's precious patents and
the GPL.)
(Now add a bit of political infighting between the people who do their
"Scorpion" licensed ARM core and the people who would like to see
Hexagon used as a real processor instead of a multimedia coprocessor,
and what little power engineering has is wasted.)
So it's really cool technology, fairly widely deployed, and if you want
to make use of it I'd recommend reverse engineering it. (You can look
around the code aurora forum pages and download the toolchains they
give to the android guys; those binary blobs get built with modified
gcc+binutils and the lawyers scrupulously obey the letter of the law as
they understand it; the code is published at an obscure URL somewhere.)
The fun part is that "objdump" can decode the magic instructions, even
in the binary blob. Because the same binutils has to be able to
assemble them, you see.
(They're working on Hexagon support for Open64 and LLVM, but gcc's
still a more mature compiler. Google for "hexagon open64" and similar
finds interesting stuff, by the way.)
> > Good, because the bootloader was going to be the other issue.
> Yes, in my case it's working :)
> But another guys who also want participate in this project with
> MSM8960/APQ8064 they still can't run any unsigned code on Q6.
> In modern phones it's often locked from changes :(
Getting hexagon support into QEMU would make life SO much easier...
> > I'd done the patches for glibc (yes, they're publicly available on
> > some website, don't know if they got merged or not), got 98% of the
> > many hundreds of glibc unit tests to pass, including most or all of
> > the thread tests including TLS. Someone had bootstrapped hundreds of
> > .debs and both python and perl passed 100% of their tests. I'm sure
> > no one cares, but even guile worked, and I was about to start fiddling
> > with haskell :-)
> Good to hear that. Good job!
> So userspace support is rather good in common.
I built Linux From Scratch and large chunks of Beyond Linux From
Scratch during my contract in 2010 (put together a demo with X11,
albeit just clients connecting to an X server running on another machine
through the net), but that was with their gcc 3.4, binutils 2.14, and
uClibc 0.9.30 forks. (All of which were obsolete already when I was
there, and have probably been abandoned since.)
That was using... comet boards, I think? (Those hacked up phone
motherboards Linas was talking about. The "snapdragon" SoC, QDSP6v2
chips plus a Scorpion plus an armv5 plus a QDSP4, all in a big ball
with USB and a serial port and an ethernet device and 256 megs of
memory and I forget what else. We had a small number of them because
they never made that many. Not a mass produced product, semi-obsolete
at the time, but the linux porting effort scrounged what resources it
could...)
Rob