Re: [DISCUSSION] Hexagon code inside kernel

From: Rob Landley <rob@landley.net>
To: cotulla@yandex.ua
Cc: linux-hexagon@vger.kernel.org
Subject: Re: [DISCUSSION] Hexagon code inside kernel
Date: Fri, 22 Feb 2013 22:24:30 -0600
Message-ID: <1361593470.29465.17@driftwood>
In-Reply-To: <1163031361018389@web26d.yandex.ru> (from cotulla@yandex.ua on Sat Feb 16 06:39:49 2013)

On 02/16/2013 06:39:49 AM, cotulla@yandex.ua wrote:
> Hi,
> 
> >  For the qdsp6v3 the effective clock rate was 300MHz per core, so yes.
> >  It might be even slower for v2, not sure.  (the chip clock rate is 1.8
> >  GHz, there are 6 interleaved cores, so 1.8/6 = 300.  The power savings
> >  are not from the clock rate, but from the tiny transistor count. The
> >  performance efficiency is from keeping all of those transistors
> >  constantly wiggling, which is what the interleaved pipeline does.)
> 
> Hm, I thought the maximum clock rate is 595.2 Mhz?
> Or 1.8 is another clock?
> But by changing this clock rate I can get different Q6 performance.

The clever thing hexagon did was avoid any pipeline interlocks. Instead
they had as many register profiles as pipeline stages, and they
round-robined them down the pipeline. So the v2 processor ran at 600
MHz but presented to Linux as a 6-way SMP chip with each core running
at 100 MHz.
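
A minimal sketch of that round-robin issue order, assuming one issue
slot per clock cycle and a strict rotation between hardware threads. The
function names are made up for illustration and the model ignores
everything else about the real pipeline:

```python
def barrel_schedule(num_threads, total_cycles):
    """Return, for each clock cycle, the hardware thread that issues on it."""
    return [cycle % num_threads for cycle in range(total_cycles)]


def per_thread_mhz(chip_mhz, num_threads):
    """Each thread gets one issue slot every num_threads cycles."""
    return chip_mhz / num_threads


schedule = barrel_schedule(num_threads=6, total_cycles=12)
print(schedule)                  # [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]
print(per_thread_mhz(600, 6))    # 100.0
```

In this model a given thread only issues once every 6 cycles, which is
where the 600 / 6 = 100 MHz figure comes from.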

This meant there were 6 clock cycles between each memory access, so the  
DRAM had no trouble keeping up. There was no speculative execution, no  
branch prediction, it never did wasted work and any pipeline stage that  
had nothing to do powered down completely for that clock cycle. They  
got performance out of it via massive parallelism: each instruction was  
a 4-issue VLIW, and the latter two cores were 4-way SIMD vector  
thingies, so if you could break your task into 6 chunks (4 graphics  
processes, an audio process, and a control process) it could do some  
quite heavy lifting.

In the later chips, they were looking to reduce the number of pipeline
stages, which would let them clock the chip down (increasing the power
efficiency, since power consumption rises faster than linearly with
clock speed once you account for the higher voltage a faster clock
needs) while still allowing each thread to progress at 100 MHz. So a
300 MHz chip is probably a 3 stage pipeline presenting as 3 way SMP.
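
The arithmetic behind that guess can be sketched as follows. The power
part is an assumption on my side, not something from Qualcomm: it uses
the textbook dynamic power relation (capacitance times voltage squared
times frequency) and further assumes supply voltage scales linearly with
clock rate, which real silicon only approximates:

```python
def per_thread_mhz(chip_mhz, stages):
    """One issue slot per thread every `stages` cycles."""
    return chip_mhz / stages


def relative_dynamic_power(freq_ratio):
    """Dynamic CMOS power scales as C * V^2 * f.

    ASSUMPTION: supply voltage scales linearly with frequency, so power
    scales with the cube of the frequency ratio. Real parts have a
    voltage floor and leakage, so treat this as an optimistic bound.
    """
    voltage_ratio = freq_ratio  # assumed, not measured
    return voltage_ratio ** 2 * freq_ratio


print(per_thread_mhz(600, 6))              # 100.0
print(per_thread_mhz(300, 3))              # 100.0
print(relative_dynamic_power(300 / 600))   # 0.125
```

Note that the per-thread rate stays at 100 MHz in both cases, but the
3 stage version has half as many hardware threads, so total issue slots
per second are halved too.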

I only did a 6 month contract there in 2010 beating bugs out of the  
toolchain. I know they hired Linutronix to help clean up their code so  
it had a chance of being accepted upstream, but tglx and crowd had to  
sign an NDA so I dunno what they're allowed to say about it, even now  
that some of the code's gone upstream.

> >  Don't know v2. But v3 had a 'real' MMU
> Hm, are you sure about that?
> I have never seen any use of it. Also, the binutils register definitions
> don't include any suitable registers for that.

The version I saw (v2) had a software loaded TLB which a binary blob  
made act like an MMU. It had too few TLB slots and kept thrashing them  
when running a real OS, so they were going to add more in a future  
version.
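
To illustrate why too few slots hurts so badly, here is a toy model of a
software-refilled TLB. The LRU replacement policy, the slot counts, and
the access pattern are all assumptions picked for the demonstration; I
don't know what policy the real blob used:

```python
from collections import OrderedDict


def tlb_miss_rate(page_accesses, num_slots):
    """Simulate a software-refilled TLB with LRU replacement.

    Returns the fraction of accesses that miss.
    """
    tlb = OrderedDict()  # virtual page -> dummy entry, oldest first
    misses = 0
    for page in page_accesses:
        if page in tlb:
            tlb.move_to_end(page)        # hit: mark as most recently used
        else:
            misses += 1                  # miss: software handler would refill
            if len(tlb) >= num_slots:
                tlb.popitem(last=False)  # evict the least recently used entry
            tlb[page] = True
    return misses / len(page_accesses)


# Hypothetical workload: cycle repeatedly over 40 distinct pages.
accesses = [page for _ in range(100) for page in range(40)]
print(tlb_miss_rate(accesses, num_slots=32))  # 1.0
print(tlb_miss_rate(accesses, num_slots=64))  # 0.01
```

Once the working set is even slightly bigger than the TLB, a cyclic
access pattern misses on every single access in this model, and each
miss costs a trip through the software refill path.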

The thing to realize about Qualcomm is that the lawyers are in charge.
The patent licensing revenue is credited to the legal department but
the R&D costs of coming up with that IP in the first place are deducted
from engineering, so in terms of _net_ revenue it looks like licensing
is more profitable than engineering even though it's just a fancy  
story. Political power within the company is based on how much net  
revenue you're bringing in, and with Legal mooching off engineering  
like that they get to overrule them most of the time.

So they've got brilliant engineers who do brilliant things you never
hear about, and would LIKE to get them out into the real world but can
never get permission. (Hence craziness like the "Code Aurora Forum"
which is a partnership between Qualcomm and Qualcomm with some random
co-signer (Intel) there to make it SEEM like somebody else is involved,
because spinning off a wholly-owned subsidiary "Qualcomm Innovation
Center" and having that sock puppet do all your open source stuff isn't
considered enough of a firewall between Legal's precious patents and
the GPL.)

(Now add a bit of political infighting between the people who do their  
"Scorpion" licensed ARM core and the people who would like to see  
Hexagon used as a real processor instead of a multimedia coprocessor,  
and what little power engineering has is wasted.)

So it's really cool technology, fairly widely deployed, and if you want
to make use of it I'd recommend reverse engineering it. (You can look  
around the code aurora forum pages and download the toolchains they  
give to the android guys; those binary blobs get built with modified  
gcc+binutils and the lawyers scrupulously obey the letter of the law as  
they understand it; the code is published at an obscure URL somewhere.)

The fun part is that "objdump" can decode the magic instructions, even  
in the binary blob. Because it has to be able to compile them, you see.  
(They're working on Hexagon support for Open64 and LLVM, but gcc's  
still a more mature compiler. Google for "hexagon open64" and similar  
finds interesting stuff, by the way.)
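
A small sketch of driving objdump that way from a script. The tool name
"hexagon-objdump" and the file name "blob.elf" are placeholders I made
up, so substitute whatever the toolchain you downloaded actually ships.
The -d, -D, -b and -m switches are standard binutils objdump options;
the architecture name to pass to -m for a raw blob is something you'd
have to look up in that toolchain:

```python
import subprocess


def objdump_command(objdump, path, raw_arch=None):
    """Build an objdump command line that disassembles `path`."""
    if raw_arch is None:
        # ELF input: -d disassembles the executable sections.
        return [objdump, "-d", path]
    # Headerless blob: -D disassembles everything, -b binary treats the
    # file as raw bytes, and -m names the architecture to decode as.
    return [objdump, "-D", "-b", "binary", "-m", raw_arch, path]


def disassemble(objdump, path, raw_arch=None):
    """Run objdump and return its text output (raises if the tool fails)."""
    result = subprocess.run(objdump_command(objdump, path, raw_arch),
                            capture_output=True, text=True, check=True)
    return result.stdout


# "hexagon-objdump" and "blob.elf" are placeholders, not confirmed names.
print(objdump_command("hexagon-objdump", "blob.elf"))
```

Only objdump_command() runs here; disassemble() needs the real
toolchain on your PATH.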

> >  Good, because the bootloader was going to be the other issue.
> Yes, in my case it's working :)
> But other guys who also want to participate in this project with
> MSM8960/APQ8064 still can't run any unsigned code on Q6.
> In modern phones it's often locked from changes :(

Getting hexagon support into QEMU would make life SO much easier...

> >  I'd done the patches for glibc (yes, they're publicly available on
> >  some website, don't know if they got merged or not), got 98% of the
> >  many hundreds of glibc unit tests to pass, including most or all of
> >  the thread tests including TLS. Someone had bootstrapped hundreds of
> >  .debs and both python and perl passed 100% of their tests.  I'm sure
> >  no one cares, but even guile worked, and I was about to start fiddling
> >  with haskell :-)
> Good to hear that. Good job!
> So userspace support is rather good in general.

I built Linux From Scratch and large chunks of Beyond Linux From
Scratch during my contract in 2010 (put together a demo with X11,
albeit just clients connecting to an X server running on another
machine through the net), but that was with their gcc 3.4, binutils
2.14, and uClibc 0.9.30 forks. (All of which were obsolete already when
I was there, and have probably been abandoned since.)

That was using... comet boards, I think? (Those hacked up phone  
motherboards Linas was talking about. The "snapdragon" SoC, QDSP6v2  
chips plus a Scorpion plus an armv5 plus a QDSP4, all in a big ball  
with USB and a serial port and an ethernet device and 256 megs of  
memory and I forget what else. We had a small number of them because  
they never made that many. Not a mass produced product, semi-obsolete  
at the time, but the linux porting effort scrounged what resources it  
could...)

Rob
