Re: [DISCUSSION] Hexagon code inside kernel

From: Rob Landley <rob@landley.net>
To: linasvepstas@gmail.com
Cc: cotulla@yandex.ua, linux-hexagon@vger.kernel.org
Subject: Re: [DISCUSSION] Hexagon code inside kernel
Date: Mon, 25 Feb 2013 11:26:46 -0600
Message-ID: <1361813206.27287.1@driftwood>
In-Reply-To: <CAHrUA36YTCfedZGxBa+Vgzvz=DBR=PJhcPHe0aZegQiD79gHXQ@mail.gmail.com> (from linasvepstas@gmail.com on Sun Feb 24 15:03:37 2013)

On 02/24/2013 03:03:37 PM, Linas Vepstas wrote:
> > Yes, there is an optimized memcpy version.
> > But my goal was to "compare" ARM performance with QDSP6 to know
> > what to expect from it.
> > So I made a simple C code and tested it on both processors.

You're comparing arm performance with QDSP6 by writing pessimal QDSP6  
code that does single-byte moves and keeps half the execution units  
idle. You're going to get some extremely useful numbers out of that,  
aren't you? (Even their uClibc port had an assembly optimized  
memmove().)

Is your arm code also doing single byte moves, with the requisite  
bit-shifting and masking that doing that on arm entails (since last I  
checked arm hasn't actually _got_ instructions that handle bytes,  
although maybe it went into thumb2 or v7 or v8 when I wasn't  
looking...)?
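
For illustration, a minimal sketch of the difference between a
single-byte copy loop and one that moves wider units. The function
names copy_bytes and copy_words are hypothetical (they are not from
this thread or from any Hexagon or ARM library), and this is plain
portable C, not the assembly optimized memmove() mentioned above. How
much the wider loop helps on a given core is something to measure, not
assume.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naive copy: one load and one store per byte (hypothetical helper). */
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Wider copy: move 8 bytes per iteration, then finish the tail one
 * byte at a time (hypothetical helper). A fixed-size memcpy() through
 * a local uint64_t is the portable way to express a possibly unaligned
 * 64-bit load/store; compilers commonly lower it to a single access
 * where the target allows that. Buffers must not overlap. */
void copy_words(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i = 0;

    assert(n == 0 || (dst != NULL && src != NULL));
    for (; i + sizeof(uint64_t) <= n; i += sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, src + i, sizeof w);
        memcpy(dst + i, &w, sizeof w);
    }
    for (; i < n; i++)
        dst[i] = src[i];
}
```

Both functions produce the same result for non-overlapping buffers; a
fair ARM versus QDSP6 comparison would time the same variant on both
sides.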

> > Did you test performance inside Linux on Hexagon?
> 
> Yes ... the main result was that it was TLB-starved.  The guys
> designing it are performance and watts-per-cycle crazy, they're very
> devoted to optimizing this stuff, to getting the most per transistor
> possible. It's a very tiny core with very few transistors.  I mean, it's
> probably smaller than the ARM register file (OK, I'm just making this
> last one up, but I'm guessing it just might be true, I wouldn't be
> surprised).

Specifically, the v2 hardware (in the Snapdragon chipset in the Nexus
One) has 6 register profiles (for the 6 pipeline stages, acting as
6-way SMP), but performance peaked at "make -j 3", which ran very
slightly faster than "make -j 4", and then -j 5 and -j 6 were each
noticeably slower (due to TLB thrashing).

I believe that v3 had already taped out by then (late 2010, but it had  
fewer pipeline stages and thus register profiles anyway), and then v4  
was going to increase the TLB entries. What actually shipped was after  
my time, dunno the details.

Rob

