From: Tom Evans <tom_usenet@optusnet.com.au>
To: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
Cc: xenomai@xenomai.org
Subject: Re: [Xenomai] First call to rt_timer_tsc() causes an unexpected switch to secondary mode.
Date: Sat, 18 Oct 2014 01:08:57 +1100
Message-ID: <544122F9.9060203@optusnet.com.au>
In-Reply-To: <20141017070201.GI30661@sisyphus.hd.free.fr>
On 17/10/2014 6:02 PM, Gilles Chanteperdrix wrote:
> On Fri, Oct 17, 2014 at 05:47:07PM +1100, Tom Evans wrote:
>> On 17/10/14 16:34, Gilles Chanteperdrix wrote:
>>>> Work out how many pixels per second you're processing and then
>>>> compare it to the memory bandwidth.
That would still be an interesting number to measure and quote.
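As a rough sketch of that comparison, the required bus traffic is just pixels per second times bytes per pixel times memory accesses per pixel. The struct, its field names and the sample figures (800x480, 4 bytes per pixel, 60 frames/s, two reads and one write per pixel for a blend) are hypothetical, chosen only for illustration, and cache reuse is ignored:

```c
#include <stdint.h>

/* Hypothetical description of a per-frame pixel workload. */
typedef struct {
    uint32_t width;            /* pixels per line */
    uint32_t height;           /* lines per frame */
    uint32_t bytes_per_pixel;
    uint32_t frames_per_sec;
    uint32_t reads_per_pixel;  /* e.g. 2 for an alpha blend (src + dst) */
    uint32_t writes_per_pixel; /* e.g. 1 for an alpha blend (dst) */
} frame_load;

/* Bytes per second that must cross the memory bus, assuming every
 * access goes to RAM (no cache reuse). */
static uint64_t required_bytes_per_sec(const frame_load *f)
{
    uint64_t pixels = (uint64_t)f->width * f->height * f->frames_per_sec;
    uint64_t accesses = (uint64_t)f->reads_per_pixel + f->writes_per_pixel;
    return pixels * f->bytes_per_pixel * accesses;
}
```

With the sample figures this gives 276,480,000 bytes/s, which can then be compared with a measured memcpy() or read rate on the target.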
>> It might be better to FLUSH the entire cache, perform a L2-sized
>> transfer and then flush it again. The flushes *might* be to linear
>> addresses in open pages.
Thinking more about this, it would be better to flush the entire cache,
then perform a preload-and-read pass (to load the cache from complete
open rows in on-page RAM), then loop reading and writing (from cache to
cache), and finally loop flushing the destination cache lines into open
rows of the RAM.
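A minimal user-space sketch of the preload-then-copy part of that scheme, assuming a chunk size that fits in L2 and a 32-byte cache line (both values are hypothetical; adjust for the target). The two flush steps are deliberately left out, since on ARM they need privileged instructions (see below):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tuning values, not taken from any particular CPU. */
#define PRELOAD_CHUNK (64u * 1024u)  /* assumed to fit comfortably in L2 */
#define PRELOAD_LINE  32u            /* assumed cache line size in bytes */

/* Copy n bytes in PRELOAD_CHUNK pieces. For each piece, first read one
 * byte per cache line of the source so the lines are fetched in a single
 * sequential sweep, then copy the (now cached) piece. */
static void copy_preload_chunked(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n > 0) {
        size_t len = n < PRELOAD_CHUNK ? n : PRELOAD_CHUNK;

        /* Pass 1: read-only preload sweep. Writing into a volatile
         * sink stops the compiler from discarding the loads. */
        volatile unsigned char sink = 0;
        for (size_t i = 0; i < len; i += PRELOAD_LINE)
            sink ^= s[i];

        /* Pass 2: copy the cached piece to the destination. */
        memcpy(d, s, len);

        d += len;
        s += len;
        n -= len;
    }
}
```

Whether the separate sweep beats a plain memcpy() depends on how well the memory controller keeps rows open, so it needs measuring rather than assuming.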
This would be easy on the PPC, which has six "User level cache
instructions". Even the Coldfire has a "CPUSHL" user-mode instruction.
This seems to be impossible from user space on the ARM: as far as I can
tell, all 13 of the "Cache and branch predictor maintenance operations,
VMSA" instructions "can be executed only by software executing at PL1 or
higher". The only user-space ones are PLD, PLDW and PLI. So I'd have to
write a kernel driver to copy user memory and worry about the page
translation.
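What user space can do is issue preload hints. A sketch using the GCC/Clang builtin __builtin_prefetch(addr, rw, locality), which on ARM targets is normally lowered to PLD for read prefetches; the line size and prefetch distance are hypothetical tuning values:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tuning values; measure on the target. */
#define PF_LINE     32u   /* assumed cache line size in bytes */
#define PF_DISTANCE 256u  /* how far ahead of the copy to prefetch */

/* Copy in PF_LINE-sized steps, hinting the source PF_DISTANCE bytes
 * ahead. The prefetch is only a hint, but the address expression must
 * still be valid C, so it is only formed while it stays inside src. */
static void copy_with_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;

    for (; i + PF_LINE <= n; i += PF_LINE) {
        if (i + PF_DISTANCE < n)
            __builtin_prefetch(s + i + PF_DISTANCE, 0 /* read */,
                               0 /* no temporal locality */);
        memcpy(d + i, s + i, PF_LINE);
    }
    memcpy(d + i, s + i, n - i);  /* tail shorter than one line */
}
```

This only covers the preload side; there is still no user-mode way on ARM to clean or flush the destination lines afterwards.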
>> I got my fastest memcpy() speed on an MCF5329 by reading 2k to the
>> stack (in static ram in the CPU) and then writing that back out.
>> Copying twice was a LOT faster than any other method.
240MHz Coldfire with 80MHz 32-bit SDR memory. It started out at 33MB/s,
got to 39MB/s by using the multiple-register move instruction, and
peaked at 55MB/s copying via the stack in internal SRAM. memcpy() from
internal RAM to internal RAM managed 304MB/s!
RAM could be read at only 87MB/s (due to the lack of pipelining in this
CPU) but could be written at 207MB/s.
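A sketch of the copy-via-the-stack idea in plain C, assuming (as on the MCF5329 setup described above, but not in general) that the stack lives in fast on-chip SRAM. The 2k buffer size mirrors the figure quoted earlier; the function name is hypothetical:

```c
#include <stddef.h>
#include <string.h>

#define BOUNCE_BYTES 2048u  /* mirrors the 2k figure in the text */

/* Copy through a stack buffer: a burst of reads from SDRAM into SRAM,
 * then a burst of writes from SRAM back out to SDRAM, so the SDRAM
 * controller sees long same-direction runs instead of interleaved
 * read/write traffic. */
static void copy_via_stack(void *dst, const void *src, size_t n)
{
    unsigned char buf[BOUNCE_BYTES];  /* assumed to be in on-chip SRAM */
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n > 0) {
        size_t len = n < sizeof buf ? n : sizeof buf;
        memcpy(buf, s, len);  /* read burst */
        memcpy(d, buf, len);  /* write burst */
        s += len;
        d += len;
        n -= len;
    }
}
```

A compiler is free to merge the two memcpy() calls into one, so check the generated code; the measured versions in the table below used hand-tuned move loops rather than this C.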
Function                kB/s  Memclk/cache line
===============================================
memcpy_gcc_4_4         30883   41.45
memcpy_gcc_4_3_O1      33382   38.34
memcpy_gcc_4_3_O2      33385   38.34
memcpy_gcc_2           33390   38.33
memcpy(132096)         33379   38.35
memcpy_moveml          39752   32.20
memcpy_dma             43709   29.28
memcpy_moveml_32       49618   25.80
memcpy_stack           52912   24.19
memcpy_moveml_192      54052   23.68
memcpy_moveml_48       54093   23.66
memcpy_stack_48        54997   23.27
memcpy_stack_32_mis    55079   23.24
memcpy_stack_32        55125   23.22
memcpy_stack_192       55736   22.97
memcpy_moveml_96_ps    56739   22.56
memRead_stack_32       85017   15.06
memRead_moveml_32      87141   14.69
memWrite_stack_32     196864    6.50
memWrite_moveml_32    207535    6.17
memcpy_stack_stack    304368    4.21 (12.62 CPU clocks)
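The Memclk/cache line column can be reproduced from the kB/s column if one assumes 1 kB = 1000 bytes, a 16-byte cache line and the 80MHz memory clock (240MHz CPU clock for the bracketed figure). Those assumptions are inferred by matching the numbers, not stated in the table itself:

```c
/* Assumed values, inferred from the table rather than stated in it. */
#define MEM_CLK_HZ 80e6   /* SDR memory clock */
#define CPU_CLK_HZ 240e6  /* ColdFire core clock */
#define CL_BYTES   16.0   /* cache line size in bytes */

/* Clock cycles spent per cache line moved, given a rate in kB/s
 * (1 kB = 1000 bytes) and a clock frequency in Hz. */
static double clocks_per_line(double kb_per_sec, double clk_hz)
{
    double lines_per_sec = kb_per_sec * 1000.0 / CL_BYTES;
    return clk_hz / lines_per_sec;
}
```

For example, clocks_per_line(30883, MEM_CLK_HZ) comes out at about 41.45 and clocks_per_line(304368, CPU_CLK_HZ) at about 12.62, matching the first and last rows.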
> Actually, I am wrong, I was only reading the image,
> not writing to it, simply computing a very reduced
> averaged image, so there were some writes from time
> to time, but very rarely.
That should give the best performance. That's not a common operation in
what I'm doing, which is alpha-blending graphics over each other.
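To show why blending is harder on memory than a read-mostly averaging pass: every blended pixel reads both source and destination and writes the destination back. A sketch assuming a hypothetical 32-bit 0xAARRGGBB format with source alpha only:

```c
#include <stddef.h>
#include <stdint.h>

/* Blend one 0xAARRGGBB source pixel over a destination pixel using the
 * source alpha; the result is treated as opaque. Rounded integer math:
 * c = (s*a + d*(255-a) + 127) / 255 per colour channel. */
static uint32_t blend_pixel(uint32_t src, uint32_t dst)
{
    uint32_t a = src >> 24;
    uint32_t out = 0xFF000000u;
    for (int shift = 0; shift < 24; shift += 8) {
        uint32_t s = (src >> shift) & 0xFFu;
        uint32_t d = (dst >> shift) & 0xFFu;
        uint32_t c = (s * a + d * (255u - a) + 127u) / 255u;
        out |= c << shift;
    }
    return out;
}

/* Two reads (src[i], dst[i]) and one write (dst[i]) per pixel. */
static void blend_row(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = blend_pixel(src[i], dst[i]);
}
```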
> > Miss L1 and wait 10 clocks. Miss L2 and wait 153 clocks! Step
> > through memory 4k at a time and wait 46 clocks for the TLB to
> > reload.
>
> That does not prove that the memory system is slow, that
> proves that the processor access to memory is slow. But
> why is that?
The memory controller may not be capable of keeping multiple banks, or
even rows, open. It takes a long time to close an open row with a
precharge and then open the next one.
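The L1/L2/TLB penalties quoted above can be probed with a simple stride walk: touch one byte every `stride` bytes of a large buffer and time it. This is a rough sketch (the function name is hypothetical, and it uses C11 timespec_get); hardware prefetchers and open-row hits can hide some of the cost, so the results are only indicative:

```c
#include <stddef.h>
#include <time.h>

/* Average nanoseconds per load when reading one byte every `stride`
 * bytes of buf, repeated `passes` times. Strides below a cache line
 * reuse lines; strides of a line or more miss once buf exceeds the
 * caches; a 4096-byte stride (assumed page size) also needs a new TLB
 * entry on every access. Returns 0.0 for a zero stride. */
static double ns_per_access(const unsigned char *buf, size_t len,
                            size_t stride, unsigned passes)
{
    struct timespec t0, t1;
    volatile unsigned char sink = 0;  /* keeps the loads from being removed */
    size_t count = 0;

    if (stride == 0)
        return 0.0;

    timespec_get(&t0, TIME_UTC);
    for (unsigned p = 0; p < passes; p++) {
        for (size_t i = 0; i < len; i += stride) {
            sink ^= buf[i];
            count++;
        }
    }
    timespec_get(&t1, TIME_UTC);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return count ? ns / (double)count : 0.0;
}
```

Comparing, say, stride 4, one cache line and 4096 on a buffer several times the L2 size separates hit, miss and TLB-reload costs.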
Don't even think about reading or writing peripheral pins. I worked on
an ARM chip (PXA) that took 200 CPU clocks to read or write a port
register. It was actually recommended to program the DMA controller to
read and write the ports and to interrupt the CPU when done!
The previously quoted ARM FAQ on memory copying suggests the same thing
(DMA or the Preload Engine) for copying memory while the CPU goes and
does something else.
> But I see your point, the problem is not NEON, but the
> way the processor handles memory and cache.
The frustrating thing is the missing user-mode cache control instructions.
Tom