* Coldfire v4 low performance of read/write to shared memory
@ 2013-02-13 11:20 Lars Michael
2013-02-16 18:54 ` Thorsten Glaser
2013-03-07 10:01 ` Lars Michael
0 siblings, 2 replies; 3+ messages in thread
From: Lars Michael @ 2013-02-13 11:20 UTC
To: linux-m68k; +Cc: lh_post
Hello,
We are having some performance issues on our Coldfire v4 based board and hope we can get some help from the forum.
The CPU is the 54418 running a 2.6.38 kernel. 128MB NAND flash and 256MB SDRAM. The compiler is GCC 4.5.97.
At first we saw low throughput on the Ethernet interface, and it drops significantly when CPU load increases. We have tested the controller and kernel module and found that they are not the bottleneck. It looks as if the kernel network layers do not process packets very fast. So we ran more tests, one being access to SDRAM. Here are the main results:
Write 25MiB to shared memory:
[root@CPB529 /]# dd if=/dev/zero bs=1M count=25 of=/dev/shm/25MiB
26214400 bytes (26 MB) copied, 1.58318 s, 16.6 MB/s
Read 25MiB from shared mem:
[root@CPB529 /]# dd if=/dev/shm/25MiB of=/dev/null
26214400 bytes (26 MB) copied, 1.2091 s, 21.7 MB/s
Read/write 25MiB to shared mem:
[root@CPB529 /]# dd if=/dev/shm/25MiB of=/dev/shm/25MiBnew
26214400 bytes (26 MB) copied, 6.36459 s, 4.1 MB/s
In the last test, we would expect a much higher throughput, at least 8MB/s. Why is it so low? And could this indicate a problem, e.g. in the kernel, that affects all running processes (including slowing down the kernel network layers)?
How can we diagnose this further?
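One detail that may matter for the numbers above: the write test passes bs=1M, while the read and copy tests set no block size, so dd falls back to its default of 512 bytes. Copying 26214400 bytes in 512-byte blocks takes 51200 read calls and 51200 write calls, so the 4.1 MB/s figure may reflect system call overhead as much as SDRAM bandwidth. A rough way to check is to repeat the copy at several block sizes. The sketch below is only an illustration, assuming a Python interpreter is available on the target (dd with an explicit bs= would serve the same purpose). The names copy_throughput and serial_estimate are hypothetical helpers, not part of any tool mentioned in this thread.

```python
import os
import tempfile
import time


def copy_throughput(src, dst, block_size):
    """Copy src to dst using unbuffered reads and writes of block_size bytes.

    Returns (bytes_copied, read_write_pairs, seconds).
    """
    copied = 0
    pairs = 0
    start = time.perf_counter()
    # buffering=0 gives raw file objects, so each read()/write() below
    # maps to one system call, as it does in dd.
    with open(src, "rb", buffering=0) as fin, \
            open(dst, "wb", buffering=0) as fout:
        while True:
            chunk = fin.read(block_size)
            if not chunk:
                break
            view = memoryview(chunk)
            while view:
                # A raw write may be short, so keep going until done.
                view = view[fout.write(view):]
            copied += len(chunk)
            pairs += 1
    return copied, pairs, time.perf_counter() - start


def serial_estimate(read_rate, write_rate):
    """Rate expected if reading and writing simply ran back to back."""
    return 1.0 / (1.0 / read_rate + 1.0 / write_rate)


if __name__ == "__main__":
    # Assumption: the system temp directory is used here for portability;
    # on the board one would point this at /dev/shm instead.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        with open(src, "wb") as f:
            f.write(b"\0" * (1024 * 1024))
        for bs in (512, 4096, 1024 * 1024):
            nbytes, pairs, secs = copy_throughput(src, dst, bs)
            rate = nbytes / secs / 1e6 if secs > 0 else float("inf")
            print(f"bs={bs:>8}: {pairs:>5} read/write pairs, {rate:.1f} MB/s")
```

If throughput rises sharply with larger blocks, the per-call cost is the likely limit rather than the memory itself. For reference, serial_estimate(21.7, 16.6) is about 9.4 MB/s, which is in line with the expectation of "at least 8MB/s", although those two input figures were themselves measured with different block sizes.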
At the same time we noticed (by using strace) that there is an excessive and continuous sequence of system calls during normal program execution:
333 (__NR_GET_THREAD_AREA), and 335 (__NR_ATOMIC_CMPXCHG_32)
They appear for both single and multithreaded programs.
Are all these calls required? Are they adding unnecessary load on the CPU?
Example:
SYS_333(0, 0x601c2dc0, 0x602f8059, 0x8, 0x16078) = 1612489920
SYS_335(0x1, 0, 0xffffffff, 0x8, 0x80000000) = 0
SYS_335(0x602f4634, 0, 0, 0x8, 0x80000000) = 0
SYS_335(0, 0x1, 0, 0x8, 0x80000000) = 1
SYS_333(0x8de, 0x601c2a90, 0x6003b37c, 0x2, 0xd8630) = 1612489920
SYS_333(0, 0x601c2a90, 0x6003b41f, 0x2, 0xd8630) = 1612489920
SYS_333(0x487, 0, 0x6003b37c, 0x2, 0xd833c) = 1612489920
SYS_333(0, 0, 0x6003b9a0, 0x2, 0xd833c) = 1612489920
SYS_335(0x1, 0, 0, 0x2, 0xbf839e84) = 0
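To judge whether these calls are "excessive", it helps to count them against everything else the program does. strace -c prints a per-syscall summary; for a saved trace, a small tally like the following sketch does the same job. It is only an illustration: the NAMES table copies the two numbers given above, and tally is a hypothetical helper name.

```python
import re
from collections import Counter

# Syscall numbers and names as given in the message above.
NAMES = {333: "get_thread_area", 335: "atomic_cmpxchg_32"}

# Matches either an undecoded "SYS_<nr>(" or a decoded "<name>(" line start.
LINE_RE = re.compile(r"^(?:SYS_(\d+)|(\w+))\(")

SAMPLE = """\
SYS_333(0, 0x601c2dc0, 0x602f8059, 0x8, 0x16078) = 1612489920
SYS_335(0x1, 0, 0xffffffff, 0x8, 0x80000000) = 0
SYS_335(0x602f4634, 0, 0, 0x8, 0x80000000) = 0
SYS_335(0, 0x1, 0, 0x8, 0x80000000) = 1
SYS_333(0x8de, 0x601c2a90, 0x6003b37c, 0x2, 0xd8630) = 1612489920
SYS_333(0, 0x601c2a90, 0x6003b41f, 0x2, 0xd8630) = 1612489920
SYS_333(0x487, 0, 0x6003b37c, 0x2, 0xd833c) = 1612489920
SYS_333(0, 0, 0x6003b9a0, 0x2, 0xd833c) = 1612489920
SYS_335(0x1, 0, 0, 0x2, 0xbf839e84) = 0
""".splitlines()


def tally(lines):
    """Count syscalls per name in strace output lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip signals, exit notices, blank lines
        if match.group(1):
            nr = int(match.group(1))
            name = NAMES.get(nr, f"SYS_{nr}")
        else:
            name = match.group(2)
        counts[name] += 1
    return counts


if __name__ == "__main__":
    for name, count in tally(SAMPLE).most_common():
        print(f"{count:6d}  {name}")
```

Fed a complete trace (for example one saved with strace -o), the same function shows what share of all system calls these two account for. The value 1612489920 returned by every SYS_333 call is 0x601ca4c0 in hex, which would be consistent with it being the thread pointer, though the trace alone does not confirm that.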
Is anybody running Linux on a v4 Coldfire who has experienced similar issues?
Appreciate all comments and help on this. Let me know if more information is needed.
Thanks and regards,
Lars Horvath
* Re: Coldfire v4 low performance of read/write to shared memory
2013-02-13 11:20 Coldfire v4 low performance of read/write to shared memory Lars Michael
@ 2013-02-16 18:54 ` Thorsten Glaser
2013-03-07 10:01 ` Lars Michael
1 sibling, 0 replies; 3+ messages in thread
From: Thorsten Glaser @ 2013-02-16 18:54 UTC
To: linux-m68k
Lars Michael <lh_post <at> yahoo.com> writes:
> At the same time we noticed (by using strace) that there is an excessive and
> continuous sequence of system calls during normal program execution:
>
> 333 (__NR_GET_THREAD_AREA), and 335 (__NR_ATOMIC_CMPXCHG_32)
Yes, that is in fact normal. If you stop using programs (and libraries)
that use TLS (thread-local storage) and atomics, you’ll notice a massive
speedup as these syscalls will no longer be issued.
The only way out of this is an ABI breakage, with at least
* set one CPU register aside for the TLS base
* add a VDSO or some other kind of page for “fast syscalls”,
to optimise away the need for cmpxchg to call into the
kernel (if possible), and maybe speed up e.g. gettimeofday
* bump time_t to 64 bit (pet peeve of mine)
* … maybe others? I’m not a Linux kernel coder.
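For readers unfamiliar with the second syscall, the following sketch shows what a compare-and-exchange does and how a simple lock can be built on it. It is a model of the semantics only, not kernel or libc code: a threading.Lock stands in for whatever serialization the kernel provides, and Cell, spin_lock and spin_unlock are hypothetical names.

```python
import threading


class Cell:
    """One word of memory whose compare-and-exchange is serialized.

    The internal lock is a stand-in for the kernel performing the
    operation on the caller's behalf (an assumption for illustration).
    """

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def cmpxchg(self, expected, new):
        """Store new only if the cell holds expected.

        Always returns the value seen before the call, so the caller can
        tell success (return == expected) from failure.
        """
        with self._guard:
            previous = self._value
            if previous == expected:
                self._value = new
            return previous


def spin_lock(cell):
    """Acquire by swapping 0 -> 1; returns the number of attempts made."""
    attempts = 1
    while cell.cmpxchg(0, 1) != 0:
        attempts += 1
    return attempts


def spin_unlock(cell):
    """Release by swapping 1 -> 0."""
    cell.cmpxchg(1, 0)


if __name__ == "__main__":
    lock_word = Cell()
    print("attempts to acquire:", spin_lock(lock_word))
    spin_unlock(lock_word)
```

Every lock, unlock, or reference-count update built this way costs at least one cmpxchg. When each cmpxchg is a system call, as described above, even an uncontended lock pays for a kernel entry and exit.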
bye,
//mirabilos
PS: Does strace by now know about these two syscalls,
and possibly filter them out? Does qemu’s userspace
emulation provide for them?
* Re: Coldfire v4 low performance of read/write to shared memory
2013-02-13 11:20 Coldfire v4 low performance of read/write to shared memory Lars Michael
2013-02-16 18:54 ` Thorsten Glaser
@ 2013-03-07 10:01 ` Lars Michael
1 sibling, 0 replies; 3+ messages in thread
From: Lars Michael @ 2013-03-07 10:01 UTC
To: linux-m68k; +Cc: lh_post
Lars Michael <lh_post <at> yahoo.com> writes:
> At the same time we noticed (by using strace) that there is an excessive and
> continuous sequence of system calls during normal program execution:
>
> 333 (__NR_GET_THREAD_AREA), and 335 (__NR_ATOMIC_CMPXCHG_32)
Thorsten Glaser <tg@xxxxxxxxxx> wrote:
> Yes, that is in fact normal. If you stop using programs (and libraries)
> that use TLS (thread-local storage) and atomics, you’ll notice a massive
> speedup as these syscalls will no longer be issued.
> The only way out of this is an ABI breakage, with at least
> * set one CPU register aside for the TLS base
> * add a VDSO or some other kind of page for “fast syscalls”,
> to optimise away the need for cmpxchg to call into the
> kernel (if possible), and maybe speed up e.g. gettimeofday
> * bump time_t to 64 bit (pet peeve of mine)
> * … maybe others? I’m not a Linux kernel coder.
Thorsten, thanks for commenting. I am not really familiar with all this, but assuming what you say is true, is it fair to say that the ColdFire v4 is a 'less optimal' processor for running Linux? One thing to add is that there is no free register left for a TLS base.
Does anybody have really good experience with running Linux on the v4?
Thanks
Lars Horvath