From: epsi@gmx.de
Subject: Re: RE: RE: RE: Memory performance / Cache problem
Date: Wed, 14 Oct 2009 19:23:14 +0200
Message-ID: <20091014172314.130150@gmx.net>
References: <20091012083806.323990@gmx.net> <20091013081651.300080@gmx.net> <13B9B4C6EF24D648824FF11BE8967162039B235D5C@dlee02.ent.ti.com> <20091014144839.115530@gmx.net> <13B9B4C6EF24D648824FF11BE8967162039B235EFA@dlee02.ent.ti.com>
In-Reply-To: <13B9B4C6EF24D648824FF11BE8967162039B235EFA@dlee02.ent.ti.com>
List-Id: linux-omap@vger.kernel.org
To: "Woodruff, Richard", premi@ti.com, linux-omap@vger.kernel.org

> > Mem clock is both times 166MHz. I don't know whether there are
> > differences in cycle access and timing, but memclock is fine.
>
> How did you physically verify this?

The oscilloscope shows 166 MHz, and the kernel messages about the frequency are the same in both kernels.

> > Following Siarhei's hint to initialize the buffers (around 1.2 MByte
> > each) I get different results in the 22 kernel for use of
> > malloc alone
> >   memcpy = 473.764, loop4 = 448.430, loop1 = 102.770, rand = 29.641
> > calloc alone
> >   memcpy = 405.947, loop4 = 361.550, loop1 =  95.441, rand = 21.853
> > malloc+memset:
> >   memcpy = 239.294, loop4 = 188.617, loop1 =  80.871, rand =  4.726
> >
> > In the 31 kernel all 3 measures are at about the same (unfortunately
> > low) level as malloc+memset in 22.
>
> Yes aligned buffers can make a difference. But probably more so for small
> copies. Of course you must touch the memory or mprotect() it so it's
> faulted in, but indications are you have done this.

Hmm, malloc already aligns the buffer (to an address), so you probably mean something different. I don't understand the difference: to me, malloc+memset is equivalent to calloc.
I can send you the benchmark code if you like.

> > I used a standard memcpy (I think this is glibc and hence not NEON based)?
> > To be NEON based I guess it has to be recompiled?
>
> The version of glibc in use can make a difference. CodeSourcery in 2009
> release added PLD's to mem operations. This can give a good benefit. It
> might be you have optimized library in one case and a non-optimized in
> another.

In both kernels I used the same rootfs (via NFS). I did use CS2009q1 and its libs, but we are talking about a factor of 2..6. This must be something serious.

What is your feeling? Is the 22 kernel doing something strange, or are the newer kernels slower than they have to be?

It would be interesting to see results on other OMAP3 boards with both old and new kernels.

Best regards
Steffen
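
For readers following the thread, here is a minimal sketch of the three buffer setups compared above (malloc alone, calloc alone, malloc+memset). It is not the benchmark code referred to in this mail; the names `alloc_mode`, `alloc_buffer` and `copy_bandwidth`, the fill value 0x5a and the MB/s unit (1 MB = 10^6 bytes) are assumptions made for illustration. The point it tries to show: only the memset variant writes to every page before the timed copy, whereas a large malloc or calloc buffer may not have been touched at all yet, so the timed memcpy can see a different memory layout in each case.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* How a buffer is prepared before the timed copy (hypothetical names). */
enum alloc_mode { ALLOC_MALLOC, ALLOC_CALLOC, ALLOC_MALLOC_MEMSET };

/* Allocate `size` bytes according to `mode`; returns NULL on failure. */
static void *alloc_buffer(enum alloc_mode mode, size_t size)
{
    void *p;

    if (mode == ALLOC_CALLOC)
        return calloc(1, size);      /* zeroed, but possibly not yet touched */

    p = malloc(size);                /* contents indeterminate, untouched */
    if (p != NULL && mode == ALLOC_MALLOC_MEMSET)
        memset(p, 0x5a, size);       /* writes to every page of the buffer */
    return p;
}

/*
 * Copy `size` bytes `reps` times between two buffers prepared with `mode`.
 * Returns MB/s based on CPU time, 0.0 if the run was shorter than the
 * clock() resolution, or -1.0 on bad arguments or allocation failure.
 */
static double copy_bandwidth(enum alloc_mode mode, size_t size, int reps)
{
    /* Calling through a volatile pointer keeps the compiler from
     * dropping or merging the memcpy calls. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;
    void *src, *dst;
    clock_t start, stop;
    double secs, mbps = -1.0;
    int i;

    if (size == 0 || reps <= 0)
        return -1.0;

    src = alloc_buffer(mode, size);
    dst = alloc_buffer(mode, size);
    if (src != NULL && dst != NULL) {
        start = clock();
        for (i = 0; i < reps; i++)
            copy(dst, src, size);
        stop = clock();
        secs = (double)(stop - start) / CLOCKS_PER_SEC;
        mbps = secs > 0.0 ? (double)size * reps / 1e6 / secs : 0.0;
    }
    free(src);
    free(dst);
    return mbps;
}
```

Calling `copy_bandwidth()` once per mode with a size of about 1.2 MByte and comparing the three return values is the intended use. Note that for the malloc and calloc modes the first copy also pays for faulting in the destination pages, so a real measurement would probably want a warm-up pass or enough repetitions to amortize it.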