public inbox for linux-arm-kernel@lists.infradead.org
* [PATCH] ARM: EXYNOS4: Enable double linefill in PL310 Prefetch Control Register
       [not found] <1315894031-9579-1-git-send-email-siarhei.siamashka@gmail.com>
@ 2011-09-14  6:08 ` Kyungmin Park
  2011-09-14  6:13   ` Santosh
  2011-09-14  7:43   ` Siarhei Siamashka
  0 siblings, 2 replies; 7+ messages in thread
From: Kyungmin Park @ 2011-09-14  6:08 UTC
  To: linux-arm-kernel

Hi Siarhei,

Interesting feature. It's not a Samsung SoC-specific issue, so I've added the ARM
mailing list.
I checked it and see a read performance improvement from 868MiB/s
to 981MiB/s with lmbench.
It would be helpful to test other SoCs, e.g., OMAP4, STE and so on.

BTW, why do you set bit 27? In my PL310 spec, it's a reserved bit
and should be zero (SBZ).

Thank you,
Kyungmin Park

On Tue, Sep 13, 2011 at 3:07 PM, Siarhei Siamashka
<siarhei.siamashka@gmail.com> wrote:
> Setting "Double linefill enable" bit improves memcpy performance
> from ~750 MB/s to ~1150 MB/s when working with large buffers and
> also the performance of just anything which may need good memory
> bandwidth (for example, software rendered graphics).
>
> Additionally setting "Double linefill on WRAP read disable" bit
> compensates most of the random access latency increase.
>
> Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
> ---
>  arch/arm/mach-exynos4/cpu.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/arm/mach-exynos4/cpu.c b/arch/arm/mach-exynos4/cpu.c
> index ba503c3..1afd25f 100644
> --- a/arch/arm/mach-exynos4/cpu.c
> +++ b/arch/arm/mach-exynos4/cpu.c
> @@ -238,7 +238,7 @@ static int __init exynos4_l2x0_cache_init(void)
>         __raw_writel(0x110, S5P_VA_L2CC + L2X0_DATA_LATENCY_CTRL);
>
>         /* L2X0 Prefetch Control */
> -       __raw_writel(0x30000007, S5P_VA_L2CC + L2X0_PREFETCH_CTRL);
> +       __raw_writel(0x78000007, S5P_VA_L2CC + L2X0_PREFETCH_CTRL);
>
>         /* L2X0 Power Control */
>         __raw_writel(L2X0_DYNAMIC_CLK_GATING_EN | L2X0_STNDBY_MODE_EN,
> --
> 1.7.3.4
>
>
> _______________________________________________
> linaro-dev mailing list
> linaro-dev at lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-dev
>
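For reference, the two values the patch writes to L2X0_PREFETCH_CTRL can be decoded against the L2C-310 Prefetch Control Register layout in the r3p1+ TRM (ddi0246f) linked later in this thread: bit 30 double linefill enable, bit 29 instruction prefetch enable, bit 28 data prefetch enable, bit 27 double linefill on WRAP read disable, bits [4:0] prefetch offset. A minimal sketch; the macro and function names are hypothetical and chosen only for this example. Note that on older revisions bit 27 is documented as reserved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; bit positions follow the L2C-310 r3p1+ TRM. */
#define PF_DBL_LINEFILL          (1u << 30) /* Double linefill enable */
#define PF_INSTR_PREFETCH        (1u << 29) /* Instruction prefetch enable */
#define PF_DATA_PREFETCH         (1u << 28) /* Data prefetch enable */
#define PF_DBL_LINEFILL_WRAP_DIS (1u << 27) /* Double linefill on WRAP read disable */
#define PF_OFFSET_MASK           0x1fu      /* Prefetch offset, bits [4:0] */

/* Value before the patch: instruction + data prefetch, prefetch offset 7. */
#define PF_OLD (PF_INSTR_PREFETCH | PF_DATA_PREFETCH | 7u)

/* Value after the patch: additionally double linefill, but not for WRAP reads. */
#define PF_NEW (PF_OLD | PF_DBL_LINEFILL | PF_DBL_LINEFILL_WRAP_DIS)

/* Nonzero if bit 27 is set, the bit that older TRMs mark as reserved (SBZ). */
int pf_sets_bit27(uint32_t v)
{
    return (v & PF_DBL_LINEFILL_WRAP_DIS) != 0;
}
```

So the patch keeps both prefetchers and the offset of 7, and only adds bits 30 and 27.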

* [PATCH] ARM: EXYNOS4: Enable double linefill in PL310 Prefetch Control Register
  2011-09-14  6:08 ` [PATCH] ARM: EXYNOS4: Enable double linefill in PL310 Prefetch Control Register Kyungmin Park
@ 2011-09-14  6:13   ` Santosh
  2011-09-14  7:43   ` Siarhei Siamashka
  1 sibling, 0 replies; 7+ messages in thread
From: Santosh @ 2011-09-14  6:13 UTC
  To: linux-arm-kernel

On Wednesday 14 September 2011 11:38 AM, Kyungmin Park wrote:
> Hi Siarhei,
>
> Interesting feature. It's not a Samsung SoC-specific issue, so I've added the ARM
> mailing list.
> I checked it and see a read performance improvement from 868MiB/s
> to 981MiB/s with lmbench.
> It would be helpful to test other SoCs, e.g., OMAP4, STE and so on.
>
> BTW, why do you set bit 27? In my PL310 spec, it's a reserved bit
> and should be zero (SBZ).
>
That's because not all PL310 versions support double linefill.

Regards
santosh

* [PATCH] ARM: EXYNOS4: Enable double linefill in PL310 Prefetch Control Register
  2011-09-14  6:08 ` [PATCH] ARM: EXYNOS4: Enable double linefill in PL310 Prefetch Control Register Kyungmin Park
  2011-09-14  6:13   ` Santosh
@ 2011-09-14  7:43   ` Siarhei Siamashka
  2011-09-14  7:57     ` Kyungmin Park
  1 sibling, 1 reply; 7+ messages in thread
From: Siarhei Siamashka @ 2011-09-14  7:43 UTC
  To: linux-arm-kernel

On Wed, Sep 14, 2011 at 9:08 AM, Kyungmin Park <kmpark@infradead.org> wrote:
> Hi Siarhei,
>
> Interesting feature. It's not a Samsung SoC-specific issue, so I've added the ARM
> mailing list.
> I checked it and see a read performance improvement from 868MiB/s
> to 981MiB/s with lmbench.

Maybe lmbench does not try very hard to get the best out of the
hardware? On my origenboard, I'm getting ~1.15GB/s performance for the
standard LDM/STM based memcpy from libc-ports, which is ~2.3GB/s
memory bandwidth if both reads and writes are accounted separately.
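To make the accounting above concrete: a memcpy figure counts each byte once, while the memory bus carries both the read and the write, so ~1.15GB/s of copy rate corresponds to ~2.3GB/s of bus traffic. Below is a rough sketch of a large-buffer memcpy measurement plus that arithmetic, assuming a POSIX clock_gettime(); the helper names are hypothetical, and this is not lmbench's or libc-ports' code.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* memcpy "copy rate" counts each byte once; the bus sees one read and
 * one write per byte, so raw bandwidth is twice the copy rate. */
double bus_bandwidth_from_copy_rate(double copy_rate)
{
    return 2.0 * copy_rate;
}

/* Relative improvement in percent, e.g. before/after a register tweak. */
double percent_gain(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Rough large-buffer memcpy throughput in MB/s (1 MB = 1e6 bytes).
 * Returns a negative value on bad input, allocation failure or zero time. */
double measure_memcpy_mbps(size_t size, int iterations)
{
    char *src = malloc(size);
    char *dst = malloc(size);
    struct timespec t0, t1;
    double secs;

    if (size == 0 || !src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 1, size);   /* fault in the pages before timing */
    memset(dst, 0, size);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++)
        memcpy(dst, src, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Read the destination so the copies are not discarded as dead stores. */
    volatile char sink = dst[size / 2];
    (void)sink;

    secs = (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    free(src);
    free(dst);
    return secs > 0.0 ? (double)size * iterations / secs / 1e6 : -1.0;
}
```

With the figures quoted in this thread, the patch description's ~750 to ~1150 MB/s is roughly a 53% gain, and the lmbench 868 to 981 MiB/s result is roughly 13%.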

> It would be helpful to test other SoCs, e.g., OMAP4, STE and so on.

The current (?) state of the support for this feature in OMAP4 is
explained here by Richard Woodruff:
    http://groups.google.com/group/pandaboard/msg/dfd2d2e1336d435b

> BTW, why do you set bit 27? In my PL310 spec, it's a reserved bit
> and should be zero (SBZ).

This PL310 thing seems to have been renamed to "CoreLink Level 2 Cache
Controller L2C-310" in later revisions, and its Prefetch Control
Register is described here:
    http://infocenter.arm.com/help/topic/com.arm.doc.ddi0246f/CHDHIECI.html

Sorry for the confusing subject.

Regarding bit 27 ('Double linefill on WRAP read disable'), it seems to
reduce the impact of enabling double linefill on the random access
latency as measured by my self-written simple memory benchmark
program:
    http://github.com/downloads/ssvb/ssvb-membench/ssvb-membench-0.1.tar.gz
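Random access latency of this kind is usually measured with a dependent-load ("pointer chasing") loop: each load address comes from the previous load, so the accesses cannot overlap. The following is a generic sketch of that technique, not the code of ssvb-membench. It assumes POSIX clock_gettime(), uses rand() only for illustration, and the function names are hypothetical. For the result to reflect DRAM rather than cache latency, the table should be much larger than the L2 cache.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Turn next[] into one random cycle through all n slots (Sattolo's
 * algorithm), so a chase from any slot visits every slot exactly once. */
void make_random_cycle(size_t *next, size_t n, unsigned seed)
{
    if (n == 0)
        return;
    srand(seed);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        /* j in [0, i - 1]; excluding j == i is what yields a single cycle */
        size_t j = (size_t)rand() % i;
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Perform `steps` dependent loads starting at `start`; returns the final slot. */
size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t p = start;
    while (steps--)
        p = next[p];
    return p;
}

/* Average nanoseconds per dependent load. The final slot is stored in
 * *sink so the compiler cannot drop the loop. */
double chase_latency_ns(const size_t *next, size_t steps, size_t *sink)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *sink = chase(next, 0, steps);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
                (double)(t1.tv_nsec - t0.tv_nsec);
    return steps ? ns / (double)steps : 0.0;
}
```

Running this once with the default Prefetch Control value and once with double linefill enabled would show the latency side of the trade-off described above.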

-- 
Best regards,
Siarhei Siamashka

* [PATCH] ARM: EXYNOS4: Enable double linefill in PL310 Prefetch Control Register
  2011-09-14  7:43   ` Siarhei Siamashka
@ 2011-09-14  7:57     ` Kyungmin Park
  2011-09-14  8:20       ` Siarhei Siamashka
  0 siblings, 1 reply; 7+ messages in thread
From: Kyungmin Park @ 2011-09-14  7:57 UTC
  To: linux-arm-kernel

On Wed, Sep 14, 2011 at 4:43 PM, Siarhei Siamashka
<siarhei.siamashka@gmail.com> wrote:
> On Wed, Sep 14, 2011 at 9:08 AM, Kyungmin Park <kmpark@infradead.org> wrote:
>> Hi Siarhei,
>>
>> Interesting feature. It's not a Samsung SoC-specific issue, so I've added the ARM
>> mailing list.
>> I checked it and see a read performance improvement from 868MiB/s
>> to 981MiB/s with lmbench.
>
> Maybe lmbench does not try very hard to get the best out of the
> hardware? On my origenboard, I'm getting ~1.15GB/s performance for the
> standard LDM/STM based memcpy from libc-ports, which is ~2.3GB/s
> memory bandwidth if both reads and writes are accounted separately.
>
>> It would be helpful to test other SoCs, e.g., OMAP4, STE and so on.
>
> The current (?) state of the support for this feature in OMAP4 is
> explained here by Richard Woodruff:
>    http://groups.google.com/group/pandaboard/msg/dfd2d2e1336d435b
>
>> BTW, why do you set bit 27? In my PL310 spec, it's a reserved bit
>> and should be zero (SBZ).
>
> This PL310 thing seems to have been renamed to "CoreLink Level 2 Cache
> Controller L2C-310" in later revisions, and its Prefetch Control
> Register is described here:
>    http://infocenter.arm.com/help/topic/com.arm.doc.ddi0246f/CHDHIECI.html
Thanks for the link. It has a description of bit 27, but is that bit
description correct for the PL310 in exynos4?
I mean, I received the PL310 TRM for the version used in the exynos4 chip,
and it has no description of bit 27; it's just a reserved bit.
Can bit 27 be enabled on exynos4210, or can it only be used on exynos4212 or later?

Thank you,
Kyungmin Park
>
> Sorry for the confusing subject.
>
> Regarding bit 27 ('Double linefill on WRAP read disable'), it seems to
> reduce the impact of enabling double linefill on the random access
> latency as measured by my self-written simple memory benchmark
> program:
>    http://github.com/downloads/ssvb/ssvb-membench/ssvb-membench-0.1.tar.gz
>
> --
> Best regards,
> Siarhei Siamashka
>

* [PATCH] ARM: EXYNOS4: Enable double linefill in PL310 Prefetch Control Register
  2011-09-14  7:57     ` Kyungmin Park
@ 2011-09-14  8:20       ` Siarhei Siamashka
  2011-09-14 11:23         ` Kukjin Kim
  0 siblings, 1 reply; 7+ messages in thread
From: Siarhei Siamashka @ 2011-09-14  8:20 UTC
  To: linux-arm-kernel

On Wed, Sep 14, 2011 at 10:57 AM, Kyungmin Park <kmpark@infradead.org> wrote:
> On Wed, Sep 14, 2011 at 4:43 PM, Siarhei Siamashka
> <siarhei.siamashka@gmail.com> wrote:
>> On Wed, Sep 14, 2011 at 9:08 AM, Kyungmin Park <kmpark@infradead.org> wrote:
>>> Hi Siarhei,
>>>
>>> Interesting feature. It's not a Samsung SoC-specific issue, so I've added the ARM
>>> mailing list.
>>> I checked it and see a read performance improvement from 868MiB/s
>>> to 981MiB/s with lmbench.
>>
>> Maybe lmbench does not try very hard to get the best out of the
>> hardware? On my origenboard, I'm getting ~1.15GB/s performance for the
>> standard LDM/STM based memcpy from libc-ports, which is ~2.3GB/s
>> memory bandwidth if both reads and writes are accounted separately.
>>
>>> It would be helpful to test other SoCs, e.g., OMAP4, STE and so on.
>>
>> The current (?) state of the support for this feature in OMAP4 is
>> explained here by Richard Woodruff:
>>    http://groups.google.com/group/pandaboard/msg/dfd2d2e1336d435b
>>
>>> BTW, why do you set bit 27? In my PL310 spec, it's a reserved bit
>>> and should be zero (SBZ).
>>
>> This PL310 thing seems to have been renamed to "CoreLink Level 2 Cache
>> Controller L2C-310" in later revisions, and its Prefetch Control
>> Register is described here:
>>    http://infocenter.arm.com/help/topic/com.arm.doc.ddi0246f/CHDHIECI.html
> Thanks for the link. It has a description of bit 27, but is that bit
> description correct for the PL310 in exynos4?
> I mean, I received the PL310 TRM for the version used in the exynos4 chip,
> and it has no description of bit 27; it's just a reserved bit.
> Can bit 27 be enabled on exynos4210, or can it only be used on exynos4212 or later?

That's a good point. I think it is exynos4210 that is used in
origenboard. And according to the value in Cache ID Register
(0x4100c4c5), it has r3p0 revision of L2C-310. Which means that the
Prefetch Control Register is actually described at:
    http://infocenter.arm.com/help/topic/com.arm.doc.ddi0246d/CHDHIECI.html
And bit 27 is indeed reserved. However flipping it seems to have some
measurable impact on performance (unless I screwed up the benchmarks),
so maybe it does something but is undocumented? In any case, I agree
that it's better not to mess with this bit.
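The Cache ID decoding used above can be sketched as follows. The field positions (implementer [31:24], part number [9:6], RTL release [5:0]) and the RTL codes in the table are the ones listed in later versions of Linux's cache-l2x0.h; treat the table as an assumption to verify against the TRM. The function and macro names are hypothetical, and the table deliberately covers only a few revisions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cache ID register fields (L2C-310). */
#define CACHE_ID_IMPLEMENTER(id) (((id) >> 24) & 0xffu)
#define CACHE_ID_PART(id)        (((id) >> 6) & 0xfu)
#define CACHE_ID_RTL(id)         ((id) & 0x3fu)
#define CACHE_ID_PART_L310       0x3u

/* Decode the RTL release into rXpY. Returns 0 on success, -1 if the part
 * is not an L2C-310 or the release code is not in this (partial) table. */
int l310_revision(uint32_t cache_id, int *rev_major, int *rev_minor)
{
    static const struct { uint32_t rtl; int r, p; } table[] = {
        { 0x0, 0, 0 }, { 0x2, 1, 0 }, { 0x4, 2, 0 },
        { 0x5, 3, 0 }, { 0x6, 3, 1 }, { 0x8, 3, 2 },
    };

    if (CACHE_ID_PART(cache_id) != CACHE_ID_PART_L310)
        return -1;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i].rtl == CACHE_ID_RTL(cache_id)) {
            *rev_major = table[i].r;
            *rev_minor = table[i].p;
            return 0;
        }
    }
    return -1;
}
```

For the origenboard value 0x4100c4c5 this gives implementer 0x41 (ARM), part 3 (L2C-310) and RTL 0x5, i.e. r3p0, which matches the reading above.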

By the way, does anybody have the L2C-310 errata list? Is double linefill
actually safe to use in r3p0?

-- 
Best regards,
Siarhei Siamashka

* [PATCH] ARM: EXYNOS4: Enable double linefill in PL310 Prefetch Control Register
  2011-09-14  8:20       ` Siarhei Siamashka
@ 2011-09-14 11:23         ` Kukjin Kim
  2011-09-14 21:22           ` Siarhei Siamashka
  0 siblings, 1 reply; 7+ messages in thread
From: Kukjin Kim @ 2011-09-14 11:23 UTC
  To: linux-arm-kernel

Siarhei Siamashka wrote:
> 
> On Wed, Sep 14, 2011 at 10:57 AM, Kyungmin Park <kmpark@infradead.org> wrote:
> > On Wed, Sep 14, 2011 at 4:43 PM, Siarhei Siamashka
> > <siarhei.siamashka@gmail.com> wrote:
> >> On Wed, Sep 14, 2011 at 9:08 AM, Kyungmin Park <kmpark@infradead.org> wrote:
> >>> Hi Siarhei,
> >>>
> >>> Interesting feature. It's not a Samsung SoC-specific issue, so I've added the ARM
> >>> mailing list.
> >>> I checked it and see a read performance improvement from 868MiB/s
> >>> to 981MiB/s with lmbench.
> >>
> >> Maybe lmbench does not try very hard to get the best out of the
> >> hardware? On my origenboard, I'm getting ~1.15GB/s performance for the
> >> standard LDM/STM based memcpy from libc-ports, which is ~2.3GB/s
> >> memory bandwidth if both reads and writes are accounted separately.
> >>
> >>> It would be helpful to test other SoCs, e.g., OMAP4, STE and so on.
> >>
> >> The current (?) state of the support for this feature in OMAP4 is
> >> explained here by Richard Woodruff:
> >>    http://groups.google.com/group/pandaboard/msg/dfd2d2e1336d435b
> >>
> >>> BTW, why do you set bit 27? In my PL310 spec, it's a reserved bit
> >>> and should be zero (SBZ).
> >>
> >> This PL310 thing seems to have been renamed to "CoreLink Level 2 Cache
> >> Controller L2C-310" in later revisions, and its Prefetch Control
> >> Register is described here:
> >>    http://infocenter.arm.com/help/topic/com.arm.doc.ddi0246f/CHDHIECI.html
> > Thanks for the link. It has a description of bit 27, but is that bit
> > description correct for the PL310 in exynos4?
> > I mean, I received the PL310 TRM for the version used in the exynos4 chip,
> > and it has no description of bit 27; it's just a reserved bit.
> > Can bit 27 be enabled on exynos4210, or can it only be used on exynos4212 or later?
> 
> That's a good point. I think it is exynos4210 that is used in
> origenboard. And according to the value in Cache ID Register
> (0x4100c4c5), it has r3p0 revision of L2C-310. Which means that the
> Prefetch Control Register is actually described at:
>     http://infocenter.arm.com/help/topic/com.arm.doc.ddi0246d/CHDHIECI.html
> And bit 27 is indeed reserved. However flipping it seems to have some
> measurable impact on performance (unless I screwed up the benchmarks),
> so maybe it does something but is undocumented? In any case, I agree
> that it's better not to mess with this bit.
> 
Hi all,

Please add me to Cc for Samsung stuff...

> By the way, does anybody have the L2C-310 errata list? Is double linefill
> actually safe to use in r3p0?
> 
No, it is _not_ safe on EXYNOS4210.

Because of L2C-310 errata, the current EXYNOS4210 cannot enable the
double linefill feature, and, as Siarhei said, we need to check the
L2C-310 revision in the Cache ID register before enabling it. As a note,
it is possible to enable it on the EXYNOS4212 SoC, and, contrary to
Siarhei's patch, keeping double linefill enabled for WRAP reads is better
there. My colleague Boojin Kim is testing it, so we can submit it soon.
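The revision check described above could be sketched like this: decode the Cache ID register, add the double linefill bit only on revisions considered safe, and leave bit 27 clear so WRAP reads keep using double linefill. The cut-off used here ("newer than r3p0") is a hypothetical policy for illustration only; the real list of affected revisions has to come from ARM's L2C-310 errata notice and the SoC vendor. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define L310_PART(id)            (((id) >> 6) & 0xfu)  /* part number, bits [9:6] */
#define L310_RTL(id)             ((id) & 0x3fu)        /* RTL release, bits [5:0] */
#define L310_PART_ID             0x3u
#define L310_RTL_R3P0            0x5u

#define PF_DBL_LINEFILL          (1u << 30) /* Double linefill enable */
#define PF_DBL_LINEFILL_WRAP_DIS (1u << 27) /* Double linefill on WRAP read disable */

/* Hypothetical policy: treat L2C-310 releases newer than r3p0 as safe. */
static bool dbl_linefill_allowed(uint32_t cache_id)
{
    return L310_PART(cache_id) == L310_PART_ID &&
           L310_RTL(cache_id) > L310_RTL_R3P0;
}

/* Prefetch Control value to program: bit 27 always cleared, bit 30 set
 * only when the revision check passes, all other bits taken from base. */
uint32_t l310_prefetch_value(uint32_t cache_id, uint32_t base)
{
    uint32_t v = base & ~PF_DBL_LINEFILL_WRAP_DIS;

    if (dbl_linefill_allowed(cache_id))
        v |= PF_DBL_LINEFILL;
    else
        v &= ~PF_DBL_LINEFILL;
    return v;
}
```

With the EXYNOS4210 Cache ID (0x4100c4c5, r3p0) this leaves the original 0x30000007 untouched, even if the patched value is passed in.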

Thanks.

Best regards,
Kgene.
--
Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer,
SW Solution Development Team, Samsung Electronics Co., Ltd.

* [PATCH] ARM: EXYNOS4: Enable double linefill in PL310 Prefetch Control Register
  2011-09-14 11:23         ` Kukjin Kim
@ 2011-09-14 21:22           ` Siarhei Siamashka
  0 siblings, 0 replies; 7+ messages in thread
From: Siarhei Siamashka @ 2011-09-14 21:22 UTC
  To: linux-arm-kernel

On Wed, Sep 14, 2011 at 2:23 PM, Kukjin Kim <kgene.kim@samsung.com> wrote:
> Siarhei Siamashka wrote:
>> By the way, does anybody have the L2C-310 errata list? Is double linefill
>> actually safe to use in r3p0?
>>
> No, it is _not_ safe on EXYNOS4210.
>
> Because of L2C-310 errata, the current EXYNOS4210 cannot enable the double linefill feature

Thanks for this information. It's a pity, because double linefill
could provide a really serious memory performance boost. Looks like we
have to wait for EXYNOS4212 and/or OMAP4460 to really see how
Cortex-A9 is actually supposed to perform on memory intensive tasks.

However I really appreciate that with EXYNOS4210 you are not shoving
some hardcoded configuration down our throats and not restricting
access to the relevant Cortex-A9 and L2C-310 configuration registers.
So it is still possible to temporarily enable double linefill and use
origenboard for benchmarking purposes to estimate how EXYNOS4212 is
going to perform when it becomes available.

> and, as Siarhei said, we need to check the L2C-310 revision in the Cache ID register before enabling it.

If EXYNOS4212 has bug-free double linefill support, then enabling it
based on checking L2C-310 revision looks like a good idea.

> As a note, it is possible to enable it on the EXYNOS4212 SoC, and, contrary to Siarhei's patch, keeping double linefill enabled for WRAP reads is better there. My colleague Boojin Kim is testing it, so we can submit it soon.

If you have some benchmark results with all these options, they would
be very interesting to me.

As for the general memory performance tuning, there are more things to
try (carefully watching for possible errata):
- SCU Speculative linefills enable bit in SCU Control Register as
described in http://infocenter.arm.com/help/topic/com.arm.doc.ddi0407f/BABEBFBH.html
(this seems to be a good tweak and it really reduces L2 access latency
a bit in my tests)
- Exclusive cache configuration (should increase effective L1/L2 cache
size, but seems to make L2 cache access latency worse in my tests)
- Tune L2C-310 Prefetch offset (without double linefill, the value 6
or even 5 seems to be a bit better than 7)
- 'Alloc in one way', 'Write full line of zeros mode' and maybe something else

Thank you for your replies and the interest in this subject.

-- 
Best regards,
Siarhei Siamashka

end of thread

Thread overview: 7+ messages
     [not found] <1315894031-9579-1-git-send-email-siarhei.siamashka@gmail.com>
2011-09-14  6:08 ` [PATCH] ARM: EXYNOS4: Enable double linefill in PL310 Prefetch Control Register Kyungmin Park
2011-09-14  6:13   ` Santosh
2011-09-14  7:43   ` Siarhei Siamashka
2011-09-14  7:57     ` Kyungmin Park
2011-09-14  8:20       ` Siarhei Siamashka
2011-09-14 11:23         ` Kukjin Kim
2011-09-14 21:22           ` Siarhei Siamashka