* [PATCH 0/4] memcpy optimized with strd/ldrd
[not found] <03e101cd0e07$eec39f10$cc4add30$@com>
@ 2012-03-30 11:41 ` Boojin Kim
2012-03-30 13:19 ` Nicolas Pitre
0 siblings, 1 reply; 6+ messages in thread
From: Boojin Kim @ 2012-03-30 11:41 UTC (permalink / raw)
To: linux-arm-kernel
Nicolas Pitre wrote:
>
>
> Here's my version. Lightly tested.
> I have no A15 hardware to run any performance comparison though.
>
I'm reviewing and testing your patch, but my other work has been getting in the way of reviewing it.
I will give you feedback within this week.
Please wait a little longer.
And thanks for your patches. :)
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 0/4] memcpy optimized with strd/ldrd
2012-03-30 11:41 ` [PATCH 0/4] memcpy optimized with strd/ldrd Boojin Kim
@ 2012-03-30 13:19 ` Nicolas Pitre
2012-04-03 8:07 ` Boojin Kim
0 siblings, 1 reply; 6+ messages in thread
From: Nicolas Pitre @ 2012-03-30 13:19 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, 30 Mar 2012, Boojin Kim wrote:
> Nicolas Pitre wrote:
> >
> >
> > Here's my version. Lightly tested.
> > I have no A15 hardware to run any performance comparison though.
> >
> I'm reviewing and testing your patch, but my other work has been getting in the way of reviewing it.
> I will give you feedback within this week.
> Please wait a little longer.
> And thanks for your patches. :)
FYI, it occurred to me that some corner cases might not be quite right
with regard to alignment for the STRD instruction. The hardware on which
I tested it (Marvell Dove CPU) apparently copes with misaligned STRDs as
long as they're still 32-bit aligned. So I need to run this code through
a real validation harness on different hardware.
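A validation harness for this kind of corner case has to exercise every source/destination alignment relative to 8 bytes, in particular offset 4, which is word aligned but not doubleword aligned (the STRD case described above). The following is a minimal user-space sketch, assuming the routine under test can be called as a C function with memcpy's signature; `copy_under_test` and `validate_copy` are hypothetical names, and the hook defaults to the C library memcpy. It checks copied data and guard bytes only, so it will not expose misaligned STRDs on a core like the Dove that tolerates them; it has to be run on hardware that enforces the stricter alignment.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hook: point this at the memcpy variant under test.
 * It defaults to the C library memcpy so the harness runs anywhere. */
static void *(*copy_under_test)(void *, const void *, size_t) = memcpy;

#define MAX_LEN  256                        /* longest copy exercised */
#define GUARD    16                         /* untouched bytes on each side */
#define BUF_SIZE (GUARD + 8 + MAX_LEN + GUARD)
#define POISON   0xA5                       /* fill value for dst before each copy */

/* Returns the number of failing (src offset, dst offset, length) cases. */
static int validate_copy(void)
{
    /* 64-byte aligned buffers, so "base + GUARD + off" is misaligned by exactly off % 8. */
    static _Alignas(64) unsigned char src[BUF_SIZE];
    static _Alignas(64) unsigned char dst[BUF_SIZE];
    int failures = 0;

    for (size_t i = 0; i < BUF_SIZE; i++)
        src[i] = (unsigned char)(i * 7 + 1);

    /* Offsets 0..7 cover every alignment class relative to 8 bytes, including
     * offset 4: word aligned but not doubleword aligned. */
    for (size_t soff = 0; soff < 8; soff++) {
        for (size_t doff = 0; doff < 8; doff++) {
            for (size_t len = 0; len <= MAX_LEN; len++) {
                unsigned char *d = dst + GUARD + doff;
                const unsigned char *s = src + GUARD + soff;
                int bad = 0;

                memset(dst, POISON, sizeof dst);
                copy_under_test(d, s, len);

                /* The copied bytes must match the source exactly... */
                if (memcmp(d, s, len) != 0)
                    bad = 1;
                /* ...and nothing outside [d, d + len) may be written. */
                for (size_t i = 0; i < BUF_SIZE && !bad; i++) {
                    int inside = i >= GUARD + doff && i < GUARD + doff + len;
                    if (!inside && dst[i] != POISON)
                        bad = 1;
                }
                failures += bad;
            }
        }
    }
    return failures;
}
```

Calling `validate_copy()` from a test program should return 0 for a correct implementation; any non-zero count identifies how many offset/length combinations went wrong.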
Nicolas
* [PATCH 0/4] memcpy optimized with strd/ldrd
2012-03-30 13:19 ` Nicolas Pitre
@ 2012-04-03 8:07 ` Boojin Kim
2012-04-03 14:48 ` Nicolas Pitre
0 siblings, 1 reply; 6+ messages in thread
From: Boojin Kim @ 2012-04-03 8:07 UTC (permalink / raw)
To: linux-arm-kernel
Nicolas Pitre wrote:
> > >
> > > Here's my version. Lightly tested.
> > > I have no A15 hardware to run any performance comparison though.
> > >
> > I'm reviewing and testing your patch, but my other work has been getting in the way of reviewing it.
> > I will give you feedback within this week.
> > Please wait a little longer.
> > And thanks for your patches. :)
>
> FYI, it occurred to me that some corner cases might not be quite right
> with regard to alignment for the STRD instruction. The hardware on which
> I tested it (Marvell Dove CPU) apparently copes with misaligned STRDs as
> long as they're still 32-bit aligned. So I need to run this code through
> a real validation harness on different hardware.
Unfortunately, performance did not improve after applying your patches.
I think something in patches 1-3 causes a performance degradation.
Thanks :)
>
>
> Nicolas
>
* [PATCH 0/4] memcpy optimized with strd/ldrd
2012-04-03 8:07 ` Boojin Kim
@ 2012-04-03 14:48 ` Nicolas Pitre
2012-04-26 7:35 ` Boojin Kim
0 siblings, 1 reply; 6+ messages in thread
From: Nicolas Pitre @ 2012-04-03 14:48 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, 3 Apr 2012, Boojin Kim wrote:
> Nicolas Pitre wrote:
>
> > > >
> > > > Here's my version. Lightly tested.
> > > > I have no A15 hardware to run any performance comparison though.
> > > >
> > > I'm reviewing and testing your patch, but my other work has been getting in the way of reviewing it.
> > > I will give you feedback within this week.
> > > Please wait a little longer.
> > > And thanks for your patches. :)
> >
> > FYI, it occurred to me that some corner cases might not be quite right
> > with regard to alignment for the STRD instruction. The hardware on which
> > I tested it (Marvell Dove CPU) apparently copes with misaligned STRDs as
> > long as they're still 32-bit aligned. So I need to run this code through
> > a real validation harness on different hardware.
>
> Unfortunately, performance did not improve after applying your patches.
> I think something in patches 1-3 causes a performance degradation.
If you could identify which patch is responsible, that would be helpful.
Thanks.
Nicolas
* [PATCH 0/4] memcpy optimized with strd/ldrd
2012-04-03 14:48 ` Nicolas Pitre
@ 2012-04-26 7:35 ` Boojin Kim
0 siblings, 0 replies; 6+ messages in thread
From: Boojin Kim @ 2012-04-26 7:35 UTC (permalink / raw)
To: linux-arm-kernel
Nicolas Pitre wrote:
> Sent: Tuesday, April 03, 2012 11:49 PM
> To: Boojin Kim
> Cc: linux-arm-kernel at lists.infradead.org
> Subject: RE: [PATCH 0/4] memcpy optimized with strd/ldrd
>
> On Tue, 3 Apr 2012, Boojin Kim wrote:
>
> > Nicolas Pitre wrote:
> >
> > > > >
> > > > > Here's my version. Lightly tested.
> > > > > I have no A15 hardware to run any performance comparison though.
> > > > >
> > > > I'm reviewing and testing your patch, but my other work has been getting in the way of reviewing it.
> > > > I will give you feedback within this week.
> > > > Please wait a little longer.
> > > > And thanks for your patches. :)
> > >
> > > FYI, it occurred to me that some corner cases might not be quite right
> > > with regard to alignment for the STRD instruction. The hardware on which
> > > I tested it (Marvell Dove CPU) apparently copes with misaligned STRDs as
> > > long as they're still 32-bit aligned. So I need to run this code through
> > > a real validation harness on different hardware.
> >
> > Unfortunately, performance did not improve after applying your patches.
> > I think something in patches 1-3 causes a performance degradation.
>
> If you could identify which patch is responsible, that would be helpful.
Sorry for the late response. I've been very busy these days. Y_Y
I checked your patches, and it is the first patch that causes the performance drop.
A 4 KB memcpy takes 489 ns; after applying the first patch, it takes 578 ns.
Performance also drops by about 10% for memcpy of other small sizes.
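For scale, going from 489 ns to 578 ns is (578 - 489) / 489, roughly an 18% slowdown for the 4 KB case, noticeably larger than the ~10% seen at small sizes. Below is a minimal user-space sketch of how such before/after numbers could be collected and compared; it is not the harness used for the figures above. `time_copy_ns` and `percent_change` are hypothetical helpers, the copy is the C library memcpy (swap in the variant under test), and the empty `asm` barrier is GCC/Clang-specific.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() in strict C modes */
#endif
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Relative change from a baseline time; positive means the new code is slower. */
static double percent_change(double baseline_ns, double new_ns)
{
    return (new_ns - baseline_ns) / baseline_ns * 100.0;
}

/* Average nanoseconds per call for copying len bytes, over iters calls. */
static double time_copy_ns(void *dst, const void *src, size_t len, unsigned iters)
{
    struct timespec t0, t1;

    if (iters == 0)
        return 0.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned i = 0; i < iters; i++) {
        memcpy(dst, src, len);
        /* GCC/Clang compiler barrier: keeps the repeated copies from being optimized away. */
        __asm__ __volatile__("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double elapsed_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                      + (double)(t1.tv_nsec - t0.tv_nsec);
    return elapsed_ns / iters;
}
```

Running `time_copy_ns()` for the same sizes on kernels built with and without a given patch, then feeding both averages to `percent_change()`, gives per-size figures comparable to the ones quoted here. Repeating each measurement several times helps separate a real regression from run-to-run noise.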
I hope this is helpful for you.
Thanks,
>
> Thanks.
>
>
> Nicolas
>
* [PATCH 1/2] ARM: lib: Add optimized memcpy with 64 byte pld size
@ 2012-03-28 5:23 Nicolas Pitre
2012-03-29 4:00 ` [PATCH 0/4] memcpy optimized with strd/ldrd Nicolas Pitre
0 siblings, 1 reply; 6+ messages in thread
From: Nicolas Pitre @ 2012-03-28 5:23 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 28 Mar 2012, Boojin Kim wrote:
> Nicolas wrote:
>
> > This creates quite convoluted code. If this is worth doing, we'll have
> > to find a cleaner way to do this.
> >
> > Could you please provide performance measurement numbers with and
> > without this patch, and similarly for the next patch?
> >
> > Did you try enabling the cache alignment code? What performance
> > difference if any did you see?
> My patch gives about a 10% better result at the cache-line boundary.
> A 64-byte PLD size makes the cache more efficient on machines that have a 64-byte cache line.
> Also, which part is the convoluted code? Could you explain in more detail?
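The two ideas under discussion here, aligning the destination to a cache line before the main loop and matching the prefetch stride to the cache-line size, can be illustrated in C. This is a sketch only, not the kernel's assembly: it uses the GCC/Clang `__builtin_prefetch` builtin (which an ARM compiler may emit as a PLD), a fixed 64-byte `memcpy` stands in for the LDRD/STRD or LDM/STM inner loop, and `CACHE_LINE`, `PREFETCH_LINES` and `copy_with_prefetch` are hypothetical names and tuning values.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed cache-line size (64 bytes, as on the machines discussed above). */
#define CACHE_LINE 64
/* Hypothetical prefetch distance: how many lines ahead of the source to prefetch. */
#define PREFETCH_LINES 3

void *copy_with_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: copy just enough bytes to put the destination on a cache-line boundary. */
    size_t head = (CACHE_LINE - ((uintptr_t)d & (CACHE_LINE - 1))) & (CACHE_LINE - 1);
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* Body: one cache line per iteration and one prefetch per line, so the
     * prefetch stride equals the line size. */
    while (n >= CACHE_LINE) {
        /* Only prefetch addresses that still lie inside the source buffer. */
        if (n >= (PREFETCH_LINES + 1) * CACHE_LINE)
            __builtin_prefetch(s + PREFETCH_LINES * CACHE_LINE);
        memcpy(d, s, CACHE_LINE);
        d += CACHE_LINE;
        s += CACHE_LINE;
        n -= CACHE_LINE;
    }

    /* Tail: whatever is left over, less than one line. */
    memcpy(d, s, n);
    return dst;
}
```

The stride matters because a 32-byte prefetch stride on a 64-byte-line core issues two PLDs per line, one of them redundant, while a 64-byte stride on a 32-byte-line core leaves every other line unprefetched. That trade-off is why the PLD size has to follow the cache-line size of the target machine.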
Yes, I will. I have now reworked this code to be extensible while keeping
it as clean as possible. I'm not going to post it right away though, given
that it is late and I'd prefer to have another look at it after I've had
some sleep.
Nicolas
end of thread, other threads:[~2012-04-26 7:35 UTC | newest]
Thread overview: 6+ messages
[not found] <03e101cd0e07$eec39f10$cc4add30$@com>
2012-03-30 11:41 ` [PATCH 0/4] memcpy optimized with strd/ldrd Boojin Kim
2012-03-30 13:19 ` Nicolas Pitre
2012-04-03 8:07 ` Boojin Kim
2012-04-03 14:48 ` Nicolas Pitre
2012-04-26 7:35 ` Boojin Kim
2012-03-28 5:23 [PATCH 1/2] ARM: lib: Add optimized memcpy with 64 byte pld size Nicolas Pitre
2012-03-29 4:00 ` [PATCH 0/4] memcpy optimized with strd/ldrd Nicolas Pitre