* Can I use VFP in work queue context ?
@ 2010-04-21 23:11 Anbumony, Kasi Lakshman Karthi
2010-04-22 10:09 ` Ben Dooks
2010-04-22 12:49 ` Siarhei Siamashka
0 siblings, 2 replies; 9+ messages in thread
From: Anbumony, Kasi Lakshman Karthi @ 2010-04-21 23:11 UTC (permalink / raw)
To: linux-arm-kernel
I have done some optimization in NEON (NEON and VFP share the same register set) and am using it in my driver, which runs in (Linux) kernel space. The NEON-optimized code will run in work queue context, not in interrupt context. Going by the design of the Linux kernel, it looks like the VFP registers are not saved and restored on a switch from user mode to kernel mode, although Linux does save and restore them when switching between user space processes.
Currently I am not seeing any issues such as exceptions with my implementation (maybe I am just lucky), and I was wondering whether it is safe to use NEON code in work queue context. My development platform is OMAP 3 (ARM Cortex-A8), running Android with Linux kernel 2.6.29.
Thanks
-Anbumony, Kasi Lakshman Karthi
^ permalink raw reply [flat|nested] 9+ messages in thread
* Can I use VFP in work queue context ?
2010-04-21 23:11 Can I use VFP in work queue context ? Anbumony, Kasi Lakshman Karthi
@ 2010-04-22 10:09 ` Ben Dooks
2010-04-22 12:09 ` Måns Rullgård
2010-04-22 12:49 ` Siarhei Siamashka
1 sibling, 1 reply; 9+ messages in thread
From: Ben Dooks @ 2010-04-22 10:09 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Apr 21, 2010 at 06:11:18PM -0500, Anbumony, Kasi Lakshman Karthi wrote:
> I have done some optimization in NEON (Neon/VFP sharing the same register set) and using it in my driver running in (Linux) kernel space. The neon optimized code will be used under a work queue context and not under any interrupt. Going by the design of Linux kernel, it looks like there is no context save and restore on VFP registers whenever there is a context switch from user mode to kernel mode, but Linux handles the same for user space processes.
>
> Currently I am not seeing any issues (may be lucky) with my implementation (any exceptions) and was wondering whether it is safe to use of neon code in work queue context? My development platform is OMAP 3 (ARM cortex A-8) and using Android with Linux kernel: 2.6.29.
No FP in the kernel.
I expect this applies to NEON too.
If you're doing intensive processing in a work queue, you're probably
trying to solve the problem in the wrong place.
--
Ben
Q: What's a light-year?
A: One-third less calories than a regular year.
* Can I use VFP in work queue context ?
2010-04-22 10:09 ` Ben Dooks
@ 2010-04-22 12:09 ` Måns Rullgård
0 siblings, 0 replies; 9+ messages in thread
From: Måns Rullgård @ 2010-04-22 12:09 UTC (permalink / raw)
To: linux-arm-kernel
Ben Dooks <ben-linux@fluff.org> writes:
> On Wed, Apr 21, 2010 at 06:11:18PM -0500, Anbumony, Kasi Lakshman Karthi wrote:
>> I have done some optimization in NEON (Neon/VFP sharing the same
>> register set) and using it in my driver running in (Linux) kernel
>> space. The neon optimized code will be used under a work queue
>> context and not under any interrupt. Going by the design of Linux
>> kernel, it looks like there is no context save and restore on VFP
>> registers whenever there is a context switch from user mode to
>> kernel mode, but Linux handles the same for user space processes.
>>
>> Currently I am not seeing any issues (may be lucky) with my
>> implementation (any exceptions) and was wondering whether it is
>> safe to use of neon code in work queue context? My development
>> platform is OMAP 3 (ARM cortex A-8) and using Android with Linux
>> kernel: 2.6.29.
>
> No FP in the kernel.
>
> I expect this applies to NEON too.
Same registers, so yes.
--
Måns Rullgård
mans at mansr.com
* Can I use VFP in work queue context ?
2010-04-21 23:11 Can I use VFP in work queue context ? Anbumony, Kasi Lakshman Karthi
2010-04-22 10:09 ` Ben Dooks
@ 2010-04-22 12:49 ` Siarhei Siamashka
2010-04-22 16:58 ` Woodruff, Richard
1 sibling, 1 reply; 9+ messages in thread
From: Siarhei Siamashka @ 2010-04-22 12:49 UTC (permalink / raw)
To: linux-arm-kernel
On Thursday 22 April 2010 02:11:18 ext Anbumony, Kasi Lakshman Karthi wrote:
> I have done some optimization in NEON (Neon/VFP sharing the same register
> set) and using it in my driver running in (Linux) kernel space. The neon
> optimized code will be used under a work queue context and not under any
> interrupt. Going by the design of Linux kernel, it looks like there is no
> context save and restore on VFP registers whenever there is a context
> switch from user mode to kernel mode, but Linux handles the same for user
> space processes.
>
> Currently I am not seeing any issues (may be lucky) with my implementation
> (any exceptions) and was wondering whether it is safe to use of neon code
> in work queue context? My development platform is OMAP 3 (ARM cortex A-8)
> and using Android with Linux kernel: 2.6.29.
I also thought about the possibility to use NEON in the kernel some time ago,
but did not pursue it further, expecting negative feedback similar to what you
are getting now ;-)
Some MMX/SSE2 code exists in the kernel already if you grep the sources. It
might give you some ideas. Naturally preemption has to be disabled and NEON
state fully saved/restored on entry/exit, like in 'kernel_fpu_begin' and
'kernel_fpu_end' functions for x86. For ARM Cortex-A8 using NEON may be a bit
more difficult because NEON is in a separate power domain and may be switched
off at the time you want to use it. There are surely other technical problems
as well, so comments from someone knowledgeable would be very
interesting to read.
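
The begin/end bracket described above can be modelled in plain user-space C. This is only a sketch under stated assumptions: `model_simd_begin`, `model_simd_end`, `model_simd_copy`, the `model_hw_regs` register file and the `model_preempt_count` counter are all hypothetical names, not kernel APIs, and the state is saved and restored eagerly, whereas a real implementation may defer the restore. It does not model the NEON power domain issue at all.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NUM_QREGS 16              /* Cortex-A8 NEON exposes q0-q15 */

struct model_simd_state {
    uint64_t q[MODEL_NUM_QREGS][2];     /* each Q register is 128 bits wide */
};

/* Stand-ins for the hardware register file and the preemption counter. */
static struct model_simd_state model_hw_regs;
static int model_preempt_count;

/* Hypothetical "begin": forbid preemption, then save whatever is live. */
static void model_simd_begin(struct model_simd_state *saved)
{
    model_preempt_count++;              /* models disabling preemption */
    *saved = model_hw_regs;             /* full save on entry */
}

/* Hypothetical "end": put the saved state back, then allow preemption. */
static void model_simd_end(const struct model_simd_state *saved)
{
    assert(model_preempt_count > 0);    /* must be paired with a begin */
    model_hw_regs = *saved;             /* full restore on exit */
    model_preempt_count--;              /* models re-enabling preemption */
}

/* A kernel-side user that clobbers the registers while doing its work. */
static void model_simd_copy(void *dst, const void *src, size_t len)
{
    struct model_simd_state saved;

    model_simd_begin(&saved);
    /* Pretend the NEON code overwrote every register. */
    memset(&model_hw_regs, 0xAA, sizeof(model_hw_regs));
    memcpy(dst, src, len);
    model_simd_end(&saved);
}
```

The point of the model is that whatever user space left in the registers before `model_simd_copy()` is still there afterwards, and that the section in between cannot be preempted.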
--
Best regards,
Siarhei Siamashka
* Can I use VFP in work queue context ?
2010-04-22 12:49 ` Siarhei Siamashka
@ 2010-04-22 16:58 ` Woodruff, Richard
2010-04-22 17:19 ` Nicolas Pitre
0 siblings, 1 reply; 9+ messages in thread
From: Woodruff, Richard @ 2010-04-22 16:58 UTC (permalink / raw)
To: linux-arm-kernel
> From: linux-arm-kernel-bounces at lists.infradead.org [mailto:linux-arm-kernel-
> bounces at lists.infradead.org] On Behalf Of Siarhei Siamashka
> I also thought about the possibility to use NEON in the kernel some time ago,
> but did not pursue it further, expecting negative feedback similar to what you
> are getting now ;-)
>
> Some MMX/SSE2 code exists in the kernel already if you grep the sources. It
> might give you some ideas. Naturally preemption has to be disabled and NEON
> state fully saved/restored on entry/exit, like in 'kernel_fpu_begin' and
> 'kernel_fpu_end' functions for x86. For ARM Cortex-A8 using NEON may be a bit
> more difficult because NEON is in a separate power domain and may be switched
> off at the time you want to use it. There may be some other technical problems
> for sure, so the comments from someone knowledgeable would be very
> interesting to read.
Executing VFP/NEON code in the kernel while in some other process's context does seem like a big no (or at least it requires attention to the details you point out).
But a work queue is built on a half-baked process which is schedulable. VFP context handling is tied to a process, so it would seem a smaller task to allow VFP to work against this. If the work queue executes in a unique context, it could follow the same lazy strategy a user space one does.
Or is that way off?
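
For readers unfamiliar with the lazy strategy mentioned above, here is a small user-space C model of it. It is a sketch under assumptions, not the kernel's VFP code: all `model_*` names are hypothetical, a single word stands in for the whole register bank, and only one CPU is modelled. A context switch merely disables the unit; the state is swapped only when a task actually touches the registers and "traps".

```c
#include <assert.h>
#include <stddef.h>

struct model_task {
    const char *name;
    unsigned long vfp_regs;     /* one word stands in for the register bank */
};

static unsigned long model_hw_vfp;          /* the "hardware" register bank */
static int model_vfp_enabled;               /* models the unit's enable bit */
static struct model_task *model_vfp_owner;  /* whose state is live in hardware */
static struct model_task *model_current;    /* task currently running */
static int model_state_swaps;               /* how many real save/loads happened */

/* Context switch: no save, no load, just switch the unit off. */
static void model_context_switch(struct model_task *next)
{
    model_current = next;
    model_vfp_enabled = 0;
}

/* First VFP use after a switch: swap state only if the owner changed. */
static void model_vfp_trap(void)
{
    assert(model_current != NULL);
    if (model_vfp_owner != model_current) {
        if (model_vfp_owner != NULL)
            model_vfp_owner->vfp_regs = model_hw_vfp;  /* save old owner */
        model_hw_vfp = model_current->vfp_regs;        /* load new owner */
        model_vfp_owner = model_current;
        model_state_swaps++;
    }
    model_vfp_enabled = 1;
}

static void model_vfp_write(unsigned long value)
{
    if (!model_vfp_enabled)
        model_vfp_trap();
    model_hw_vfp = value;
}

static unsigned long model_vfp_read(void)
{
    if (!model_vfp_enabled)
        model_vfp_trap();
    return model_hw_vfp;
}
```

In this model a task that never touches VFP costs nothing at switch time, which is the property that would let a dedicated kernel thread share the mechanism with user space tasks.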
Regards,
Richard W.
* Can I use VFP in work queue context ?
2010-04-22 16:58 ` Woodruff, Richard
@ 2010-04-22 17:19 ` Nicolas Pitre
2010-04-22 19:33 ` Woodruff, Richard
0 siblings, 1 reply; 9+ messages in thread
From: Nicolas Pitre @ 2010-04-22 17:19 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 22 Apr 2010, Woodruff, Richard wrote:
>
> > From: linux-arm-kernel-bounces at lists.infradead.org [mailto:linux-arm-kernel-
> > bounces at lists.infradead.org] On Behalf Of Siarhei Siamashka
>
> > I also thought about the possibility to use NEON in the kernel some time ago,
> > but did not pursue it further, expecting negative feedback similar to what you
> > are getting now ;-)
> >
> > Some MMX/SSE2 code exists in the kernel already if you grep the sources. It
> > might give you some ideas. Naturally preemption has to be disabled and NEON
> > state fully saved/restored on entry/exit, like in 'kernel_fpu_begin' and
> > 'kernel_fpu_end' functions for x86. For ARM Cortex-A8 using NEON may be a bit
> > more difficult because NEON is in a separate power domain and may be switched
> > off at the time you want to use it. There may be some other technical problems
> > for sure, so the comments from someone knowledgeable would be very
> > interesting to read.
>
> VFP/NEON executed in the kernel while in some others processes context does seem like a big no (or at least requiring details as you point out).
>
> But a work queue is built on a half backed process which is
> schedulable. VFP context handling is tied to a process so it would
> seem a smaller task to allow VFP to work against this.
You probably mean a kernel thread here, which is not the same as a work
queue.
> If the work queue executes in a unique context it could follow the
> same lazy strategy a user space one does.
It could... but that begs the question: what is this that requires so
much processing power within the kernel? This really needs to be fully
understood and justified before even considering a possible VFP usage in
the kernel.
Nicolas
* Can I use VFP in work queue context ?
2010-04-22 17:19 ` Nicolas Pitre
@ 2010-04-22 19:33 ` Woodruff, Richard
2010-04-22 19:40 ` Nicolas Pitre
2010-04-29 20:38 ` Siarhei Siamashka
0 siblings, 2 replies; 9+ messages in thread
From: Woodruff, Richard @ 2010-04-22 19:33 UTC (permalink / raw)
To: linux-arm-kernel
> From: Nicolas Pitre [mailto:nico at fluxnic.net]
> Sent: Thursday, April 22, 2010 12:20 PM
> > If the work queue executes in a unique context it could follow the
> > same lazy strategy a user space one does.
>
> It could... but that begs the question: what is this that requires so
> much processing power within the kernel? This really needs to be fully
> understood and justified before even considering a possible VFP usage in
> the kernel.
A simple candidate is a NEON memory copy. It can perform much better than the ARM-based one. There are a few unfortunate copies associated with some networking devices which see a decent benefit.
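
To make "NEON memory copy" concrete, here is a minimal user-space sketch using the standard `arm_neon.h` intrinsics `vld1q_u8` and `vst1q_u8`. It is an illustration only: `sketch_copy` and `SKETCH_HAVE_NEON` are hypothetical names, this is not kernel code, it is not the tuned copy routine anyone in this thread measured, and on a compiler without NEON support it silently degrades to a byte loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#else
#define SKETCH_HAVE_NEON 0
#endif

/* Copy len bytes from src to dst; the buffers must not overlap. */
static void *sketch_copy(void *dst, const void *src, size_t len)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    assert(len == 0 || (dst != NULL && src != NULL));

#if SKETCH_HAVE_NEON
    /* Move 16 bytes per iteration through one 128-bit Q register. */
    while (len >= 16) {
        vst1q_u8(d, vld1q_u8(s));
        d += 16;
        s += 16;
        len -= 16;
    }
#endif
    /* Tail bytes (or everything, when NEON is unavailable). */
    for (; len > 0; len--)
        *d++ = *s++;

    return dst;
}
```

A copy that is actually fast on a given core usually also needs alignment handling and preloading, which this sketch leaves out.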
Regards,
Richard W.
* Can I use VFP in work queue context ?
2010-04-22 19:33 ` Woodruff, Richard
@ 2010-04-22 19:40 ` Nicolas Pitre
2010-04-29 20:38 ` Siarhei Siamashka
1 sibling, 0 replies; 9+ messages in thread
From: Nicolas Pitre @ 2010-04-22 19:40 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 22 Apr 2010, Woodruff, Richard wrote:
>
> > From: Nicolas Pitre [mailto:nico at fluxnic.net]
> > Sent: Thursday, April 22, 2010 12:20 PM
>
> > > If the work queue executes in a unique context it could follow the
> > > same lazy strategy a user space one does.
> >
> > It could... but that begs the question: what is this that requires so
> > much processing power within the kernel? This really needs to be fully
> > understood and justified before even considering a possible VFP usage in
> > the kernel.
>
> A simple candidate is Neon memory copy. It can perform much better
> than the ARM based one. There are a few unfortunate copies associated
> with some networking devices which see decent benefit.
The benefit is often lost when you need to defer them to a schedulable
kernel thread though.
Nicolas
* Can I use VFP in work queue context ?
2010-04-22 19:33 ` Woodruff, Richard
2010-04-22 19:40 ` Nicolas Pitre
@ 2010-04-29 20:38 ` Siarhei Siamashka
1 sibling, 0 replies; 9+ messages in thread
From: Siarhei Siamashka @ 2010-04-29 20:38 UTC (permalink / raw)
To: linux-arm-kernel
On Thursday 22 April 2010 22:33:37 ext Woodruff, Richard wrote:
> > From: Nicolas Pitre [mailto:nico at fluxnic.net]
> > Sent: Thursday, April 22, 2010 12:20 PM
> >
> > > If the work queue executes in a unique context it could follow the
> > > same lazy strategy a user space one does.
> >
> > It could... but that begs the question: what is this that requires so
> > much processing power within the kernel? This really needs to be fully
> > understood and justified before even considering a possible VFP usage in
> > the kernel.
>
> A simple candidate is Neon memory copy. It can perform much better than the
> ARM based one. There are a few unfortunate copies associated with some
> networking devices which see decent benefit.
Yes, that was my intention from the start.
But after doing some benchmarks, I now suspect that the significant
performance improvement from using NEON instructions is specific to the r1pX
Cortex-A8 revision in OMAP34xx/OMAP35xx, and that newer chips don't gain much.
--
Best regards,
Siarhei Siamashka