* floating point support in the driver. @ 2008-08-01 10:57, Misbah khan

From: Misbah khan
To: linuxppc-embedded

Hi all,

I have a DSP algorithm which I am running in a user application. Even after enabling VFP support it is taking a lot of time to execute, so I want to move it into a driver instead of a user application.

Can anybody suggest whether doing so would be a better solution, and what challenges I would face in implementing such floating-point support in the driver?

Is there a way, in the application itself, to make it execute faster?

---- Misbah <><
* Re: floating point support in the driver. @ 2008-08-01 11:32, Laurent Pinchart

From: Laurent Pinchart
To: linuxppc-embedded; +Cc: Misbah khan

On Friday 01 August 2008, Misbah khan wrote:
> I have a DSP algorithm which i am running in the application even after
> enabling the VFP support it is taking a lot of time to get executed [...]
> Is there a way in the application itself to make it execute faster.

Floating-point in the kernel should be avoided. FPU state save/restore operations are costly and are not performed by the kernel when switching from userspace to kernelspace context. You will have to protect floating-point sections with kernel_fpu_begin/kernel_fpu_end which, if I'm not mistaken, disables preemption. That's probably not something you want to do.

Why would the same code run faster in kernelspace than userspace?

--
Laurent Pinchart
CSE Semaphore Belgium
Chaussee de Bruxelles, 732A
B-1410 Waterloo
Belgium
T +32 (2) 387 42 59
F +32 (2) 387 42 75
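[Editor's note: the kernel_fpu_begin()/kernel_fpu_end() pattern Laurent mentions looks roughly like the fragment below. This is a sketch only, not a complete driver: these two calls are the x86 kernel API (PowerPC instead uses enable_kernel_fp()), and dsp_compute_step() is a hypothetical placeholder, not anything from this thread.]

```c
/*
 * Sketch only -- a kernel-module fragment, not a complete driver.
 * kernel_fpu_begin()/kernel_fpu_end() are the x86 kernel API; the
 * whole section runs with preemption disabled, so it must be short.
 * dsp_compute_step() is a hypothetical placeholder for the FP work.
 */
#include <linux/kernel.h>
#include <asm/i387.h>   /* kernel_fpu_begin/end in kernels of this era */

static void dsp_compute_step(float *data, int n);  /* hypothetical */

static void run_dsp_in_kernel(float *data, int n)
{
	kernel_fpu_begin();        /* save user FPU state, disable preemption */
	dsp_compute_step(data, n); /* floating-point use is legal here */
	kernel_fpu_end();          /* restore state, re-enable preemption */
}
```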
* Re: floating point support in the driver. @ 2008-08-01 12:00, Misbah khan

From: Misbah khan
To: linuxppc-embedded

I am not very clear on why floating-point support in the kernel should be avoided.

We want our DSP algorithm to run at boot time, and since a kernel thread has higher priority, I assumed it would be faster than a user application.

If I really have to speed up my application's execution, what mechanism would you suggest I try? Even after using hardware VFP support I am still missing the timing requirement, by 800 ms in my case.

---- Misbah <><

Laurent Pinchart-4 wrote:
> Floating-point in the kernel should be avoided. FPU state save/restore
> operations are costly and are not performed by the kernel when switching
> from userspace to kernelspace context. [...] Why would the same code run
> faster in kernelspace than userspace ?
* Re: floating point support in the driver. @ 2008-08-01 15:54, M. Warner Losh

From: M. Warner Losh
To: misbah_khan; +Cc: linuxppc-embedded

In message <18772952.post@talk.nabble.com>, Misbah khan <misbah_khan@engineer.com> writes:
: I am not very clear Why floating point support in the Kernel should be
: avoided ?

Because saving the FPU state is expensive. The kernel multiplexes the FPU hardware among all the userland processes that use it. For parts of the kernel to effectively use the FPU, it would have to save the state on traps into the kernel, and restore the state when returning to userland. This is a big drag on the performance of the system. There are ways around this optimization where you save the FPU state explicitly, but the expense is still there.

: We want our DSP algorithm to run at the boot time and since kernel thread
: having higher priority , i assume that it would be faster than user
: application.

Bad assumption. User threads can get boosts in priority in certain cases.

If it really is just at boot time, before any other threads are started, you likely can get away with it.

: If i really have to speed up my application execution what mechanism will
: you suggest me to try ?
:
: After using Hardware VFP support also i am still laging the timing
: requirement by 800 ms in my case

This sounds like a classic case of putting 20 pounds in a 10 pound bag and complaining that the bag rips. You need a bigger bag.

If you are doing FPU-intensive operations in userland, moving them to the kernel isn't going to help anything but maybe latency. And if you are almost a full second short, your quest to move things into the kernel is almost certainly not going to help enough. Moving things into the kernel only helps latency, and only when there are lots of context switches (since doing stuff in the kernel avoids the domain crossing that forces the save of the CPU state).

I don't know if the 800 ms timing is relative to a task that must run once a second, or once an hour. If the former, you're totally screwed and need to either be more clever about your algorithm (consider integer math, profiling the hot spots, etc.), or you need more powerful silicon. If you are trying to shave 800 ms off a task that runs for an hour, then you just might be able to do that with tiny code tweaks.

Sorry to be so harsh, but really, there's no such thing as a free lunch.

Warner
* Re: floating point support in the driver. @ 2008-08-04 5:23, Misbah khan

From: Misbah khan
To: linuxppc-embedded

Thank you, Warner.

Actually, the complete algorithm should take no more than 1 sec to execute, but it is taking around 1.8 sec. The algorithm runs every few seconds. I am trying to fine-tune the code; I just want to know whether altering the task priority would be a good idea, and what the best way to do that would be.

-- Misbah <><

M. Warner Losh wrote:
> This sounds like a classic case of putting 20 pounds in a 10 pound bag
> and complaining that the bag rips out. You need a bigger bag. [...]
> Sorry to be so harsh, but really, there's no such thing as a free lunch.
* Re: floating point support in the driver. @ 2008-08-04 5:33, M. Warner Losh

From: M. Warner Losh
To: misbah_khan; +Cc: linuxppc-embedded

In message <18805820.post@talk.nabble.com>, Misbah khan <misbah_khan@engineer.com> writes:
: Actually the complete algorithm should take not more than 1 sec to execute
: but its taking around 1.8 sec .The algorithm would rum between every few
: secs. I am trying to fine tune the code ,i just want to know that will it a
: good idea to alter the task priority and what could be the best way ?

You could try a very high priority task, but I'd suggest that profiling the code to see where the hot spots are might yield better results...

Have you identified what other process is running for those two seconds that's causing your <1 sec algorithm to take about 2x as long? What does the real vs. CPU time say for this algorithm? If they are about the same, then you have to make it faster.

Given that you are looking for a factor of 2x, my experience suggests that moving this into the kernel is unlikely to be successful and will be a lot of pain. It would be rare indeed to find a system where context switches really account for that much of the time.

To make progress, you need to identify the real root cause of this slowdown. Either your thread really is taking the extra time, in which case profiling and algorithm improvement are your only alternatives, or someone else is eating all the CPU, and you must either hold them off or get a beefier CPU.

Boosting the priority might be a good diagnostic aid, but it may have unintended side effects if you really are competing with something else. Wouldn't that starve the other process? What is it doing?

Warner
* Re: floating point support in the driver. @ 2008-08-04 5:47, David Hawkins

From: David Hawkins
To: M. Warner Losh; +Cc: misbah_khan, linuxppc-embedded

Hi Misbah,

I would recommend you look at your floating-point code again and benchmark each section. You should be able to estimate the number of clock cycles required to complete an operation and then check that against your measurements.

Depending on whether your algorithm is processing-intensive or data-movement-intensive, you may find that the big time waster is moving data on or off chip, or perhaps it's a large vector operation that is blowing out the cache. If you do find that, then on some processors you can lock the cache. Your algorithm would then require a custom driver that steals part of the cache from the OS, but the floating-point code would not run in the kernel; it would run on data stored in the stolen cache area. You can lock both instructions and data in the cache; e.g., an FFT routine can be locked in the instruction cache, while FFT data is in the data cache. I'm not sure how easy this is to do under Linux, though.

Here's an example of the level of detail you can get down to when benchmarking code:

http://www.ovro.caltech.edu/~dwh/correlator/pdf/dsp_programming.pdf

The FFT routine used on this processor made use of both the instruction and data cache (on-chip SRAM) on the DSP. This code is being re-developed to run on an MPC8349EA PowerPC with FPU. I did some initial testing to confirm that the FPU operates as per the data sheet, and will eventually get around to more complete testing.

Which processor were you running your code on, and at what frequency were you operating it? How does the algorithm timing compare when run on other processors, e.g., your desktop or laptop machine?

Cheers,
Dave
* Re: floating point support in the driver. @ 2008-08-05 9:49, Misbah khan

From: Misbah khan
To: linuxppc-embedded

Hi David,

Thank you for your reply. I am running the algorithm on an OMAP processor (ARM core), and I tried the same on an i.MX processor, which takes 1.7 times longer than the OMAP.

It is true that the algorithm performs a vector operation that is blowing out the cache. But the question is: how do I lock the cache, and how should we implement that in the driver? Example code or a document would be helpful in this regard.

--- Misbah <><

David Hawkins-3 wrote:
> Depending on whether your algorithm is processing intensive
> or data movement intensive, you may find that the big time
> waster is moving data on or off chip, or perhaps its a large
> vector operation that is blowing out the cache. If you
> do find that, then on some processors you can lock the
> cache [...]
* Re: floating point support in the driver. @ 2008-08-05 16:53, David Hawkins

From: David Hawkins
To: Misbah khan; +Cc: linuxppc-embedded

Hi Misbah,

> I am running the algorithm on OMAP processor (arm-core)
> and i did tried the same on iMX processor which
> takes 1.7 times more than OMAP.

Ok, that's a 10,000 ft benchmark. The observation being that it fails your requirement. How does that time compare to the operations required, and their expected times?

> It is true that the algorithm is performing the vector
> operation which is blowing the cache.

Determined how? Obviously if your cache is 16K and your data is 64K, there's no way it'll fit in there at once, but the algorithm could be crafted such that 1K at a time was processed, while another data packet was moved into the cache ... but this is very processor-specific.

> But the question is How to lock the cache ? In driver
> how should we implement the same ?
>
> An example code or a document could be helpful in this regard.

Indeed :)

I have no idea how the OMAP works, so the following are just random, and possibly incorrect, ramblings ...

The MPC8349EA startup code uses a trick where it zeros out sections of the cache while providing an address. Once the addresses and zeros are in the cache, it's locked. From that point on, memory accesses to those addresses result in cache 'hits'. This is the startup stack used by the U-Boot bootloader.

If something similar was done under Linux, then *I guess* you could implement mmap() and ioremap() the section of addresses associated with the locked cache lines. You could then DMA data to and from the cache area, and run your algorithm there. That would provide you 'fast SRAM'.

However, you might be able to get the same effect by setting up your processing algorithm such that it handles smaller chunks of data. Feel free to explain your data processing :)

Cheers,
Dave
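[Editor's note: David's "smaller chunks" suggestion is plain cache blocking, which needs no kernel support at all. A minimal sketch; the 256-float block size and the process_block() split are illustrative assumptions, not from the thread.]

```c
#include <stddef.h>

/* Process a large vector in cache-sized blocks so each block stays
 * resident in the data cache while it is worked on, instead of
 * streaming the whole array through the cache on every pass. */
#define BLOCK 256   /* floats per block: 1 KB, tuned to the D-cache size */

static void process_block(float *x, size_t n)
{
    /* stand-in for one pass of the real DSP inner loop */
    for (size_t i = 0; i < n; i++)
        x[i] = x[i] * 0.5f + 1.0f;
}

static void process_vector(float *x, size_t n)
{
    for (size_t off = 0; off < n; off += BLOCK) {
        size_t len = (n - off < BLOCK) ? n - off : BLOCK;
        process_block(x + off, len);
    }
}
```

If the real algorithm makes several passes over the data, all the passes for one block should run before moving to the next block; that is where the cache-miss savings actually come from.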
Thread overview: 9+ messages

2008-08-01 10:57 floating point support in the driver — Misbah khan
2008-08-01 11:32 ` Laurent Pinchart
2008-08-01 12:00 ` Misbah khan
2008-08-01 15:54 ` M. Warner Losh
2008-08-04  5:23 ` Misbah khan
2008-08-04  5:33 ` M. Warner Losh
2008-08-04  5:47 ` David Hawkins
2008-08-05  9:49 ` Misbah khan
2008-08-05 16:53 ` David Hawkins