* Image scaling performance
From: Michael Zimmermann @ 2015-02-24 9:39 UTC
To: The development of GNU GRUB

Any ideas what could slow down the image scaling algorithm?
The only reasons I can think of are slow memory or some compiler problem.
Since my RAM is mapped cacheable, I don't think the RAM is too slow.

I have even forced the nearest-neighbor algorithm already. It speeds
things up a lot, but it's not as fast as you'd expect.

Some technical info:
ARMv7
Linaro GCC 4.9
MMU setup is done by the previous bootloader (I disabled GRUB's (uboot)
MMU setup; it proved to be faster)
* Re: Image scaling performance
From: Vladimir 'phcoder' Serbinenko @ 2015-02-24 9:51 UTC
To: The development of GRUB 2

Did you try to look at the ASM of the function in question? Do you compile
to Thumb? Multiplication sometimes generates function calls in Thumb. Try
marking the scaling function as ARM explicitly.

On 2015-02-24 10:39, "Michael Zimmermann" <sigmaepsilon92@gmail.com> wrote:
> Any ideas what could slow down the image scaling algorithm?
> The only reasons I can think of are slow memory or some compiler problem.
> Since my RAM is mapped cacheable, I don't think the RAM is too slow.
>
> I have even forced the nearest-neighbor algorithm already. It speeds
> things up a lot, but it's not as fast as you'd expect.
>
> Some technical info:
> ARMv7
> Linaro GCC 4.9
> MMU setup is done by the previous bootloader (I disabled GRUB's (uboot)
> MMU setup; it proved to be faster)
>
> _______________________________________________
> Grub-devel mailing list
> Grub-devel@gnu.org
> https://lists.gnu.org/mailman/listinfo/grub-devel
* Re: Image scaling performance
From: Michael Zimmermann @ 2015-02-24 10:00 UTC
To: The development of GNU GRUB

The function seems to use __aeabi_uidiv. I'm not sure if this is a
software or hardware implementation.

Full code:
ASM: http://pastebin.com/FnPRZt1H
pseudo-C: http://pastebin.com/dH3YBk46

On Tue, Feb 24, 2015 at 10:51 AM, Vladimir 'phcoder' Serbinenko <phcoder@gmail.com> wrote:
> Did you try to look at the ASM of the function in question? Do you compile
> to Thumb? Multiplication sometimes generates function calls in Thumb. Try
> marking the scaling function as ARM explicitly.
* Re: Image scaling performance
From: Vladimir 'phcoder' Serbinenko @ 2015-02-24 11:27 UTC
To: The development of GNU GRUB

On Tue Feb 24 2015 at 11:01:03 AM, Michael Zimmermann <sigmaepsilon92@gmail.com> wrote:
> The function seems to use __aeabi_uidiv. I'm not sure if this is a
> software or hardware implementation.

Software. Try the attached patch.

> Full code:
> ASM: http://pastebin.com/FnPRZt1H
> pseudo-C: http://pastebin.com/dH3YBk46

[-- Attachment: scale.diff --]
[-- Type: text/x-patch, Size: 1420 bytes --]

diff --git a/grub-core/video/bitmap_scale.c b/grub-core/video/bitmap_scale.c
index 0b93d02..1c7195f 100644
--- a/grub-core/video/bitmap_scale.c
+++ b/grub-core/video/bitmap_scale.c
@@ -366,22 +366,32 @@ scale_nn (struct grub_video_bitmap *dst, struct grub_video_bitmap *src)
   /* bytes_per_pixel is the same for both src and dst. */
   unsigned bytes_per_pixel = dst->mode_info.bytes_per_pixel;
 
-  unsigned dy;
-  for (dy = 0; dy < dh; dy++)
+  unsigned dy, sy, ystep, yfrac, yover;
+  unsigned dx, sx, xstep, xfrac, xover;
+  ystep = sw / dw;
+  yover = sw % dw;
+  xstep = sh / dh;
+  xover = sh % dh;
+
+  for (dy = 0, sy = 0; dy < dh; dy++, sy += ystep, yfrac += yover)
     {
       unsigned dx;
-      for (dx = 0; dx < dw; dx++)
+      if (yfrac > dw)
+        {
+          yfrac -= dw;
+          sy++;
+        }
+      for (dx = 0, sx = 0; dx < dw; dx++, sx += xstep, xfrac += xover)
         {
           grub_uint8_t *dptr;
           grub_uint8_t *sptr;
-          unsigned sx;
-          unsigned sy;
           unsigned comp;
 
-          /* Compute the source coordinate that the destination coordinate
-             maps to. Note: sx/sw = dx/dw => sx = sw*dx/dw. */
-          sx = sw * dx / dw;
-          sy = sh * dy / dh;
+          if (xfrac > dh)
+            {
+              xfrac -= dh;
+              sx++;
+            }
 
           /* Get the address of the pixels in src and dst. */
           dptr = ddata + dy * dstride + dx * bytes_per_pixel;
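The idea behind the patch above, replacing the per-pixel `sw * dx / dw` with a running source index plus a carried remainder, can be sketched in isolation as follows. This is a minimal sketch under stated assumptions, not the GRUB implementation: `nn_index_map` and its parameters are hypothetical names, the destination length is assumed to be non-zero, and the sketch pairs each source length with the destination length of the same axis and carries with `>=`.

```c
#include <assert.h>

/* Hypothetical helper, not GRUB code: fill map[d] with the source index
   that destination index d maps to, i.e. floor (s_len * d / d_len),
   using one division and one modulo before the loop and none inside it.
   Assumes d_len > 0.  */
static void
nn_index_map (unsigned *map, unsigned s_len, unsigned d_len)
{
  unsigned step = s_len / d_len;  /* whole source pixels per destination pixel */
  unsigned over = s_len % d_len;  /* remainder accumulated in frac */
  unsigned s = 0, frac = 0, d;

  for (d = 0; d < d_len; d++)
    {
      /* Invariant here: s * d_len + frac == s_len * d, with frac < d_len.  */
      map[d] = s;
      s += step;
      frac += over;
      if (frac >= d_len)
        {
          /* The accumulated remainder reached one full source pixel.  */
          frac -= d_len;
          s++;
        }
    }
}
```

Calling `nn_index_map (xmap, sw, dw)` and `nn_index_map (ymap, sh, dh)` once each would leave only table lookups in the inner loop; whether that trade of memory for arithmetic pays off on a given board is something to measure rather than assume.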
* Re: Image scaling performance
From: Michael Zimmermann @ 2015-02-24 11:47 UTC
To: The development of GNU GRUB

Thanks, I'll try that, but wouldn't it make more sense to implement a
hardware version of this function? (Do I need a different compiler?)

Almost all modules use this call, and I guess it could really improve
the performance.

On Tue, Feb 24, 2015 at 12:27 PM, Vladimir 'phcoder' Serbinenko <phcoder@gmail.com> wrote:
> On Tue Feb 24 2015 at 11:01:03 AM, Michael Zimmermann
> <sigmaepsilon92@gmail.com> wrote:
>>
>> The function seems to use __aeabi_uidiv. I'm not sure if this is a
>> software or hardware implementation.
>
> Software. Try the attached patch.
* Re: Image scaling performance
From: Vladimir 'phcoder' Serbinenko @ 2015-02-24 12:39 UTC
To: The development of GNU GRUB

On Tue Feb 24 2015 at 12:48:10 PM, Michael Zimmermann <sigmaepsilon92@gmail.com> wrote:
> Thanks, I'll try that, but wouldn't it make more sense to implement a
> hardware version of this function? (Do I need a different compiler?)

AFAIK there isn't a consistent division instruction across all ARMs and
which is enabled on boot time.
You can implement a hardware version of division, but it will crash on
some machines.
Division is a slow operation on any platform and should be avoided as far
as possible.

> Almost all modules use this call, and I guess it could really improve
> the performance.
* Re: Image scaling performance
From: Michael Zimmermann @ 2015-02-24 18:01 UTC
To: The development of GNU GRUB

What do you mean by "which is enabled on boot time"?
What do the Linux kernel and userspace applications use?

On Tue, Feb 24, 2015 at 1:39 PM, Vladimir 'phcoder' Serbinenko <phcoder@gmail.com> wrote:
> AFAIK there isn't a consistent division instruction across all ARMs and
> which is enabled on boot time.
> You can implement a hardware version of division, but it will crash on
> some machines.
> Division is a slow operation on any platform and should be avoided as far
> as possible.
* Re: Image scaling performance
From: Andrei Borzenkov @ 2015-02-24 18:22 UTC
To: Michael Zimmermann; +Cc: The development of GNU GRUB

On Tue, 24 Feb 2015 19:01:03 +0100, Michael Zimmermann <sigmaepsilon92@gmail.com> wrote:
> What do you mean by "which is enabled on boot time"?
> What do the Linux kernel and userspace applications use?

A software implementation, provided either by libgcc or explicitly defined
like GRUB does it (e.g. see arch/arm/lib/lib1funcs.S in the Linux source
tree). GCC generates calls to them in both cases.
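To illustrate why a call to a software helper such as `__aeabi_uidiv` costs far more than a single instruction, here is a minimal sketch of restoring shift-and-subtract division. It is an illustration only: `soft_udiv` is a hypothetical name, and the real helpers in libgcc and in `lib1funcs.S` are more heavily optimized than this loop. The divisor is assumed to be non-zero.

```c
#include <assert.h>

/* Hypothetical illustration, not the libgcc or GRUB helper: restoring
   shift-and-subtract unsigned division.  One quotient bit is produced
   per loop iteration, so a full division walks every bit of the
   dividend.  Assumes d != 0.  */
static unsigned
soft_udiv (unsigned n, unsigned d, unsigned *rem)
{
  unsigned q = 0;
  unsigned long long r = 0;  /* wider than unsigned so the shift cannot overflow */
  int i;

  for (i = (int) (sizeof (unsigned) * 8) - 1; i >= 0; i--)
    {
      /* Bring down the next bit of the dividend.  */
      r = (r << 1) | ((n >> i) & 1u);
      if (r >= d)
        {
          r -= d;
          q |= 1u << i;
        }
    }

  if (rem)
    *rem = (unsigned) r;
  return q;
}
```

With two such calls per destination pixel, a loop like the original `scale_nn` spends most of its time dividing, which is consistent with the advice in the thread to hoist the divisions out of the loop.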
* Re: Image scaling performance
From: Leif Lindholm @ 2015-02-25 16:20 UTC
To: The development of GNU GRUB

On Tue, Feb 24, 2015 at 12:39:31PM +0000, Vladimir 'phcoder' Serbinenko wrote:
> > thx I'll try that but wouldn't it make more sense to implement a hw
> > version of this function?(do I need a different compiler?)
> >
> AFAIK there isn't a consistent division instruction across all ARMs and
> which is enabled on boot time.
> You can implement hw version of division but it will crash on some machines.
> Division is a slow operation on any platform and should be avoided as far
> as possible.

For 32-bit ARM, only the later processors (Cortex-A7, -A12, -A15, -A17)
have SDIV/UDIV.

/
    Leif
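Since SDIV/UDIV are optional on 32-bit ARM, code that wants to know whether the compiler may emit them can test predefined macros at compile time. The sketch below assumes the GCC macro `__ARM_ARCH_EXT_IDIV__` and the ACLE macro `__ARM_FEATURE_IDIV`; as far as I know these are defined only when the selected target options (for example a `-mcpu` value naming one of the cores listed above) include hardware integer division. The function name is hypothetical, and the check says nothing about the CPU the binary eventually runs on.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if this translation unit was compiled
   for an ARM target whose selected options include hardware integer
   division, else 0.  This is a compile-time property only; it does not
   probe the running CPU, and it returns 0 on non-ARM targets.  */
static int
arm_hw_idiv_compiled_in (void)
{
#if defined (__ARM_ARCH_EXT_IDIV__) || defined (__ARM_FEATURE_IDIV)
  return 1;
#else
  return 0;
#endif
}
```

A bootloader that has to run on cores without the instruction, such as the ARMv6 machine mentioned later in the thread, would still need the software helper as the default path.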
* Re: Image scaling performance
From: Leif Lindholm @ 2015-02-25 15:45 UTC
To: The development of GNU GRUB

On Tue, Feb 24, 2015 at 11:00:41AM +0100, Michael Zimmermann wrote:
> The function seems to use __aeabi_uidiv. I'm not sure if this is a
> software or hardware implementation.
> Full code:
> ASM: http://pastebin.com/FnPRZt1H
> pseudo-C: http://pastebin.com/dH3YBk46

> >> Some technical info:
> >> ARMv7
> >> Linaro GCC 4.9

I don't see any calls to any of the __aeabi helpers generated for this
file with current head. Which specific Linaro toolchain are you using?
(Mine is "Linaro GCC 4.9-2014.09".)

Also, scale_nn gets inlined into grub_video_bitmap_scale for me.

(Just trying to understand what is causing the difference.)

/
    Leif
* Re: Image scaling performance
From: Leif Lindholm @ 2015-02-25 16:23 UTC
To: The development of GNU GRUB

On Wed, Feb 25, 2015 at 03:45:40PM +0000, Leif Lindholm wrote:
> > >> Some technical info:
> > >> ARMv7
> > >> Linaro GCC 4.9
>
> I don't see any calls to any of the __aeabi helpers generated for this
> file with current head. Which specific Linaro toolchain are you using?
> (Mine is "Linaro GCC 4.9-2014.09".)

Scratch that, I do see them. Just failing to drive the tools properly.

/
    Leif
* Re: Image scaling performance
From: Michael Zimmermann @ 2015-02-25 18:38 UTC
To: The development of GNU GRUB

Why do you think the native div code would crash on most devices? I
support ARMv7+ only anyway.

On Wed, Feb 25, 2015 at 5:23 PM, Leif Lindholm <leif.lindholm@linaro.org> wrote:
> On Wed, Feb 25, 2015 at 03:45:40PM +0000, Leif Lindholm wrote:
>> I don't see any calls to any of the __aeabi helpers generated for this
>> file with current head. Which specific Linaro toolchain are you using?
>> (Mine is "Linaro GCC 4.9-2014.09".)
>
> Scratch that, I do see them. Just failing to drive the tools properly.
>
> /
>     Leif
* Re: Image scaling performance
From: Vladimir 'phcoder' Serbinenko @ 2015-02-25 18:41 UTC
To: The development of GRUB 2

ARMv7 doesn't mandate div instructions. It's a separate flag in features.
GRUB supports earlier CPUs as well and we use them for testing. My only
test machine is ARMv6.

On 2015-02-25 19:38, "Michael Zimmermann" <sigmaepsilon92@gmail.com> wrote:
> Why do you think the native div code would crash on most devices? I
> support ARMv7+ only anyway.
* Re: Image scaling performance
From: Michael Zimmermann @ 2015-02-25 18:46 UTC
To: The development of GNU GRUB

Oh, OK. So Linux's div/mod/... assembler is as slow or as fast as GRUB's
code? Linux uses ifdefs for ARMv5 and later. Maybe we could optimize
things a little :)

About scale_nn:
amarullz (https://plus.google.com/u/0/+AhmadAmarullah/about) wrote an
optimized version without divs:
loops: http://pastebin.com/MaZqWSA9
memcpy: http://pastebin.com/iNq0V5Tw

This code works a little faster. I'm still questioning the efficiency of
the math operations, because on slow devices there are other bottlenecks
of the same kind (like de/compression).

On Wed, Feb 25, 2015 at 7:41 PM, Vladimir 'phcoder' Serbinenko <phcoder@gmail.com> wrote:
> ARMv7 doesn't mandate div instructions. It's a separate flag in features.
> GRUB supports earlier CPUs as well and we use them for testing. My only
> test machine is ARMv6.
* Re: Image scaling performance
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2015-02-25 18:56 UTC
To: grub-devel

On 25.02.2015 19:46, Michael Zimmermann wrote:
> Oh, OK. So Linux's div/mod/... assembler is as slow or as fast as GRUB's
> code? Linux uses ifdefs for ARMv5 and later. Maybe we could optimize
> things a little :)
> About scale_nn:
> amarullz (https://plus.google.com/u/0/+AhmadAmarullah/about) wrote an
> optimized version without divs:
> loops: http://pastebin.com/MaZqWSA9
> memcpy: http://pastebin.com/iNq0V5Tw
>
Please try my patch (reattached here after minor fixes). A patch from an
anonymous source, sent by a third party through pastebin, isn't acceptable
from a legal perspective.

> This code works a little faster. I'm still questioning the efficiency of
> the math operations, because on slow devices there are other bottlenecks
> of the same kind (like de/compression).

[-- Attachment #2: scale.diff --]
[-- Type: text/x-diff, Size: 1442 bytes --]

diff --git a/grub-core/video/bitmap_scale.c b/grub-core/video/bitmap_scale.c
index 0b93d02..64bacbf 100644
--- a/grub-core/video/bitmap_scale.c
+++ b/grub-core/video/bitmap_scale.c
@@ -366,22 +366,31 @@ scale_nn (struct grub_video_bitmap *dst, struct grub_video_bitmap *src)
   /* bytes_per_pixel is the same for both src and dst. */
   unsigned bytes_per_pixel = dst->mode_info.bytes_per_pixel;
 
-  unsigned dy;
-  for (dy = 0; dy < dh; dy++)
+  unsigned dy, sy, ystep, yfrac, yover;
+  unsigned dx, sx, xstep, xfrac, xover;
+  ystep = sw / dw;
+  yover = sw % dw;
+  xstep = sh / dh;
+  xover = sh % dh;
+
+  for (dy = 0, sy = 0, yfrac = 0; dy < dh; dy++, sy += ystep, yfrac += yover)
     {
-      unsigned dx;
-      for (dx = 0; dx < dw; dx++)
+      if (yfrac > dw)
+        {
+          yfrac -= dw;
+          sy++;
+        }
+      for (dx = 0, sx = 0, xfrac = 0; dx < dw; dx++, sx += xstep, xfrac += xover)
         {
           grub_uint8_t *dptr;
           grub_uint8_t *sptr;
-          unsigned sx;
-          unsigned sy;
           unsigned comp;
 
-          /* Compute the source coordinate that the destination coordinate
-             maps to. Note: sx/sw = dx/dw => sx = sw*dx/dw. */
-          sx = sw * dx / dw;
-          sy = sh * dy / dh;
+          if (xfrac > dh)
+            {
+              xfrac -= dh;
+              sx++;
+            }
 
           /* Get the address of the pixels in src and dst. */
           dptr = ddata + dy * dstride + dx * bytes_per_pixel;
* Re: Image scaling performance
  2015-02-25 18:56 ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2015-02-25 19:28 ` Michael Zimmermann
  2015-02-25 20:39 ` Vladimir 'φ-coder/phcoder' Serbinenko
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Zimmermann @ 2015-02-25 19:28 UTC (permalink / raw)
  To: The development of GNU GRUB

Your patch still has graphical glitches: http://puu.sh/gcpco/da369f26c7.png

By the way, it should be legal because modified GPL code is still GPL code.

On Wed, Feb 25, 2015 at 7:56 PM, Vladimir 'φ-coder/phcoder' Serbinenko
<phcoder@gmail.com> wrote:
> On 25.02.2015 19:46, Michael Zimmermann wrote:
>>
>> Oh OK, so Linux's div/mod/... assembler is as slow/fast as GRUB's code?
>> Linux uses armv5>= ifdefs. Maybe we could optimize things a little :)
>> About scale_nn,
>> amarullz (https://plus.google.com/u/0/+AhmadAmarullah/about) wrote an
>> optimized version without divs:
>> loops: http://pastebin.com/MaZqWSA9
>> memcpy: http://pastebin.com/iNq0V5Tw
>>
> Please try my patch (reattached here after minor fixes). A patch from an
> anonymous source, sent by a third party through pastebin, isn't acceptable
> from a legal perspective.
>
>> This code works a little faster. I'm still questioning the efficiency
>> of math operations because on slow devices there are other bottlenecks of
>> the same kind (like de/compression).
>>
>> On Wed, Feb 25, 2015 at 7:41 PM, Vladimir 'phcoder' Serbinenko
>> <phcoder@gmail.com> wrote:
>>>
>>> ARMv7 doesn't mandate div instructions. It's a separate flag in features.
>>> GRUB supports earlier CPUs as well and we use them for testing. My only
>>> test machine is armv6.
>>>
>>> On 2015-02-25 19:38, "Michael Zimmermann" <sigmaepsilon92@gmail.com>
>>> wrote:
>>>
>>>> Why do you think the native div code would crash on most devices? I
>>>> support ARMv7+ only anyway.
>>>>
>>>> On Wed, Feb 25, 2015 at 5:23 PM, Leif Lindholm
>>>> <leif.lindholm@linaro.org>
>>>> wrote:
>>>>>
>>>>> On Wed, Feb 25, 2015 at 03:45:40PM +0000, Leif Lindholm wrote:
>>>>>>>>>
>>>>>>>>> Some technical info:
>>>>>>>>> ARMv7
>>>>>>>>> Linaro GCC 4.9
>>>>>>
>>>>>> I don't see any calls to any of the __aeabi helpers generated for this
>>>>>> file with current head. Which specific Linaro toolchain are you using?
>>>>>> (mine is "Linaro GCC 4.9-2014.09").
>>>>>
>>>>> Scratch that, I do see them. Just failing to drive the tools properly.
>>>>>
>>>>> /
>>>>> Leif

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Image scaling performance
  2015-02-25 19:28 ` Michael Zimmermann
@ 2015-02-25 20:39 ` Vladimir 'φ-coder/phcoder' Serbinenko
  0 siblings, 0 replies; 24+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2015-02-25 20:39 UTC (permalink / raw)
  To: The development of GNU GRUB

[-- Attachment #1: Type: text/plain, Size: 127 bytes --]

On 25.02.2015 20:28, Michael Zimmermann wrote:
> your patch still has graphical glitches: http://puu.sh/gcpco/da369f26c7.png

[-- Attachment #2: scale.diff --]
[-- Type: text/x-diff, Size: 6115 bytes --]

diff --git a/grub-core/video/bitmap_scale.c b/grub-core/video/bitmap_scale.c
index 0b93d02..70c32f0 100644
--- a/grub-core/video/bitmap_scale.c
+++ b/grub-core/video/bitmap_scale.c
@@ -361,35 +361,46 @@ scale_nn (struct grub_video_bitmap *dst, struct grub_video_bitmap *src)
   unsigned dh = dst->mode_info.height;
   unsigned sw = src->mode_info.width;
   unsigned sh = src->mode_info.height;
-  unsigned dstride = dst->mode_info.pitch;
-  unsigned sstride = src->mode_info.pitch;
+  int dstride = dst->mode_info.pitch;
+  int sstride = src->mode_info.pitch;
   /* bytes_per_pixel is the same for both src and dst. */
-  unsigned bytes_per_pixel = dst->mode_info.bytes_per_pixel;
+  int bytes_per_pixel = dst->mode_info.bytes_per_pixel;
+  unsigned dy, sy, ystep, yfrac, yover;
+  unsigned sx, xstep, xfrac, xover;
+  grub_uint8_t *dptr, *dline_end, *sline;
 
-  unsigned dy;
-  for (dy = 0; dy < dh; dy++)
+  xstep = sw / dw;
+  xover = sw % dw;
+  ystep = sh / dh;
+  yover = sh % dh;
+
+  for (dy = 0, sy = 0, yfrac = 0; dy < dh; dy++, sy += ystep, yfrac += yover)
     {
-      unsigned dx;
-      for (dx = 0; dx < dw; dx++)
+      if (yfrac >= dh)
+        {
+          yfrac -= dh;
+          sy++;
+        }
+      dptr = ddata + dy * dstride;
+      dline_end = dptr + dw * bytes_per_pixel;
+      sline = sdata + sy * sstride;
+      for (sx = 0, xfrac = 0; dptr < dline_end; sx += xstep, xfrac += xover, dptr += bytes_per_pixel)
         {
-          grub_uint8_t *dptr;
           grub_uint8_t *sptr;
-          unsigned sx;
-          unsigned sy;
-          unsigned comp;
+          int comp;
 
-          /* Compute the source coordinate that the destination coordinate
-             maps to. Note: sx/sw = dx/dw => sx = sw*dx/dw. */
-          sx = sw * dx / dw;
-          sy = sh * dy / dh;
+          if (xfrac >= dw)
+            {
+              xfrac -= dw;
+              sx++;
+            }
 
           /* Get the address of the pixels in src and dst. */
-          dptr = ddata + dy * dstride + dx * bytes_per_pixel;
-          sptr = sdata + sy * sstride + sx * bytes_per_pixel;
+          sptr = sline + sx * bytes_per_pixel;
 
-          /* Copy the pixel color value. */
-          for (comp = 0; comp < bytes_per_pixel; comp++)
-            dptr[comp] = sptr[comp];
+          /* Copy the pixel color value. */
+          for (comp = 0; comp < bytes_per_pixel; comp++)
+            dptr[comp] = sptr[comp];
         }
     }
   return GRUB_ERR_NONE;
@@ -422,27 +433,40 @@ scale_bilinear (struct grub_video_bitmap *dst, struct grub_video_bitmap *src)
   int sstride = src->mode_info.pitch;
   /* bytes_per_pixel is the same for both src and dst. */
   int bytes_per_pixel = dst->mode_info.bytes_per_pixel;
+  unsigned dy, syf, sy, ystep, yfrac, yover;
+  unsigned sxf, sx, xstep, xfrac, xover;
+  grub_uint8_t *dptr, *dline_end, *sline;
+
+  xstep = (sw << 8) / dw;
+  xover = (sw << 8) % dw;
+  ystep = (sh << 8) / dh;
+  yover = (sh << 8) % dh;
 
-  unsigned dy;
-  for (dy = 0; dy < dh; dy++)
+  for (dy = 0, syf = 0, yfrac = 0; dy < dh; dy++, syf += ystep, yfrac += yover)
     {
-      unsigned dx;
-      for (dx = 0; dx < dw; dx++)
+      if (yfrac >= dh)
+        {
+          yfrac -= dh;
+          syf++;
+        }
+      sy = syf >> 8;
+      dptr = ddata + dy * dstride;
+      dline_end = dptr + dw * bytes_per_pixel;
+      sline = sdata + sy * sstride;
+      for (sxf = 0, xfrac = 0; dptr < dline_end; sxf += xstep, xfrac += xover, dptr += bytes_per_pixel)
         {
-          grub_uint8_t *dptr;
           grub_uint8_t *sptr;
-          unsigned sx;
-          unsigned sy;
           int comp;
 
-          /* Compute the source coordinate that the destination coordinate
-             maps to. Note: sx/sw = dx/dw => sx = sw*dx/dw. */
-          sx = sw * dx / dw;
-          sy = sh * dy / dh;
+          if (xfrac >= dw)
+            {
+              xfrac -= dw;
+              sxf++;
+            }
 
           /* Get the address of the pixels in src and dst. */
-          dptr = ddata + dy * dstride + dx * bytes_per_pixel;
-          sptr = sdata + sy * sstride + sx * bytes_per_pixel;
+          sx = sxf >> 8;
+          sptr = sline + sx * bytes_per_pixel;
 
           /* If we have enough space to do so, use bilinear interpolation.
              Otherwise, fall back to nearest neighbor for this pixel. */
@@ -453,27 +477,27 @@ scale_bilinear (struct grub_video_bitmap *dst, struct grub_video_bitmap *src)
               /* Fixed-point .8 numbers representing the fraction of the
                  distance in the x (u) and y (v) direction within the
                  box of 4 pixels in the source. */
-              int u = (256 * sw * dx / dw) - (sx * 256);
-              int v = (256 * sh * dy / dh) - (sy * 256);
+              unsigned u = sxf & 0xff;
+              unsigned v = syf & 0xff;
 
               for (comp = 0; comp < bytes_per_pixel; comp++)
                 {
                   /* Get the component's values for the
                      four source corner pixels. */
-                  int f00 = sptr[comp];
-                  int f10 = sptr[comp + bytes_per_pixel];
-                  int f01 = sptr[comp + sstride];
-                  int f11 = sptr[comp + sstride + bytes_per_pixel];
+                  unsigned f00 = sptr[comp];
+                  unsigned f10 = sptr[comp + bytes_per_pixel];
+                  unsigned f01 = sptr[comp + sstride];
+                  unsigned f11 = sptr[comp + sstride + bytes_per_pixel];
 
                   /* Count coeffecients. */
-                  int c00 = (256 - u) * (256 - v);
-                  int c10 = u * (256 - v);
-                  int c01 = (256 - u) * v;
-                  int c11 = u * v;
+                  unsigned c00 = (256 - u) * (256 - v);
+                  unsigned c10 = u * (256 - v);
+                  unsigned c01 = (256 - u) * v;
+                  unsigned c11 = u * v;
 
                   /* Interpolate. */
-                  int fxy = c00 * f00 + c01 * f01 + c10 * f10 + c11 * f11;
-                  fxy = fxy / (256 * 256);
+                  unsigned fxy = c00 * f00 + c01 * f01 + c10 * f10 + c11 * f11;
+                  fxy = fxy >> 16;
 
                   dptr[comp] = fxy;
                 }

^ permalink raw reply related [flat|nested] 24+ messages in thread
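Editorial note: the carry-based stepping that the patch above uses in scale_nn can be looked at in isolation. The following is a minimal sketch, not code from the GRUB tree; the function name nn_index_map and its parameters are made up for illustration. It computes the same source index as the original `sx = sw * dx / dw`, but with one division and one modulo outside the loop. It also shows why the comparison must be `>=`: with `>`, a fraction that lands exactly on `dw` is never carried, which is one plausible source of the glitches reported for the earlier patch.

```c
#include <assert.h>

/* Hypothetical helper: fill map[dx] with the source index that destination
   index dx maps to, i.e. floor (sw * dx / dw), without dividing per pixel.
   Assumes dw > 0 and that map has room for dw entries.  */
void
nn_index_map (unsigned sw, unsigned dw, unsigned *map)
{
  unsigned step = sw / dw;	/* Whole source pixels per destination pixel.  */
  unsigned over = sw % dw;	/* Remainder accumulated per destination pixel.  */
  unsigned sx = 0;
  unsigned frac = 0;
  unsigned dx;

  for (dx = 0; dx < dw; dx++, sx += step, frac += over)
    {
      /* Carry the accumulated remainder into sx.  frac is always below
         2 * dw here, so a single subtraction is enough.  Using '>' instead
         of '>=' would skip the carry when frac == dw.  */
      if (frac >= dw)
	{
	  frac -= dw;
	  sx++;
	}
      map[dx] = sx;
    }
}
```

The invariant is that after the carry, `sx == sw * dx / dw` and `frac == (sw * dx) % dw`, so the result is bit-for-bit the same as the division-based loop for both upscaling and downscaling.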
* Re: Image scaling performance
  2015-02-25 18:46 ` Michael Zimmermann
  2015-02-25 18:56 ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2015-02-25 20:48 ` Vladimir 'φ-coder/phcoder' Serbinenko
  2015-02-25 20:54 ` Michael Zimmermann
  2015-02-26 16:44 ` Vladimir 'φ-coder/phcoder' Serbinenko
  2 siblings, 1 reply; 24+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2015-02-25 20:48 UTC (permalink / raw)
  To: The development of GNU GRUB

On 25.02.2015 19:46, Michael Zimmermann wrote:
> Oh OK, so Linux's div/mod/... assembler is as slow/fast as GRUB's code?
> Linux uses armv5>= ifdefs. Maybe we could optimize things a little :)
Maintaining optimised asm routines is a big burden. You'll get more bugs
than speedup. It's possible to use sdiv/udiv after checking the CPU model
properly, but that doesn't cover 64-bit division and is usable on only a few
CPUs anyway.

^ permalink raw reply [flat|nested] 24+ messages in thread
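Editorial note: on cores without sdiv/udiv, every `/` and `%` in C turns into a call to a software helper (the __aeabi helpers mentioned earlier in the thread). The sketch below is the textbook shift-and-subtract method, written only to illustrate why such a helper costs a loop per division. It is an assumption-laden illustration, not the routine GRUB, libgcc, or Linux actually ships, and the name soft_udiv32 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration of software unsigned division.
   Returns n / d and, if rem is not NULL, stores n % d in *rem.
   Assumes d != 0.  */
uint32_t
soft_udiv32 (uint32_t n, uint32_t d, uint32_t *rem)
{
  uint32_t q = 0;
  uint64_t r = 0;		/* 64-bit so that r << 1 cannot overflow.  */
  int i;

  /* One iteration per bit of the dividend, most significant bit first.  */
  for (i = 31; i >= 0; i--)
    {
      r = (r << 1) | ((n >> i) & 1);
      if (r >= d)
	{
	  r -= d;
	  q |= (uint32_t) 1 << i;
	}
    }

  if (rem != NULL)
    *rem = (uint32_t) r;
  return q;
}
```

Real helpers are considerably more optimised than this, but the cost is still far above a multiply or a shift, which is why removing the per-pixel divisions from the scaling loops pays off.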
* Re: Image scaling performance
  2015-02-25 20:48 ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2015-02-25 20:54 ` Michael Zimmermann
  0 siblings, 0 replies; 24+ messages in thread
From: Michael Zimmermann @ 2015-02-25 20:54 UTC (permalink / raw)
  To: The development of GNU GRUB

The latest patch works just fine. Even bilinear scaling is fast :D

On Wed, Feb 25, 2015 at 9:48 PM, Vladimir 'φ-coder/phcoder' Serbinenko
<phcoder@gmail.com> wrote:
> On 25.02.2015 19:46, Michael Zimmermann wrote:
>>
>> Oh OK, so Linux's div/mod/... assembler is as slow/fast as GRUB's code?
>> Linux uses armv5>= ifdefs. Maybe we could optimize things a little :)
>
> Maintaining optimised asm routines is a big burden. You'll get more bugs
> than speedup. It's possible to use sdiv/udiv after checking the CPU model
> properly, but that doesn't cover 64-bit division and is usable on only a
> few CPUs anyway.

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Image scaling performance
  2015-02-25 18:46 ` Michael Zimmermann
  2015-02-25 18:56 ` Vladimir 'φ-coder/phcoder' Serbinenko
  2015-02-25 20:48 ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2015-02-26 16:44 ` Vladimir 'φ-coder/phcoder' Serbinenko
  2015-02-26 17:10 ` Michael Zimmermann
  2 siblings, 1 reply; 24+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2015-02-26 16:44 UTC (permalink / raw)
  To: The development of GNU GRUB

On 25.02.2015 19:46, Michael Zimmermann wrote:
> I'm still questioning the efficiency
> of math operations because on slow devices there are other bottlenecks of
> the same kind (like de/compression).
That's pure speculation at this point. GRUB has 3 compression algorithms:
- minilzo. Has some divisions in parts which GRUB doesn't use. Those parts
  are easy to disable and I'll just do so.
- gzip. Uses division only in the zlib header check. I'll optimise it a little,
  but it's only one division in the header check, not in the compressed data body.
- xz. No divisions.

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Image scaling performance
  2015-02-26 16:44 ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2015-02-26 17:10 ` Michael Zimmermann
  2015-02-26 17:16 ` Vladimir 'φ-coder/phcoder' Serbinenko
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Zimmermann @ 2015-02-26 17:10 UTC (permalink / raw)
  To: The development of GNU GRUB

Is there a way to create a performance profile so I can see what
exactly takes so much time? I don't have JTAG, but maybe UART+GDB could
help with that.

Adding prints is kind of annoying :D

On Thu, Feb 26, 2015 at 5:44 PM, Vladimir 'φ-coder/phcoder' Serbinenko
<phcoder@gmail.com> wrote:
> On 25.02.2015 19:46, Michael Zimmermann wrote:
>>
>> I'm still questioning the efficiency
>> of math operations because on slow devices there are other bottlenecks of
>> the same kind (like de/compression).
>
> That's pure speculation at this point. GRUB has 3 compression algorithms:
> - minilzo. Has some divisions in parts which GRUB doesn't use. Those parts
>   are easy to disable and I'll just do so.
> - gzip. Uses division only in the zlib header check. I'll optimise it a little,
>   but it's only one division in the header check, not in the compressed data body.
> - xz. No divisions.

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Image scaling performance
  2015-02-26 17:10 ` Michael Zimmermann
@ 2015-02-26 17:16 ` Vladimir 'φ-coder/phcoder' Serbinenko
  2015-02-26 20:27 ` Michael Zimmermann
  0 siblings, 1 reply; 24+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2015-02-26 17:16 UTC (permalink / raw)
  To: The development of GNU GRUB

On 26.02.2015 18:10, Michael Zimmermann wrote:
> Is there a way to create a performance profile so I can see what
> exactly takes so much time? I don't have JTAG, but maybe UART+GDB could
> help with that.
>
Have a look at boot_time.
> Adding prints is kind of annoying :D
>
> On Thu, Feb 26, 2015 at 5:44 PM, Vladimir 'φ-coder/phcoder' Serbinenko
> <phcoder@gmail.com> wrote:
>> On 25.02.2015 19:46, Michael Zimmermann wrote:
>>>
>>> I'm still questioning the efficiency
>>> of math operations because on slow devices there are other bottlenecks of
>>> the same kind (like de/compression).
>>
>> That's pure speculation at this point. GRUB has 3 compression algorithms:
>> - minilzo. Has some divisions in parts which GRUB doesn't use. Those parts
>>   are easy to disable and I'll just do so.
>> - gzip. Uses division only in the zlib header check. I'll optimise it a little,
>>   but it's only one division in the header check, not in the compressed data body.
>> - xz. No divisions.

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Image scaling performance
  2015-02-26 17:16 ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2015-02-26 20:27 ` Michael Zimmermann
  2015-02-26 20:35 ` Vladimir 'φ-coder/phcoder' Serbinenko
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Zimmermann @ 2015-02-26 20:27 UTC (permalink / raw)
  To: The development of GNU GRUB

Well, as you can see, boottime isn't detailed enough:
http://puu.sh/gdRXp/fc8fc176ce.png

Maybe I can hack printf to act as boottime.

On Thu, Feb 26, 2015 at 6:16 PM, Vladimir 'φ-coder/phcoder' Serbinenko
<phcoder@gmail.com> wrote:
> On 26.02.2015 18:10, Michael Zimmermann wrote:
>>
>> Is there a way to create a performance profile so I can see what
>> exactly takes so much time? I don't have JTAG, but maybe UART+GDB could
>> help with that.
>>
> Have a look at boot_time.
>
>> Adding prints is kind of annoying :D
>>
>> On Thu, Feb 26, 2015 at 5:44 PM, Vladimir 'φ-coder/phcoder' Serbinenko
>> <phcoder@gmail.com> wrote:
>>>
>>> On 25.02.2015 19:46, Michael Zimmermann wrote:
>>>>
>>>> I'm still questioning the efficiency
>>>> of math operations because on slow devices there are other bottlenecks of
>>>> the same kind (like de/compression).
>>>
>>> That's pure speculation at this point. GRUB has 3 compression algorithms:
>>> - minilzo. Has some divisions in parts which GRUB doesn't use. Those
>>>   parts are easy to disable and I'll just do so.
>>> - gzip. Uses division only in the zlib header check. I'll optimise it a
>>>   little, but it's only one division in the header check, not in the
>>>   compressed data body.
>>> - xz. No divisions.

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Image scaling performance
  2015-02-26 20:27 ` Michael Zimmermann
@ 2015-02-26 20:35 ` Vladimir 'φ-coder/phcoder' Serbinenko
  0 siblings, 0 replies; 24+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2015-02-26 20:35 UTC (permalink / raw)
  To: The development of GNU GRUB

On 26.02.2015 21:27, Michael Zimmermann wrote:
> Well, as you can see, boottime isn't detailed enough:
> http://puu.sh/gdRXp/fc8fc176ce.png
>
Just add more boottime checkpoints.
> Maybe I can hack printf to act as boottime.
>
> On Thu, Feb 26, 2015 at 6:16 PM, Vladimir 'φ-coder/phcoder' Serbinenko
> <phcoder@gmail.com> wrote:
>> On 26.02.2015 18:10, Michael Zimmermann wrote:
>>>
>>> Is there a way to create a performance profile so I can see what
>>> exactly takes so much time? I don't have JTAG, but maybe UART+GDB could
>>> help with that.
>>>
>> Have a look at boot_time.
>>
>>> Adding prints is kind of annoying :D
>>>
>>> On Thu, Feb 26, 2015 at 5:44 PM, Vladimir 'φ-coder/phcoder' Serbinenko
>>> <phcoder@gmail.com> wrote:
>>>>
>>>> On 25.02.2015 19:46, Michael Zimmermann wrote:
>>>>>
>>>>> I'm still questioning the efficiency
>>>>> of math operations because on slow devices there are other bottlenecks of
>>>>> the same kind (like de/compression).
>>>>
>>>> That's pure speculation at this point. GRUB has 3 compression algorithms:
>>>> - minilzo. Has some divisions in parts which GRUB doesn't use. Those
>>>>   parts are easy to disable and I'll just do so.
>>>> - gzip. Uses division only in the zlib header check. I'll optimise it a
>>>>   little, but it's only one division in the header check, not in the
>>>>   compressed data body.
>>>> - xz. No divisions.

^ permalink raw reply [flat|nested] 24+ messages in thread
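Editorial note: the advice to add more checkpoints amounts to recording a label and a timestamp at each point of interest and then looking at the differences. The sketch below shows that idea in a self-contained form. It is not GRUB's boot_time implementation; the names struct profile, profile_mark and profile_delta are hypothetical, and the caller is assumed to pass in a millisecond timestamp from whatever timer the platform provides.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CHECKPOINTS 32	/* Arbitrary limit chosen for this sketch.  */

struct checkpoint
{
  const char *label;		/* Where the checkpoint was taken.  */
  uint64_t t_ms;		/* Timestamp supplied by the caller.  */
};

struct profile
{
  struct checkpoint cp[MAX_CHECKPOINTS];
  unsigned count;
};

/* Record a checkpoint.  Silently drops it once the table is full.  */
void
profile_mark (struct profile *p, const char *label, uint64_t now_ms)
{
  if (p->count < MAX_CHECKPOINTS)
    {
      p->cp[p->count].label = label;
      p->cp[p->count].t_ms = now_ms;
      p->count++;
    }
}

/* Time spent between checkpoint i - 1 and checkpoint i.
   Returns 0 for the first checkpoint or an out-of-range index.  */
uint64_t
profile_delta (const struct profile *p, unsigned i)
{
  if (i == 0 || i >= p->count)
    return 0;
  return p->cp[i].t_ms - p->cp[i - 1].t_ms;
}
```

Placing one mark before and one after a suspect call (for example the bitmap scaling call) and printing the deltas at the end avoids slowing the measured code with console output while it runs.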
end of thread, other threads:[~2015-02-26 20:35 UTC | newest]

Thread overview: 24+ messages
2015-02-24 9:39 Image scaling performance Michael Zimmermann
2015-02-24 9:51 ` Vladimir 'phcoder' Serbinenko
2015-02-24 10:00 ` Michael Zimmermann
2015-02-24 11:27 ` Vladimir 'phcoder' Serbinenko
2015-02-24 11:47 ` Michael Zimmermann
2015-02-24 12:39 ` Vladimir 'phcoder' Serbinenko
2015-02-24 18:01 ` Michael Zimmermann
2015-02-24 18:22 ` Andrei Borzenkov
2015-02-25 16:20 ` Leif Lindholm
2015-02-25 15:45 ` Leif Lindholm
2015-02-25 16:23 ` Leif Lindholm
2015-02-25 18:38 ` Michael Zimmermann
2015-02-25 18:41 ` Vladimir 'phcoder' Serbinenko
2015-02-25 18:46 ` Michael Zimmermann
2015-02-25 18:56 ` Vladimir 'φ-coder/phcoder' Serbinenko
2015-02-25 19:28 ` Michael Zimmermann
2015-02-25 20:39 ` Vladimir 'φ-coder/phcoder' Serbinenko
2015-02-25 20:48 ` Vladimir 'φ-coder/phcoder' Serbinenko
2015-02-25 20:54 ` Michael Zimmermann
2015-02-26 16:44 ` Vladimir 'φ-coder/phcoder' Serbinenko
2015-02-26 17:10 ` Michael Zimmermann
2015-02-26 17:16 ` Vladimir 'φ-coder/phcoder' Serbinenko
2015-02-26 20:27 ` Michael Zimmermann
2015-02-26 20:35 ` Vladimir 'φ-coder/phcoder' Serbinenko