* [SLOF PATCH 0/2] fbuffer: Improve the speed of the cursor drawing
From: Thomas Huth @ 2015-07-28 10:19 UTC (permalink / raw)
To: slof, nikunj, aik; +Cc: gkurz, linuxppc-dev
Greg Kurz recently did great work by introducing the
invert-region helpers to speed up the drawing of the cursor.
This made the grub2 command prompt and editor mode usable.
However, the editor mode of grub2 is still sluggish, because
grub2 always redraws the whole line (with cursor enabled!) when
you enter a character. Since cursor and character drawing
are still not very fast in SLOF, this adds up to a noticeable delay.
So I tried to speed up the drawing a little bit and came up
with the following patches. I've used a little "benchmark"
in Forth to see how the patches improve the drawing:
1 buffer: etest-buf
: etest
   erase-screen
   41 etest-buf c!                   \ store 'A' (0x41) in the buffer
   1b emit 5b emit [char] H emit     \ ESC [ H: move cursor to home
   cursor-on
   milliseconds
   20 0 do 3a 0 do etest-buf 1 terminal-write drop loop cr loop
   milliseconds
   swap -
   ." Duration: " .d ." ms" cr
;
Without my patches, the etest function takes 1.8 s to draw.
With the first patch applied, the function takes ca. 1.2 s.
With both patches applied, the function only takes ca. 0.8 s!
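As a rough cross-check of these timings (an illustration, not part of the
posted series): SLOF's interpreter reads bare numbers as hexadecimal, so the
loop bounds 20 and 3a mean 32 lines of 58 characters, i.e. 1856
single-character terminal-write calls per run. The C sketch below turns the
three measured run times into an average cost per call; the function names
are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Loop bounds of the etest word; SLOF parses bare numbers as hex. */
enum { ETEST_LINES = 0x20, ETEST_COLS = 0x3a };

/* Number of single-character terminal-write calls in one etest run. */
unsigned etest_writes(void)
{
    return ETEST_LINES * ETEST_COLS;
}

/* Average cost of one terminal-write call in microseconds. */
double us_per_write(double total_ms)
{
    assert(total_ms >= 0.0);
    return total_ms * 1000.0 / etest_writes();
}

/* Print the per-call cost for the three run times quoted above. */
void print_etest_summary(void)
{
    const double runs_ms[] = { 1800.0, 1200.0, 800.0 };

    for (unsigned i = 0; i < sizeof(runs_ms) / sizeof(runs_ms[0]); i++)
        printf("%6.0f ms total -> %.0f us per call\n",
               runs_ms[i], us_per_write(runs_ms[i]));
}
```

By this estimate the average cost drops from roughly 970 us to roughly
430 us per character, an overall speedup of 1.8 / 0.8 = 2.25.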
The first patch changes invert-region to use a bigger
access width if possible. The second patch simply
shrinks the cursor from a full character-sized
rectangle to an underscore-like cursor. I am
not sure whether this is acceptable, but I think
it looks at least as good as the original cursor.
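The width selection in the first patch can be modelled in C roughly as
follows. This is only a sketch of the dispatch logic: the real code works
on frame buffer memory through SLOF's rb@/rw@/rl@/rx@ accessors (or a
hypercall on board-qemu), while this model uses ordinary RAM and memcpy.
The names invert_shift and invert_region_model are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Pick the access width from the low three bits of (addr | len), as the
 * CASE statement in the patch does. Returns log2 of the width in bytes:
 * 3 = 64-bit, 2 = 32-bit, 1 = 16-bit, 0 = byte accesses.
 */
unsigned invert_shift(uintptr_t addr, size_t len)
{
    switch ((addr | len) & 7) {
    case 0:  return 3;  /* both multiples of 8 */
    case 4:  return 2;  /* both multiples of 4 */
    case 2:
    case 6:  return 1;  /* both multiples of 2 */
    default: return 0;  /* odd address or odd length */
    }
}

/* Invert len bytes at addr, one width-sized unit per iteration. */
void invert_region_model(void *addr, size_t len)
{
    unsigned shift = invert_shift((uintptr_t)addr, len);
    size_t width = (size_t)1 << shift;
    unsigned char *p = addr;

    assert(len % width == 0);   /* guaranteed by invert_shift() */
    for (size_t i = 0; i < (len >> shift); i++, p += width) {
        uint64_t v = 0;

        memcpy(&v, p, width);   /* stands in for rb@ / rw@ / rl@ / rx@ */
        v = ~v;                 /* same effect as "-1 xor" */
        memcpy(p, &v, width);   /* stands in for rb! / rw! / rl! / rx! */
    }
}
```

Because the address and the length are OR-ed before masking, a single test
covers both: the chosen width always divides the length and matches the
alignment of the start address.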
Apart from my patches, I think grub2 should also disable
the cursor when redrawing a whole line in the editor.
However, I am not yet familiar with the grub2 sources.
Is anybody familiar with them and able to give some
pointers?
Thomas Huth (2):
fbuffer: Improve invert-region helper
fbuffer: Use a smaller cursor
board-js2x/slof/helper.fs | 13 ++++++++-----
board-qemu/slof/helper.fs | 14 ++++++++++----
slof/fs/fbuffer.fs | 5 +++--
3 files changed, 21 insertions(+), 11 deletions(-)
--
1.8.3.1
* [SLOF PATCH 1/2] fbuffer: Improve invert-region helper
From: Thomas Huth @ 2015-07-28 10:19 UTC (permalink / raw)
To: slof, nikunj, aik; +Cc: gkurz, linuxppc-dev
The introduction of invert-region already sped up the cursor
drawing considerably. But there is still room for improvement:
So far, invert-region accesses the memory only byte by byte,
but with some additional logic that checks the alignment of the
address and the length of the area, this function can also
access the memory with half-word, word or long-word accesses.
With this additional logic, invert-region-x is also no longer
necessary and thus can be removed.
Signed-off-by: Thomas Huth <thuth@redhat.com>
---
board-js2x/slof/helper.fs | 13 ++++++++-----
board-qemu/slof/helper.fs | 14 ++++++++++----
slof/fs/fbuffer.fs | 2 +-
3 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/board-js2x/slof/helper.fs b/board-js2x/slof/helper.fs
index 6030330..5941315 100644
--- a/board-js2x/slof/helper.fs
+++ b/board-js2x/slof/helper.fs
@@ -28,9 +28,12 @@
;
: invert-region ( addr len -- )
- 0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP drop
-;
-
-: invert-region-x ( addr len -- )
- /x / 0 ?DO dup dup rx@ -1 xor swap rx! xa1+ LOOP drop
+ 2dup or 7 and CASE
+ 0 OF 3 rshift 0 ?DO dup dup rx@ -1 xor swap rx! xa1+ LOOP ENDOF
+ 2 OF 1 rshift 0 ?DO dup dup rw@ -1 xor swap rw! wa1+ LOOP ENDOF
+ 4 OF 2 rshift 0 ?DO dup dup rl@ -1 xor swap rl! la1+ LOOP ENDOF
+ 6 OF 1 rshift 0 ?DO dup dup rw@ -1 xor swap rw! wa1+ LOOP ENDOF
+ dup OF 0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP ENDOF
+ ENDCASE
+ drop
;
diff --git a/board-qemu/slof/helper.fs b/board-qemu/slof/helper.fs
index c807bc6..6613782 100644
--- a/board-qemu/slof/helper.fs
+++ b/board-qemu/slof/helper.fs
@@ -33,10 +33,16 @@
swap -
;
-: invert-region ( addr len -- )
- over swap 0 swap 1 hv-logical-memop drop
+: invert-region-cs ( addr len cellsize -- )
+ >r over swap r@ rshift r> swap 1 hv-logical-memop drop
;
-: invert-region-x ( addr len -- )
- over swap /x / 3 swap 1 hv-logical-memop drop
+: invert-region ( addr len -- )
+ 2dup or 7 and CASE
+ 0 OF 3 invert-region-cs ENDOF
+ 2 OF 1 invert-region-cs ENDOF
+ 4 OF 2 invert-region-cs ENDOF
+ 6 OF 1 invert-region-cs ENDOF
+ dup OF 0 invert-region-cs ENDOF
+ ENDCASE
;
diff --git a/slof/fs/fbuffer.fs b/slof/fs/fbuffer.fs
index fcdd2fa..0128c07 100644
--- a/slof/fs/fbuffer.fs
+++ b/slof/fs/fbuffer.fs
@@ -170,7 +170,7 @@ CREATE bitmap-buffer 400 4 * allot
;
: fb8-invert-screen ( -- )
- frame-buffer-adr screen-height screen-width * screen-depth * invert-region-x
+ frame-buffer-adr screen-height screen-width * screen-depth * invert-region
;
: fb8-blink-screen ( -- ) fb8-invert-screen fb8-invert-screen ;
--
1.8.3.1
* [SLOF PATCH 2/2] fbuffer: Use a smaller cursor
From: Thomas Huth @ 2015-07-28 10:19 UTC (permalink / raw)
To: slof, nikunj, aik; +Cc: gkurz, linuxppc-dev
Drawing the cursor in the frame buffer memory is a very, very
slow operation. So let's simply switch to an "underscore" cursor
instead of the full block cursor to save some precious cycles.
Signed-off-by: Thomas Huth <thuth@redhat.com>
---
slof/fs/fbuffer.fs | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/slof/fs/fbuffer.fs b/slof/fs/fbuffer.fs
index 0128c07..542c431 100644
--- a/slof/fs/fbuffer.fs
+++ b/slof/fs/fbuffer.fs
@@ -98,7 +98,8 @@ CREATE bitmap-buffer 400 4 * allot
: fb8-toggle-cursor ( -- )
line# fb8-line2addr column# fb8-columns2bytes +
- char-height 0 ?DO
+ char-height 3 - screen-width screen-depth * * +
+ 3 0 ?DO
dup char-width screen-depth * invert-region
screen-width screen-depth * +
LOOP drop
--
1.8.3.1
* Re: [SLOF PATCH 1/2] fbuffer: Improve invert-region helper
From: Segher Boessenkool @ 2015-07-28 17:04 UTC (permalink / raw)
To: Thomas Huth; +Cc: slof, nikunj, aik, linuxppc-dev, gkurz
On Tue, Jul 28, 2015 at 12:19:54PM +0200, Thomas Huth wrote:
> : invert-region ( addr len -- )
> - 0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP drop
> -;
> -
> -: invert-region-x ( addr len -- )
> - /x / 0 ?DO dup dup rx@ -1 xor swap rx! xa1+ LOOP drop
> + 2dup or 7 and CASE
> + 0 OF 3 rshift 0 ?DO dup dup rx@ -1 xor swap rx! xa1+ LOOP ENDOF
> + 2 OF 1 rshift 0 ?DO dup dup rw@ -1 xor swap rw! wa1+ LOOP ENDOF
> + 4 OF 2 rshift 0 ?DO dup dup rl@ -1 xor swap rl! la1+ LOOP ENDOF
> + 6 OF 1 rshift 0 ?DO dup dup rw@ -1 xor swap rw! wa1+ LOOP ENDOF
> + dup OF 0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP ENDOF
> + ENDCASE
> + drop
> ;
Can you access device memory as 64 bits for all supported devices?
You can get a bigger speedup by writing some of the core blitting
functions in C, btw.
A small simplification:
2dup or 7 and CASE
0 OF 3 rshift 0 ?DO dup dup rx@ -1 xor swap rx! xa1+ LOOP ENDOF
4 OF 2 rshift 0 ?DO dup dup rl@ -1 xor swap rl! la1+ LOOP ENDOF
3 and
2 OF 1 rshift 0 ?DO dup dup rw@ -1 xor swap rw! wa1+ LOOP ENDOF
dup OF 0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP ENDOF
ENDCASE
If this code is often called unaligned, it makes more sense to special-
case the begin and end probably.
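To illustrate what special-casing the beginning and end could look like,
here is a minimal C sketch (not code from this series). It inverts leading
bytes until the pointer is 8-byte aligned, then works in 64-bit units, then
finishes the remaining bytes. The function name is hypothetical, and plain
RAM with memcpy stands in for the frame buffer accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Invert len bytes at p, using 64-bit units for the aligned middle part. */
void invert_region_head_tail(unsigned char *p, size_t len)
{
    assert(p != NULL || len == 0);

    /* Head: single bytes until p reaches an 8-byte boundary. */
    while (len > 0 && ((uintptr_t)p & 7) != 0) {
        *p ^= 0xff;
        p++;
        len--;
    }

    /* Body: aligned 64-bit units. */
    for (; len >= 8; p += 8, len -= 8) {
        uint64_t v;

        memcpy(&v, p, sizeof(v));
        v = ~v;
        memcpy(p, &v, sizeof(v));
    }

    /* Tail: whatever is left, fewer than 8 bytes. */
    for (; len > 0; p++, len--)
        *p ^= 0xff;
}
```

With this shape an odd start address or length costs at most 7 byte
accesses at each end instead of forcing the whole region to byte accesses.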
Segher
* Re: [SLOF PATCH 1/2] fbuffer: Improve invert-region helper
From: Thomas Huth @ 2015-07-28 21:00 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: slof, nikunj, aik, linuxppc-dev, gkurz
Hi Segher,
On 28/07/15 19:04, Segher Boessenkool wrote:
> On Tue, Jul 28, 2015 at 12:19:54PM +0200, Thomas Huth wrote:
>> : invert-region ( addr len -- )
>> - 0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP drop
>> -;
>> -
>> -: invert-region-x ( addr len -- )
>> - /x / 0 ?DO dup dup rx@ -1 xor swap rx! xa1+ LOOP drop
>> + 2dup or 7 and CASE
>> + 0 OF 3 rshift 0 ?DO dup dup rx@ -1 xor swap rx! xa1+ LOOP ENDOF
>> + 2 OF 1 rshift 0 ?DO dup dup rw@ -1 xor swap rw! wa1+ LOOP ENDOF
>> + 4 OF 2 rshift 0 ?DO dup dup rl@ -1 xor swap rl! la1+ LOOP ENDOF
>> + 6 OF 1 rshift 0 ?DO dup dup rw@ -1 xor swap rw! wa1+ LOOP ENDOF
>> + dup OF 0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP ENDOF
>> + ENDCASE
>> + drop
>> ;
>
> Can you access device memory as 64 bits for all supported devices?
Yes, that should be fine, since 64-bit access was already used in the original
code, see fb8-invert-screen in
https://github.com/aik/SLOF/commit/99c534ecc7a8566bd9ca6346915d9ac1bfacae1e
> You can get a bigger speedup by writing some of the core blitting
> functions in C, btw.
Well, the above code is for js2x only ... so this is likely not worth
the effort anymore. The code for qemu-spapr calls into a hypercall
already, so this is already accelerated.
> A small simplification:
>
> 2dup or 7 and CASE
> 0 OF 3 rshift 0 ?DO dup dup rx@ -1 xor swap rx! xa1+ LOOP ENDOF
> 4 OF 2 rshift 0 ?DO dup dup rl@ -1 xor swap rl! la1+ LOOP ENDOF
> 3 and
> 2 OF 1 rshift 0 ?DO dup dup rw@ -1 xor swap rw! wa1+ LOOP ENDOF
> dup OF 0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP ENDOF
> ENDCASE
Ok, nice idea, makes sense! I'll include it in v2 (after waiting a little
bit to see if there's other feedback)
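A quick way to convince oneself that the simplified CASE selects the same
width as the original for every selector value is to model both in C and
compare them. This is just a sketch; the two function names are invented
here, and each returns log2 of the access width in bytes.

```c
#include <assert.h>

/* Original dispatch: explicit cases for 0, 2, 4 and 6. */
unsigned shift_original(unsigned sel)
{
    switch (sel & 7) {
    case 0:  return 3;  /* rx@ / rx! */
    case 2:  return 1;  /* rw@ / rw! */
    case 4:  return 2;  /* rl@ / rl! */
    case 6:  return 1;  /* rw@ / rw! */
    default: return 0;  /* rb@ / rb! */
    }
}

/* Simplified dispatch: test 0 and 4 first, then mask with 3. */
unsigned shift_simplified(unsigned sel)
{
    sel &= 7;
    if (sel == 0)
        return 3;
    if (sel == 4)
        return 2;
    sel &= 3;           /* the "3 and" step folds 6 onto 2 */
    if (sel == 2)
        return 1;
    return 0;
}

/* Returns 1 if both variants agree for all eight selector values. */
int dispatch_variants_agree(void)
{
    for (unsigned sel = 0; sel < 8; sel++)
        if (shift_original(sel) != shift_simplified(sel))
            return 0;
    return 1;
}
```

The extra mask is safe only because selectors 0 and 4 have already been
handled; applied first, it would fold 4 onto 0 and pick 64-bit accesses for
a region that is only 4-byte aligned.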
> If this code is often called unaligned, it makes more sense to special-
> case the begin and end probably.
It's only used for drawing the cursor, so it should always be aligned.
Thomas
* Re: [SLOF PATCH 2/2] fbuffer: Use a smaller cursor
From: Alexey Kardashevskiy @ 2015-07-29 3:05 UTC (permalink / raw)
To: Thomas Huth, slof, nikunj; +Cc: gkurz, linuxppc-dev
On 07/28/2015 08:19 PM, Thomas Huth wrote:
> Drawing the cursor in the frame buffer memory is a very, very
> slow operation. So let's simply switch to a "underscore" cursor
> instead of the full block cursor to save some precious cycles.
>
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
> slof/fs/fbuffer.fs | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/slof/fs/fbuffer.fs b/slof/fs/fbuffer.fs
> index 0128c07..542c431 100644
> --- a/slof/fs/fbuffer.fs
> +++ b/slof/fs/fbuffer.fs
> @@ -98,7 +98,8 @@ CREATE bitmap-buffer 400 4 * allot
>
> : fb8-toggle-cursor ( -- )
> line# fb8-line2addr column# fb8-columns2bytes +
> - char-height 0 ?DO
> + char-height 3 - screen-width screen-depth * * +
> + 3 0 ?DO
Why not just:
- char-height 0 ?DO
+ 1 0 ?DO
? What is this magic with screen-width about?
> dup char-width screen-depth * invert-region
> screen-width screen-depth * +
> LOOP drop
>
--
Alexey
* Re: [SLOF] [SLOF PATCH 2/2] fbuffer: Use a smaller cursor
From: Segher Boessenkool @ 2015-07-29 3:42 UTC (permalink / raw)
To: Alexey Kardashevskiy; +Cc: Thomas Huth, slof, nikunj, linuxppc-dev
On Wed, Jul 29, 2015 at 01:05:48PM +1000, Alexey Kardashevskiy wrote:
> > : fb8-toggle-cursor ( -- )
> > line# fb8-line2addr column# fb8-columns2bytes +
> >- char-height 0 ?DO
> >+ char-height 3 - screen-width screen-depth * * +
> >+ 3 0 ?DO
>
> Why not just:
>
> - char-height 0 ?DO
> + 1 0 ?DO
>
> ? What is this magic with screen-width about?
Thomas' patch draws the cursor as the bottom three lines of a
character cell; your suggestion would draw it as the top one line.
But indeed it could be
char-height dup 3 - ?DO ...
Segher
* Re: [SLOF] [SLOF PATCH 2/2] fbuffer: Use a smaller cursor
From: Thomas Huth @ 2015-07-29 6:03 UTC (permalink / raw)
To: Segher Boessenkool, Alexey Kardashevskiy; +Cc: slof, nikunj, linuxppc-dev
On 29/07/15 05:42, Segher Boessenkool wrote:
> On Wed, Jul 29, 2015 at 01:05:48PM +1000, Alexey Kardashevskiy wrote:
>>> : fb8-toggle-cursor ( -- )
>>> line# fb8-line2addr column# fb8-columns2bytes +
>>> - char-height 0 ?DO
>>> + char-height 3 - screen-width screen-depth * * +
>>> + 3 0 ?DO
>>
>> Why not just:
>>
>> - char-height 0 ?DO
>> + 1 0 ?DO
>>
>> ? What is this magic with screen-width about?
>
> Thomas' patch draws the cursor as the bottom three lines of a
> character cell; your suggestion would draw it as the top one line.
Right.
> But indeed it could be
>
> char-height dup 3 - ?DO ...
Since the loop body expects the framebuffer address on the stack, the
loop boundaries are just used to count the iterations ... so "3 0"
sounds like the easiest and most readable way to specify this IMHO.
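The address arithmetic under discussion can be sketched in C as follows.
This is an illustration only: struct fb_geom, toggle_cursor and the
cell_off parameter are invented for the example, and screen_depth is
assumed to be in bytes per pixel, as its use in the patch suggests. The
running pointer plays the role of the address kept on the Forth stack, and
the loop index merely counts rows, like "3 0 ?DO".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame buffer geometry, named after the SLOF values. */
struct fb_geom {
    size_t screen_width;   /* pixels per scan line */
    size_t screen_depth;   /* bytes per pixel (assumption) */
    size_t char_width;     /* pixels per character cell row */
    size_t char_height;    /* scan lines per character cell */
};

/*
 * Invert the bottom `rows` scan lines of the character cell whose
 * top-left byte is at fb + cell_off.
 */
void toggle_cursor(unsigned char *fb, const struct fb_geom *g,
                   size_t cell_off, size_t rows)
{
    size_t pitch = g->screen_width * g->screen_depth;   /* bytes per line */
    size_t row_bytes = g->char_width * g->screen_depth;
    unsigned char *p;

    assert(rows <= g->char_height);

    /* Skip the upper (char_height - rows) lines of the cell. */
    p = fb + cell_off + (g->char_height - rows) * pitch;

    for (size_t i = 0; i < rows; i++) {     /* i only counts iterations */
        for (size_t b = 0; b < row_bytes; b++)
            p[b] ^= 0xff;                   /* invert-region */
        p += pitch;                         /* next scan line */
    }
}
```

Passing rows = 3 corresponds to the underscore cursor from the patch, while
rows equal to char_height gives the previous full block cursor.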
Thomas