linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH 0/3] fbuffer: performance improvement + code cleanup
@ 2015-05-28 13:13 Greg Kurz
  2015-05-28 13:13 ` [PATCH 1/3] fbuffer: simplify address computations in fb8-toggle-cursor Greg Kurz
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Greg Kurz @ 2015-05-28 13:13 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Thomas Huth, Nikunj A Dadhania,
	David Gibson

If booted in frame buffer mode, board-qemu currently calls hv-logical-load
and hv-logical-store for every pixel when enabling or disabling the cursor.
This is suboptimal when writing one char at a time to the console since
terminal-write always toggles the cursor. And this is precisely what grub
is doing when the user wants to edit a menu entry... the result is an
incredibly slow and barely usable interface.

This series introduces per-board helpers to be used by the frame buffer
code, so that board-qemu may have its own accelerated implementation:

- the first patch is preliminary cleanup, before moving code out to helpers.

- the second patch introduces a helper to invert a memory region byte by byte:
  this fixes the unbearable slowness of grub editing mode.

- the third patch introduces a similar helper with a quad-word pace: it
  doesn't bring any speed improvement since board-qemu already uses
  hv-logical-memop, but it allows us to "unify hcall-invert-screen and
  fb8-invert-screen again".

Please comment.

---

Greg Kurz (3):
      fbuffer: simplify address computations in fb8-toggle-cursor
      fbuffer: introduce the invert-region helper
      fbuffer: introduce the invert-region-x helper


 board-js2x/slof/helper.fs               |    9 +++++++++
 board-qemu/slof/helper.fs               |    7 +++++++
 board-qemu/slof/pci-device_1234_1111.fs |   10 +---------
 slof/fs/fbuffer.fs                      |    8 +++-----
 4 files changed, 20 insertions(+), 14 deletions(-)

--
Greg

* [PATCH 1/3] fbuffer: simplify address computations in fb8-toggle-cursor
  2015-05-28 13:13 [PATCH 0/3] fbuffer: performance improvement + code cleanup Greg Kurz
@ 2015-05-28 13:13 ` Greg Kurz
  2015-05-28 13:30   ` Thomas Huth
  2015-05-29  4:17   ` Nikunj A Dadhania
  2015-05-28 13:13 ` [PATCH 2/3] fbuffer: introduce the invert-region helper Greg Kurz
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 11+ messages in thread
From: Greg Kurz @ 2015-05-28 13:13 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Thomas Huth, Nikunj A Dadhania,
	David Gibson

The inner loop deals with a contiguous region. It could easily be replaced
by faster board specific functions like hv-logical-memop in board-qemu.
Since hv-logical-memop does not return an address, let's have the enclosing
loop compute the next line address by itself and drop the confusing
"char-width screen-depth * -" address adjustment.
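
To illustrate the address computation described above, here is a small
Python model of the two loop shapes. It is only a sketch: the function
names and the numeric settings (start address, 8x16 glyph, 800 pixel wide
screen, 1 byte per pixel) are assumptions chosen for the example, not
values taken from SLOF.

```python
def old_walk(start, char_w, char_h, screen_w, depth):
    """Old shape: the inner loop advances the address itself, so the outer
    loop adds one full line and then subtracts the glyph row width."""
    visited = []
    addr = start
    for _ in range(char_h):
        for _ in range(char_w * depth):
            visited.append(addr)
            addr += 1
        addr = addr + screen_w * depth - char_w * depth
    return visited


def new_walk(start, char_w, char_h, screen_w, depth):
    """New shape: the inner loop works on a copy (dup ... drop), so the
    outer loop only adds one full line to the row address."""
    visited = []
    row = start
    for _ in range(char_h):
        addr = row
        for _ in range(char_w * depth):
            visited.append(addr)
            addr += 1
        row += screen_w * depth
    return visited


# Illustrative settings only (assumed, not read from the firmware).
args = dict(start=0x1000, char_w=8, char_h=16, screen_w=800, depth=1)
print(old_walk(**args) == new_walk(**args))
```

Both walks touch the same bytes in the same order; the point of the new
shape is that the inner loop no longer has to leave a useful address
behind, which is what allows it to be swapped for a helper later.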

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
---
 slof/fs/fbuffer.fs |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/slof/fs/fbuffer.fs b/slof/fs/fbuffer.fs
index 756f05a..faae6a9 100644
--- a/slof/fs/fbuffer.fs
+++ b/slof/fs/fbuffer.fs
@@ -99,8 +99,8 @@ CREATE bitmap-buffer 400 4 * allot
 : fb8-toggle-cursor ( -- )
 	line# fb8-line2addr column# fb8-columns2bytes +
 	char-height 0 ?DO
-		char-width screen-depth * 0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP
-		screen-width screen-depth * + char-width screen-depth * -
+		dup char-width screen-depth * 0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP drop
+		screen-width screen-depth * +
 	LOOP drop
 ;
 

* [PATCH 2/3] fbuffer: introduce the invert-region helper
  2015-05-28 13:13 [PATCH 0/3] fbuffer: performance improvement + code cleanup Greg Kurz
  2015-05-28 13:13 ` [PATCH 1/3] fbuffer: simplify address computations in fb8-toggle-cursor Greg Kurz
@ 2015-05-28 13:13 ` Greg Kurz
  2015-05-28 17:19   ` Thomas Huth
  2015-05-29  4:17   ` Nikunj A Dadhania
  2015-05-28 13:13 ` [PATCH 3/3] fbuffer: introduce the invert-region-x helper Greg Kurz
  2015-05-29  4:54 ` [PATCH 0/3] fbuffer: performance improvement + code cleanup Alexey Kardashevskiy
  3 siblings, 2 replies; 11+ messages in thread
From: Greg Kurz @ 2015-05-28 13:13 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Thomas Huth, Nikunj A Dadhania,
	David Gibson

The inner loop in fb8-toggle-cursor can be implemented with hv-logical-memop
in board-qemu, which greatly reduces the number of hypercalls.

Let's introduce a per-board helper:
- board-js2x: slow RB based, taken from current fb8-toggle-cursor
- board-qemu: faster hv-logical-memop based

With standard graphical settings on board-qemu, we go from 512 hcall
invocations per character down to 16.
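
The 512 and 16 figures can be reproduced with a short calculation. The
thread does not state which "standard graphical settings" are meant, so the
values below (8x16 glyph, 2 bytes per pixel, one load hcall plus one store
hcall per byte) are an assumption: they are simply one combination that
yields the quoted numbers. The function name is hypothetical.

```python
def hcalls_per_toggle(char_w, char_h, bytes_per_pixel, hcalls_per_byte):
    """Count hypercalls needed to toggle the cursor once."""
    row_bytes = char_w * bytes_per_pixel
    # Before: every byte of every glyph row is loaded and stored separately.
    before = char_h * row_bytes * hcalls_per_byte
    # After: one memop hypercall inverts a whole glyph row.
    after = char_h
    return before, after


# Assumed settings, picked only because they match the quoted figures.
before, after = hcalls_per_toggle(char_w=8, char_h=16,
                                  bytes_per_pixel=2, hcalls_per_byte=2)
print(before, after)  # 512 16
```

Whatever the real settings are, the count after the change depends only on
the glyph height, since each row costs a single hypercall.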

Suggested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
---
 board-js2x/slof/helper.fs |    4 ++++
 board-qemu/slof/helper.fs |    3 +++
 slof/fs/fbuffer.fs        |    2 +-
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/board-js2x/slof/helper.fs b/board-js2x/slof/helper.fs
index 34d60da..918fdc4 100644
--- a/board-js2x/slof/helper.fs
+++ b/board-js2x/slof/helper.fs
@@ -26,3 +26,7 @@
    s" , " $cat
    bdate2human $cat encode-string THEN
 ;
+
+: invert-region ( addr len -- )
+   0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP drop
+;
diff --git a/board-qemu/slof/helper.fs b/board-qemu/slof/helper.fs
index 96da498..da676c7 100644
--- a/board-qemu/slof/helper.fs
+++ b/board-qemu/slof/helper.fs
@@ -33,3 +33,6 @@
   swap -
 ;
 
+: invert-region ( addr len -- )
+   over swap 0 swap 1 hv-logical-memop drop
+;
diff --git a/slof/fs/fbuffer.fs b/slof/fs/fbuffer.fs
index faae6a9..deeba6b 100644
--- a/slof/fs/fbuffer.fs
+++ b/slof/fs/fbuffer.fs
@@ -99,7 +99,7 @@ CREATE bitmap-buffer 400 4 * allot
 : fb8-toggle-cursor ( -- )
 	line# fb8-line2addr column# fb8-columns2bytes +
 	char-height 0 ?DO
-		dup char-width screen-depth * 0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP drop
+		dup char-width screen-depth * invert-region
 		screen-width screen-depth * +
 	LOOP drop
 ;

* [PATCH 3/3] fbuffer: introduce the invert-region-x helper
  2015-05-28 13:13 [PATCH 0/3] fbuffer: performance improvement + code cleanup Greg Kurz
  2015-05-28 13:13 ` [PATCH 1/3] fbuffer: simplify address computations in fb8-toggle-cursor Greg Kurz
  2015-05-28 13:13 ` [PATCH 2/3] fbuffer: introduce the invert-region helper Greg Kurz
@ 2015-05-28 13:13 ` Greg Kurz
  2015-05-28 17:33   ` Thomas Huth
  2015-05-29  4:25   ` Nikunj A Dadhania
  2015-05-29  4:54 ` [PATCH 0/3] fbuffer: performance improvement + code cleanup Alexey Kardashevskiy
  3 siblings, 2 replies; 11+ messages in thread
From: Greg Kurz @ 2015-05-28 13:13 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Thomas Huth, Nikunj A Dadhania,
	David Gibson

This patch simply moves the slow RX based logic from fb8-invert-screen
to board-js2x helpers and implements a fast hv-logical-memop based helper
for board-qemu. And we can drop hcall-invert-screen!
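
As a rough illustration of how the two board-qemu helpers feed
hv-logical-memop, the Python sketch below mirrors their stack shuffling and
applies a toy model of the hypercall to a buffer. The argument meaning
assumed here (dst, src, log2 of the element size, element count, op 1 for
invert) is inferred from how the patches call it, and `X_SIZE`,
`logical_memop` and the other names are hypothetical stand-ins rather than
SLOF or hypervisor code.

```python
X_SIZE = 8  # assumed value of /x: bytes in one 64-bit cell


def memop_args_bytewise(addr, length):
    # Models "over swap 0 swap 1": ( addr len -- dst src esize count op )
    return (addr, addr, 0, length, 1)


def memop_args_quadword(addr, length):
    # Models "over swap /x / 3 swap 1": the count is in 8-byte elements.
    return (addr, addr, 3, length // X_SIZE, 1)


def logical_memop(mem, dst, src, esize, count, op):
    """Toy model: copy `count` elements of 2**esize bytes from src to dst,
    inverting every bit when op == 1."""
    step = 1 << esize
    mask = (1 << (8 * step)) - 1
    for i in range(count):
        val = int.from_bytes(mem[src + i * step:src + (i + 1) * step], "big")
        if op == 1:
            val = ~val & mask
        mem[dst + i * step:dst + (i + 1) * step] = val.to_bytes(step, "big")


buf_a = bytearray(range(16))
buf_b = bytearray(range(16))
logical_memop(buf_a, *memop_args_bytewise(0, 16))
logical_memop(buf_b, *memop_args_quadword(0, 16))
print(buf_a == buf_b)
```

In this model both variants invert the same bytes; the quad-word form just
needs an eighth of the elements. Note that the integer division by /x means
a length that is not a multiple of 8 would leave its trailing bytes
untouched.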

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
---
 board-js2x/slof/helper.fs               |    5 +++++
 board-qemu/slof/helper.fs               |    4 ++++
 board-qemu/slof/pci-device_1234_1111.fs |   10 +---------
 slof/fs/fbuffer.fs                      |    4 +---
 4 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/board-js2x/slof/helper.fs b/board-js2x/slof/helper.fs
index 918fdc4..ea2d584 100644
--- a/board-js2x/slof/helper.fs
+++ b/board-js2x/slof/helper.fs
@@ -30,3 +30,8 @@
 : invert-region ( addr len -- )
    0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP drop
 ;
+
+
+: invert-region-x ( addr len -- )
+   /x / 0 ?DO dup dup rx@ -1 xor swap rx! xa1+ LOOP drop
+;
diff --git a/board-qemu/slof/helper.fs b/board-qemu/slof/helper.fs
index da676c7..c807bc6 100644
--- a/board-qemu/slof/helper.fs
+++ b/board-qemu/slof/helper.fs
@@ -36,3 +36,7 @@
 : invert-region ( addr len -- )
    over swap 0 swap 1 hv-logical-memop drop
 ;
+
+: invert-region-x ( addr len -- )
+   over swap /x / 3 swap 1 hv-logical-memop drop
+;
diff --git a/board-qemu/slof/pci-device_1234_1111.fs b/board-qemu/slof/pci-device_1234_1111.fs
index a5c3584..26b0623 100644
--- a/board-qemu/slof/pci-device_1234_1111.fs
+++ b/board-qemu/slof/pci-device_1234_1111.fs
@@ -188,16 +188,9 @@ a CONSTANT VBE_DISPI_INDEX_NB
 : display-remove ( -- ) 
 ;
 
-: hcall-invert-screen ( -- )
-    frame-buffer-adr frame-buffer-adr 3
-    screen-height screen-width * screen-depth * /x /
-    1 hv-logical-memop
-    drop
-;
-
 : hcall-blink-screen ( -- )
     \ 32 msec delay for visually noticing the blink
-    hcall-invert-screen 20 ms hcall-invert-screen
+    invert-screen 20 ms invert-screen
 ;
 
 : display-install ( -- )
@@ -211,7 +204,6 @@ a CONSTANT VBE_DISPI_INDEX_NB
         disp-width char-width / disp-height char-height /
         disp-depth 7 + 8 /                      ( width height #lines #cols depth )
         fb-install
-	['] hcall-invert-screen to invert-screen
 	['] hcall-blink-screen to blink-screen
          true to is-installed?
     THEN
diff --git a/slof/fs/fbuffer.fs b/slof/fs/fbuffer.fs
index deeba6b..fcdd2fa 100644
--- a/slof/fs/fbuffer.fs
+++ b/slof/fs/fbuffer.fs
@@ -170,9 +170,7 @@ CREATE bitmap-buffer 400 4 * allot
 ;
 
 : fb8-invert-screen ( -- )
-	frame-buffer-adr screen-height screen-width * screen-depth * 2dup /x / 0 ?DO
-		dup rx@ -1 xor over rx! xa1+
-	LOOP 3drop
+	frame-buffer-adr screen-height screen-width * screen-depth * invert-region-x
 ;
 
 : fb8-blink-screen ( -- ) fb8-invert-screen fb8-invert-screen ;

* Re: [PATCH 1/3] fbuffer: simplify address computations in fb8-toggle-cursor
  2015-05-28 13:13 ` [PATCH 1/3] fbuffer: simplify address computations in fb8-toggle-cursor Greg Kurz
@ 2015-05-28 13:30   ` Thomas Huth
  2015-05-29  4:17   ` Nikunj A Dadhania
  1 sibling, 0 replies; 11+ messages in thread
From: Thomas Huth @ 2015-05-28 13:30 UTC (permalink / raw)
  To: Greg Kurz
  Cc: linuxppc-dev, Alexey Kardashevskiy, Nikunj A Dadhania,
	David Gibson

On Thu, 28 May 2015 15:13:14 +0200
Greg Kurz <gkurz@linux.vnet.ibm.com> wrote:

> The inner loop deals with a contiguous region. It could easily be replaced
> by faster board specific functions like hv-logical-memop in board-qemu.
> Since hv-logical-memop does not return an address, let's have the enclosing
> loop compute the next line address by itself and drop the confusing
> "char-width screen-depth * -" address adjustment.
> 
> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> ---
>  slof/fs/fbuffer.fs |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/slof/fs/fbuffer.fs b/slof/fs/fbuffer.fs
> index 756f05a..faae6a9 100644
> --- a/slof/fs/fbuffer.fs
> +++ b/slof/fs/fbuffer.fs
> @@ -99,8 +99,8 @@ CREATE bitmap-buffer 400 4 * allot
>  : fb8-toggle-cursor ( -- )
>  	line# fb8-line2addr column# fb8-columns2bytes +
>  	char-height 0 ?DO
> -		char-width screen-depth * 0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP
> -		screen-width screen-depth * + char-width screen-depth * -
> +		dup char-width screen-depth * 0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP drop
> +		screen-width screen-depth * +
>  	LOOP drop
>  ;

Reviewed-by: Thomas Huth <thuth@redhat.com>

* Re: [PATCH 2/3] fbuffer: introduce the invert-region helper
  2015-05-28 13:13 ` [PATCH 2/3] fbuffer: introduce the invert-region helper Greg Kurz
@ 2015-05-28 17:19   ` Thomas Huth
  2015-05-29  4:17   ` Nikunj A Dadhania
  1 sibling, 0 replies; 11+ messages in thread
From: Thomas Huth @ 2015-05-28 17:19 UTC (permalink / raw)
  To: Greg Kurz
  Cc: linuxppc-dev, Alexey Kardashevskiy, Nikunj A Dadhania,
	David Gibson

On Thu, 28 May 2015 15:13:19 +0200
Greg Kurz <gkurz@linux.vnet.ibm.com> wrote:

> The inner loop in fb8-toggle-cursor can be implemented with hv-logical-memop
> in board-qemu and get an incredible performance boost.
> 
> Let's introduce a per-board helper:
> - board-js2x: slow RB based, taken from current fb8-toggle-cursor
> - board-qemu: faster hv-logical-memop based
> 
> With standard graphical settings on board-qemu, we go from 512 hcall
> invocations per character down to 16.
> 
> Suggested-by: Thomas Huth <thuth@redhat.com>
> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> ---
>  board-js2x/slof/helper.fs |    4 ++++
>  board-qemu/slof/helper.fs |    3 +++
>  slof/fs/fbuffer.fs        |    2 +-
>  3 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/board-js2x/slof/helper.fs b/board-js2x/slof/helper.fs
> index 34d60da..918fdc4 100644
> --- a/board-js2x/slof/helper.fs
> +++ b/board-js2x/slof/helper.fs
> @@ -26,3 +26,7 @@
>     s" , " $cat
>     bdate2human $cat encode-string THEN
>  ;
> +
> +: invert-region ( addr len -- )
> +   0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP drop
> +;
> diff --git a/board-qemu/slof/helper.fs b/board-qemu/slof/helper.fs
> index 96da498..da676c7 100644
> --- a/board-qemu/slof/helper.fs
> +++ b/board-qemu/slof/helper.fs
> @@ -33,3 +33,6 @@
>    swap -
>  ;
>  
> +: invert-region ( addr len -- )
> +   over swap 0 swap 1 hv-logical-memop drop
> +;
> diff --git a/slof/fs/fbuffer.fs b/slof/fs/fbuffer.fs
> index faae6a9..deeba6b 100644
> --- a/slof/fs/fbuffer.fs
> +++ b/slof/fs/fbuffer.fs
> @@ -99,7 +99,7 @@ CREATE bitmap-buffer 400 4 * allot
>  : fb8-toggle-cursor ( -- )
>  	line# fb8-line2addr column# fb8-columns2bytes +
>  	char-height 0 ?DO
> -		dup char-width screen-depth * 0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP drop
> +		dup char-width screen-depth * invert-region
>  		screen-width screen-depth * +
>  	LOOP drop
>  ;

Reviewed-by: Thomas Huth <thuth@redhat.com>

* Re: [PATCH 3/3] fbuffer: introduce the invert-region-x helper
  2015-05-28 13:13 ` [PATCH 3/3] fbuffer: introduce the invert-region-x helper Greg Kurz
@ 2015-05-28 17:33   ` Thomas Huth
  2015-05-29  4:25   ` Nikunj A Dadhania
  1 sibling, 0 replies; 11+ messages in thread
From: Thomas Huth @ 2015-05-28 17:33 UTC (permalink / raw)
  To: Greg Kurz
  Cc: linuxppc-dev, Alexey Kardashevskiy, Nikunj A Dadhania,
	David Gibson

On Thu, 28 May 2015 15:13:24 +0200
Greg Kurz <gkurz@linux.vnet.ibm.com> wrote:

> This patch simply moves the slow RX based logic from fb8-invert-screen
> to board-js2x helpers and implement a fast hv-logical-memop based helper
> for board-qemu. And we can drop hcall-invert-screen !
> 
> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> ---
>  board-js2x/slof/helper.fs               |    5 +++++
>  board-qemu/slof/helper.fs               |    4 ++++
>  board-qemu/slof/pci-device_1234_1111.fs |   10 +---------
>  slof/fs/fbuffer.fs                      |    4 +---
>  4 files changed, 11 insertions(+), 12 deletions(-)
> 
> diff --git a/board-js2x/slof/helper.fs b/board-js2x/slof/helper.fs
> index 918fdc4..ea2d584 100644
> --- a/board-js2x/slof/helper.fs
> +++ b/board-js2x/slof/helper.fs
> @@ -30,3 +30,8 @@
>  : invert-region ( addr len -- )
>     0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP drop
>  ;
> +
> +

Maybe remove one of the two empty lines?

> +: invert-region-x ( addr len -- )
> +   /x / 0 ?DO dup dup rx@ -1 xor swap rx! xa1+ LOOP drop
> +;
> diff --git a/board-qemu/slof/helper.fs b/board-qemu/slof/helper.fs
> index da676c7..c807bc6 100644
> --- a/board-qemu/slof/helper.fs
> +++ b/board-qemu/slof/helper.fs
> @@ -36,3 +36,7 @@
>  : invert-region ( addr len -- )
>     over swap 0 swap 1 hv-logical-memop drop
>  ;
> +
> +: invert-region-x ( addr len -- )
> +   over swap /x / 3 swap 1 hv-logical-memop drop
> +;
> diff --git a/board-qemu/slof/pci-device_1234_1111.fs b/board-qemu/slof/pci-device_1234_1111.fs
> index a5c3584..26b0623 100644
> --- a/board-qemu/slof/pci-device_1234_1111.fs
> +++ b/board-qemu/slof/pci-device_1234_1111.fs
> @@ -188,16 +188,9 @@ a CONSTANT VBE_DISPI_INDEX_NB
>  : display-remove ( -- ) 
>  ;
>  
> -: hcall-invert-screen ( -- )
> -    frame-buffer-adr frame-buffer-adr 3
> -    screen-height screen-width * screen-depth * /x /
> -    1 hv-logical-memop
> -    drop
> -;
> -
>  : hcall-blink-screen ( -- )
>      \ 32 msec delay for visually noticing the blink
> -    hcall-invert-screen 20 ms hcall-invert-screen
> +    invert-screen 20 ms invert-screen
>  ;
>  
>  : display-install ( -- )
> @@ -211,7 +204,6 @@ a CONSTANT VBE_DISPI_INDEX_NB
>          disp-width char-width / disp-height char-height /
>          disp-depth 7 + 8 /                      ( width height #lines #cols depth )
>          fb-install
> -	['] hcall-invert-screen to invert-screen
>  	['] hcall-blink-screen to blink-screen
>           true to is-installed?
>      THEN
> diff --git a/slof/fs/fbuffer.fs b/slof/fs/fbuffer.fs
> index deeba6b..fcdd2fa 100644
> --- a/slof/fs/fbuffer.fs
> +++ b/slof/fs/fbuffer.fs
> @@ -170,9 +170,7 @@ CREATE bitmap-buffer 400 4 * allot
>  ;
>  
>  : fb8-invert-screen ( -- )
> -	frame-buffer-adr screen-height screen-width * screen-depth * 2dup /x / 0 ?DO
> -		dup rx@ -1 xor over rx! xa1+
> -	LOOP 3drop
> +	frame-buffer-adr screen-height screen-width * screen-depth * invert-region-x
>  ;
>  
>  : fb8-blink-screen ( -- ) fb8-invert-screen fb8-invert-screen ;

Reviewed-by: Thomas Huth <thuth@redhat.com>

* Re: [PATCH 1/3] fbuffer: simplify address computations in fb8-toggle-cursor
  2015-05-28 13:13 ` [PATCH 1/3] fbuffer: simplify address computations in fb8-toggle-cursor Greg Kurz
  2015-05-28 13:30   ` Thomas Huth
@ 2015-05-29  4:17   ` Nikunj A Dadhania
  1 sibling, 0 replies; 11+ messages in thread
From: Nikunj A Dadhania @ 2015-05-29  4:17 UTC (permalink / raw)
  To: Greg Kurz, linuxppc-dev; +Cc: Alexey Kardashevskiy, Thomas Huth, David Gibson

Greg Kurz <gkurz@linux.vnet.ibm.com> writes:

> The inner loop deals with a contiguous region. It could easily be replaced
> by faster board specific functions like hv-logical-memop in board-qemu.
> Since hv-logical-memop does not return an address, let's have the enclosing
> loop compute the next line address by itself and drop the confusing
> "char-width screen-depth * -" address adjustment.

Much better :-)

Reviewed-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>

>
> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> ---
>  slof/fs/fbuffer.fs |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/slof/fs/fbuffer.fs b/slof/fs/fbuffer.fs
> index 756f05a..faae6a9 100644
> --- a/slof/fs/fbuffer.fs
> +++ b/slof/fs/fbuffer.fs
> @@ -99,8 +99,8 @@ CREATE bitmap-buffer 400 4 * allot
>  : fb8-toggle-cursor ( -- )
>  	line# fb8-line2addr column# fb8-columns2bytes +
>  	char-height 0 ?DO
> -		char-width screen-depth * 0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP
> -		screen-width screen-depth * + char-width screen-depth * -
> +		dup char-width screen-depth * 0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP drop
> +		screen-width screen-depth * +
>  	LOOP drop
>  ;

* Re: [PATCH 2/3] fbuffer: introduce the invert-region helper
  2015-05-28 13:13 ` [PATCH 2/3] fbuffer: introduce the invert-region helper Greg Kurz
  2015-05-28 17:19   ` Thomas Huth
@ 2015-05-29  4:17   ` Nikunj A Dadhania
  1 sibling, 0 replies; 11+ messages in thread
From: Nikunj A Dadhania @ 2015-05-29  4:17 UTC (permalink / raw)
  To: Greg Kurz, linuxppc-dev; +Cc: Alexey Kardashevskiy, Thomas Huth, David Gibson

Greg Kurz <gkurz@linux.vnet.ibm.com> writes:

> The inner loop in fb8-toggle-cursor can be implemented with hv-logical-memop
> in board-qemu and get an incredible performance boost.
>
> Let's introduce a per-board helper:
> - board-js2x: slow RB based, taken from current fb8-toggle-cursor
> - board-qemu: faster hv-logical-memop based
>
> With standard graphical settings on board-qemu, we go from 512 hcall
> invocations per character down to 16.
>
> Suggested-by: Thomas Huth <thuth@redhat.com>
> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>

Reviewed-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>

> ---
>  board-js2x/slof/helper.fs |    4 ++++
>  board-qemu/slof/helper.fs |    3 +++
>  slof/fs/fbuffer.fs        |    2 +-
>  3 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/board-js2x/slof/helper.fs b/board-js2x/slof/helper.fs
> index 34d60da..918fdc4 100644
> --- a/board-js2x/slof/helper.fs
> +++ b/board-js2x/slof/helper.fs
> @@ -26,3 +26,7 @@
>     s" , " $cat
>     bdate2human $cat encode-string THEN
>  ;
> +
> +: invert-region ( addr len -- )
> +   0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP drop
> +;
> diff --git a/board-qemu/slof/helper.fs b/board-qemu/slof/helper.fs
> index 96da498..da676c7 100644
> --- a/board-qemu/slof/helper.fs
> +++ b/board-qemu/slof/helper.fs
> @@ -33,3 +33,6 @@
>    swap -
>  ;
>
> +: invert-region ( addr len -- )
> +   over swap 0 swap 1 hv-logical-memop drop
> +;
> diff --git a/slof/fs/fbuffer.fs b/slof/fs/fbuffer.fs
> index faae6a9..deeba6b 100644
> --- a/slof/fs/fbuffer.fs
> +++ b/slof/fs/fbuffer.fs
> @@ -99,7 +99,7 @@ CREATE bitmap-buffer 400 4 * allot
>  : fb8-toggle-cursor ( -- )
>  	line# fb8-line2addr column# fb8-columns2bytes +
>  	char-height 0 ?DO
> -		dup char-width screen-depth * 0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP drop
> +		dup char-width screen-depth * invert-region
>  		screen-width screen-depth * +
>  	LOOP drop
>  ;

* Re: [PATCH 3/3] fbuffer: introduce the invert-region-x helper
  2015-05-28 13:13 ` [PATCH 3/3] fbuffer: introduce the invert-region-x helper Greg Kurz
  2015-05-28 17:33   ` Thomas Huth
@ 2015-05-29  4:25   ` Nikunj A Dadhania
  1 sibling, 0 replies; 11+ messages in thread
From: Nikunj A Dadhania @ 2015-05-29  4:25 UTC (permalink / raw)
  To: Greg Kurz, linuxppc-dev; +Cc: Alexey Kardashevskiy, Thomas Huth, David Gibson

Greg Kurz <gkurz@linux.vnet.ibm.com> writes:

> This patch simply moves the slow RX based logic from fb8-invert-screen
> to board-js2x helpers and implement a fast hv-logical-memop based helper
> for board-qemu. And we can drop hcall-invert-screen !
>
> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>

Apart from the extra line that Thomas pointed out:

Reviewed-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>

> ---
>  board-js2x/slof/helper.fs               |    5 +++++
>  board-qemu/slof/helper.fs               |    4 ++++
>  board-qemu/slof/pci-device_1234_1111.fs |   10 +---------
>  slof/fs/fbuffer.fs                      |    4 +---
>  4 files changed, 11 insertions(+), 12 deletions(-)
>
> diff --git a/board-js2x/slof/helper.fs b/board-js2x/slof/helper.fs
> index 918fdc4..ea2d584 100644
> --- a/board-js2x/slof/helper.fs
> +++ b/board-js2x/slof/helper.fs
> @@ -30,3 +30,8 @@
>  : invert-region ( addr len -- )
>     0 ?DO dup dup rb@ -1 xor swap rb! 1+ LOOP drop
>  ;
> +
> +
> +: invert-region-x ( addr len -- )
> +   /x / 0 ?DO dup dup rx@ -1 xor swap rx! xa1+ LOOP drop
> +;
> diff --git a/board-qemu/slof/helper.fs b/board-qemu/slof/helper.fs
> index da676c7..c807bc6 100644
> --- a/board-qemu/slof/helper.fs
> +++ b/board-qemu/slof/helper.fs
> @@ -36,3 +36,7 @@
>  : invert-region ( addr len -- )
>     over swap 0 swap 1 hv-logical-memop drop
>  ;
> +
> +: invert-region-x ( addr len -- )
> +   over swap /x / 3 swap 1 hv-logical-memop drop
> +;
> diff --git a/board-qemu/slof/pci-device_1234_1111.fs b/board-qemu/slof/pci-device_1234_1111.fs
> index a5c3584..26b0623 100644
> --- a/board-qemu/slof/pci-device_1234_1111.fs
> +++ b/board-qemu/slof/pci-device_1234_1111.fs
> @@ -188,16 +188,9 @@ a CONSTANT VBE_DISPI_INDEX_NB
>  : display-remove ( -- ) 
>  ;
>
> -: hcall-invert-screen ( -- )
> -    frame-buffer-adr frame-buffer-adr 3
> -    screen-height screen-width * screen-depth * /x /
> -    1 hv-logical-memop
> -    drop
> -;
> -
>  : hcall-blink-screen ( -- )
>      \ 32 msec delay for visually noticing the blink
> -    hcall-invert-screen 20 ms hcall-invert-screen
> +    invert-screen 20 ms invert-screen
>  ;
>
>  : display-install ( -- )
> @@ -211,7 +204,6 @@ a CONSTANT VBE_DISPI_INDEX_NB
>          disp-width char-width / disp-height char-height /
>          disp-depth 7 + 8 /                      ( width height #lines #cols depth )
>          fb-install
> -	['] hcall-invert-screen to invert-screen
>  	['] hcall-blink-screen to blink-screen
>           true to is-installed?
>      THEN
> diff --git a/slof/fs/fbuffer.fs b/slof/fs/fbuffer.fs
> index deeba6b..fcdd2fa 100644
> --- a/slof/fs/fbuffer.fs
> +++ b/slof/fs/fbuffer.fs
> @@ -170,9 +170,7 @@ CREATE bitmap-buffer 400 4 * allot
>  ;
>
>  : fb8-invert-screen ( -- )
> -	frame-buffer-adr screen-height screen-width * screen-depth * 2dup /x / 0 ?DO
> -		dup rx@ -1 xor over rx! xa1+
> -	LOOP 3drop
> +	frame-buffer-adr screen-height screen-width * screen-depth * invert-region-x
>  ;
>
>  : fb8-blink-screen ( -- ) fb8-invert-screen fb8-invert-screen ;

* Re: [PATCH 0/3] fbuffer: performance improvement + code cleanup
  2015-05-28 13:13 [PATCH 0/3] fbuffer: performance improvement + code cleanup Greg Kurz
                   ` (2 preceding siblings ...)
  2015-05-28 13:13 ` [PATCH 3/3] fbuffer: introduce the invert-region-x helper Greg Kurz
@ 2015-05-29  4:54 ` Alexey Kardashevskiy
  3 siblings, 0 replies; 11+ messages in thread
From: Alexey Kardashevskiy @ 2015-05-29  4:54 UTC (permalink / raw)
  To: Greg Kurz, linuxppc-dev; +Cc: Thomas Huth, Nikunj A Dadhania, David Gibson

On 05/28/2015 11:13 PM, Greg Kurz wrote:
> If booted in frame buffer mode, board-qemu currently calls hv-logical-load
> and hv-logical-store for every pixel when enabling or disabling the cursor.
> This is suboptimal when writing one char at a time to the console since
> terminal-write always toggles the cursor. And this is precisely what grub
> is doing when the user wants to edit a menu entry... the result is an
> incredibly slow and barely usable interface.
>
> This series introduces per-board helpers to be used by the frame buffer
> code, so that board-qemu may have its own accelarated implementation:
>
> - the first patch is preliminary cleanup, before moving code out to helpers.
>
> - the second patch introduces a helper to invert a memory region byte-per-byte:
>    this fixes the unbearable slowliness of grub editing mode.
>
> - the third patch introduces a similar helper with a a quad-word pace: it
>    doesn't bring any speed improvement since board-qemu already uses
>    hv-logical-memop, but it allows to "unify hcall-invert-screen and
>    fb8-invert-screen again".
>
> Please comment.

Thanks, I'll remove that extra line in 3/3 and push these today.


>
> ---
>
> Greg Kurz (3):
>        fbuffer: simplify address computations in fb8-toggle-cursor
>        fbuffer: introduce the invert-region helper
>        fbuffer: introduce the invert-region-x helper
>
>
>   board-js2x/slof/helper.fs               |    9 +++++++++
>   board-qemu/slof/helper.fs               |    7 +++++++
>   board-qemu/slof/pci-device_1234_1111.fs |   10 +---------
>   slof/fs/fbuffer.fs                      |    8 +++-----
>   4 files changed, 20 insertions(+), 14 deletions(-)
>
> --
> Greg
>


-- 
Alexey
