public inbox for u-boot@lists.denx.de
* [PATCH v5 00/13] Add video damage tracking
@ 2023-08-21 13:50 Alper Nebi Yasak
  2023-08-21 13:50 ` [PATCH v5 01/13] video: test: Split copy frame buffer check into a function Alper Nebi Yasak
                   ` (13 more replies)
  0 siblings, 14 replies; 56+ messages in thread
From: Alper Nebi Yasak @ 2023-08-21 13:50 UTC
  To: u-boot
  Cc: Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Simon Glass,
	Matthias Brugger, u-boot-amlogic, Ilias Apalodimas,
	Neil Armstrong, Alper Nebi Yasak

This is a rebase of Alexander Graf's video damage tracking series, with
some tests and other changes. The original cover letter is as follows:

> This patch set speeds up graphics output on ARM by a factor of 60.
>
> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
> but need it accessible by the display controller which reads directly
> from a later point of consistency. Hence, we flush the frame buffer to
> DRAM on every change. The full frame buffer.
>
> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
> that can take a while to flush out. This was reported by Da Xue with grub,
> which happily prints thousands of spaces on the screen to draw a menu. Every
> printed space triggers a cache flush.
>
> This patch set implements the easiest mitigation against this problem:
> Damage tracking. We remember the smallest rectangle covering everything
> touched since the last video_sync() call and only flush that. The most
> typical writer to the frame buffer is the video console, which always
> writes rectangles of characters on the screen and syncs afterwards.
>
> With this patch set applied, we reduce drawing a large grub menu (with
> serial console attached for size information) on an RK3399-ROC system
> at 1440p from 55 seconds to less than 1 second.
>
> Version 2 also implements VIDEO_COPY using this mechanism, reducing its
> overhead compared to before as well. So even x86 systems should be faster
> with this now :).
>
>
> Alternatives considered:
>
>   1) Lazy sync - Sandbox does this. It only calls video_sync(true) every
>      so often. We are missing timers to do this generically.
>
>   2) Double buffering - We could try to identify whether anything changed
>      at all and only draw to the FB if it did. That would require
>      maintaining a second buffer that we need to scan.
>
>   3) Text buffer - Maintain a buffer of all text printed on the screen with
>      respective location. Don't write if the old and new character are
>      identical. This would limit applicability to text only and is an
>      optimization on top of this patch set.
>
>   4) Hash screen lines - Create a hash (sha256?) over every line when it
>      changes. Only flush when it does. I'm not sure if this would waste
>      more time, memory and cache than the current approach. It would make
>      full screen updates much more expensive.
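
As a rough illustration of the damage tracking idea described in the quoted
cover letter, here is a minimal standalone sketch. It is not the code from
this series: struct damage_rect and the damage_*() helpers are hypothetical
names, and only the xstart/ystart/xend/yend field names are borrowed from the
v5 changelog below. It assumes the end coordinates are exclusive.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the damage region; not the series' struct */
struct damage_rect {
	int xstart, ystart;	/* inclusive top-left corner */
	int xend, yend;		/* exclusive bottom-right corner (assumption) */
};

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* True if nothing has been damaged since the last clear */
static bool damage_is_empty(const struct damage_rect *d)
{
	return d->xend <= d->xstart || d->yend <= d->ystart;
}

/* Grow the tracked region so that it also covers the given rectangle */
static void damage_add(struct damage_rect *d, int x, int y, int w, int h)
{
	assert(w >= 0 && h >= 0);
	if (!w || !h)
		return;		/* an empty rectangle damages nothing */

	if (damage_is_empty(d)) {
		/* First damage since the last sync: take it as-is */
		d->xstart = x;
		d->ystart = y;
		d->xend = x + w;
		d->yend = y + h;
	} else {
		/* Otherwise widen the bounding box on each side as needed */
		d->xstart = min_int(d->xstart, x);
		d->ystart = min_int(d->ystart, y);
		d->xend = max_int(d->xend, x + w);
		d->yend = max_int(d->yend, y + h);
	}
}

/* Forget all damage, as a sync would do after flushing the region */
static void damage_clear(struct damage_rect *d)
{
	d->xstart = d->ystart = d->xend = d->yend = 0;
}
```

In this sketch a writer would call damage_add() for each rectangle it
touches, and a sync would flush only rows ystart to yend before calling
damage_clear(). Two small character cells far apart still merge into one
bounding box, which is the trade-off of tracking a single region.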

Changes in v5:
- Add patch "video: test: Split copy frame buffer check into a function"
- Add patch "video: test: Support checking copy frame buffer contents"
- Add patch "video: test: Test partial updates of hardware frame buffer"
- Use xstart, ystart, xend, yend as names for damage region
- Document damage struct and fields in struct video_priv comment
- Return void from video_damage()
- Fix undeclared priv error in video_sync()
- Drop unused headers from video-uclass.c
- Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
- Call video_damage() also in video_fill_part()
- Use met->baseline instead of priv->baseline
- Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
- Update console_rotate.c video_damage() calls to pass video tests
- Remove mention of not having minimal damage for console_rotate.c
- Add patch "video: test: Test video damage tracking via vidconsole"
- Document new vdev field in struct efi_gop_obj comment
- Remove video_sync_copy() also from video_fill(), video_fill_part()
- Fix memmove() calls by removing the extra dev argument
- Call video_sync() before checking copy_fb in video tests
- Imply VIDEO_DAMAGE for video drivers instead of selecting it
- Imply VIDEO_DAMAGE also for VIDEO_TIDSS

v4: https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/

Changes in v4:
- Move damage clear to patch "dm: video: Add damage tracking API"
- Simplify first damage logic
- Remove VIDEO_DAMAGE default for ARM
- Skip damage on EfiBltVideoToBltBuffer
- Add patch "video: Always compile cache flushing code"
- Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"

v3: https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/

Changes in v3:
- Adapt to always assume DM is used
- Make VIDEO_COPY always select VIDEO_DAMAGE

v2: https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/

Changes in v2:
- Remove ifdefs
- Fix ranges in truetype target
- Limit rotate to necessary damage
- Remove ifdefs from gop
- Fix dcache range; we were flushing too much before
- Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"

v1: https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/

Alexander Graf (9):
  dm: video: Add damage tracking API
  dm: video: Add damage notification on display fills
  vidconsole: Add damage notifications to all vidconsole drivers
  video: Add damage notification on bmp display
  efi_loader: GOP: Add damage notification on BLT
  video: Only dcache flush damaged lines
  video: Use VIDEO_DAMAGE for VIDEO_COPY
  video: Always compile cache flushing code
  video: Enable VIDEO_DAMAGE for drivers that need it

Alper Nebi Yasak (4):
  video: test: Split copy frame buffer check into a function
  video: test: Support checking copy frame buffer contents
  video: test: Test partial updates of hardware frame buffer
  video: test: Test video damage tracking via vidconsole

 arch/arm/mach-omap2/omap3/Kconfig |   1 +
 arch/arm/mach-sunxi/Kconfig       |   1 +
 drivers/video/Kconfig             |  26 +++
 drivers/video/console_normal.c    |  27 ++--
 drivers/video/console_rotate.c    |  94 +++++++----
 drivers/video/console_truetype.c  |  37 +++--
 drivers/video/exynos/Kconfig      |   1 +
 drivers/video/imx/Kconfig         |   1 +
 drivers/video/meson/Kconfig       |   1 +
 drivers/video/rockchip/Kconfig    |   1 +
 drivers/video/stm32/Kconfig       |   1 +
 drivers/video/tegra20/Kconfig     |   1 +
 drivers/video/tidss/Kconfig       |   1 +
 drivers/video/vidconsole-uclass.c |  16 --
 drivers/video/video-uclass.c      | 190 ++++++++++++----------
 drivers/video/video_bmp.c         |   7 +-
 include/video.h                   |  59 +++----
 include/video_console.h           |  52 ------
 lib/efi_loader/efi_gop.c          |   7 +
 test/dm/video.c                   | 256 ++++++++++++++++++++++++------
 20 files changed, 483 insertions(+), 297 deletions(-)


base-commit: 3881c9fbb7fdd98f6eae5cd33f7e9abe9455a585
-- 
2.40.1



* [PATCH v5 01/13] video: test: Split copy frame buffer check into a function
  2023-08-21 13:50 [PATCH v5 00/13] Add video damage tracking Alper Nebi Yasak
@ 2023-08-21 13:50 ` Alper Nebi Yasak
  2023-08-21 19:11   ` Simon Glass
  2023-08-21 13:50 ` [PATCH v5 02/13] video: test: Support checking copy frame buffer contents Alper Nebi Yasak
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 56+ messages in thread
From: Alper Nebi Yasak @ 2023-08-21 13:50 UTC
  To: u-boot
  Cc: Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Simon Glass,
	Matthias Brugger, u-boot-amlogic, Ilias Apalodimas,
	Neil Armstrong, Alper Nebi Yasak

While checking frame buffer contents, the video tests also check if the
copy frame buffer contents match the main frame buffer. To test if only
the modified regions are updated after a sync, we will need to create
situations where the two are mismatched. Split this check into another
function that we can skip calling, since we won't want it to error on
those mismatched cases.

Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
---

Changes in v5:
- Add patch "video: test: Split copy frame buffer check into a function"

 test/dm/video.c | 69 +++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 58 insertions(+), 11 deletions(-)

diff --git a/test/dm/video.c b/test/dm/video.c
index d907f681600b..641a6250100a 100644
--- a/test/dm/video.c
+++ b/test/dm/video.c
@@ -55,9 +55,6 @@ DM_TEST(dm_test_video_base, UT_TESTF_SCAN_PDATA | UT_TESTF_SCAN_FDT);
  * size of the compressed data. This provides a pretty good level of
  * certainty and the resulting tests need only check a single value.
  *
- * If the copy framebuffer is enabled, this compares it to the main framebuffer
- * too.
- *
  * @uts:	Test state
  * @dev:	Video device
  * Return: compressed size of the frame buffer, or -ve on error
@@ -66,7 +63,6 @@ static int compress_frame_buffer(struct unit_test_state *uts,
 				 struct udevice *dev)
 {
 	struct video_priv *priv = dev_get_uclass_priv(dev);
-	struct video_priv *uc_priv = dev_get_uclass_priv(dev);
 	uint destlen;
 	void *dest;
 	int ret;
@@ -82,16 +78,34 @@ static int compress_frame_buffer(struct unit_test_state *uts,
 	if (ret)
 		return ret;
 
-	/* Check here that the copy frame buffer is working correctly */
-	if (IS_ENABLED(CONFIG_VIDEO_COPY)) {
-		ut_assertf(!memcmp(uc_priv->fb, uc_priv->copy_fb,
-				   uc_priv->fb_size),
-				   "Copy framebuffer does not match fb");
-	}
-
 	return destlen;
 }
 
+/**
+ * check_copy_frame_buffer() - Compare main frame buffer to copy
+ *
+ * If the copy frame buffer is enabled, this compares it to the main
+ * frame buffer. Normally they should have the same contents after a
+ * sync.
+ *
+ * @uts:	Test state
+ * @dev:	Video device
+ * Return: 0, or -ve on error
+ */
+static int check_copy_frame_buffer(struct unit_test_state *uts,
+				   struct udevice *dev)
+{
+	struct video_priv *priv = dev_get_uclass_priv(dev);
+
+	if (!IS_ENABLED(CONFIG_VIDEO_COPY))
+		return 0;
+
+	ut_assertf(!memcmp(priv->fb, priv->copy_fb, priv->fb_size),
+		   "Copy framebuffer does not match fb");
+
+	return 0;
+}
+
 /*
  * Call this function at any point to halt and show the current display. Be
  * sure to run the test with the -l flag.
@@ -155,24 +169,30 @@ static int dm_test_video_text(struct unit_test_state *uts)
 	ut_assertok(uclass_get_device(UCLASS_VIDEO_CONSOLE, 0, &con));
 	ut_assertok(vidconsole_select_font(con, "8x16", 0));
 	ut_asserteq(46, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	ut_assertok(uclass_get_device(UCLASS_VIDEO_CONSOLE, 0, &con));
 	vidconsole_putc_xy(con, 0, 0, 'a');
 	ut_asserteq(79, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	vidconsole_putc_xy(con, 0, 0, ' ');
 	ut_asserteq(46, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	for (i = 0; i < 20; i++)
 		vidconsole_putc_xy(con, VID_TO_POS(i * 8), 0, ' ' + i);
 	ut_asserteq(273, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	vidconsole_set_row(con, 0, WHITE);
 	ut_asserteq(46, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	for (i = 0; i < 20; i++)
 		vidconsole_putc_xy(con, VID_TO_POS(i * 8), 0, ' ' + i);
 	ut_asserteq(273, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
 }
@@ -191,24 +211,30 @@ static int dm_test_video_text_12x22(struct unit_test_state *uts)
 	ut_assertok(uclass_get_device(UCLASS_VIDEO_CONSOLE, 0, &con));
 	ut_assertok(vidconsole_select_font(con, "12x22", 0));
 	ut_asserteq(46, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	ut_assertok(uclass_get_device(UCLASS_VIDEO_CONSOLE, 0, &con));
 	vidconsole_putc_xy(con, 0, 0, 'a');
 	ut_asserteq(89, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	vidconsole_putc_xy(con, 0, 0, ' ');
 	ut_asserteq(46, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	for (i = 0; i < 20; i++)
 		vidconsole_putc_xy(con, VID_TO_POS(i * 8), 0, ' ' + i);
 	ut_asserteq(363, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	vidconsole_set_row(con, 0, WHITE);
 	ut_asserteq(46, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	for (i = 0; i < 20; i++)
 		vidconsole_putc_xy(con, VID_TO_POS(i * 8), 0, ' ' + i);
 	ut_asserteq(363, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
 }
@@ -226,6 +252,7 @@ static int dm_test_video_chars(struct unit_test_state *uts)
 	ut_assertok(vidconsole_select_font(con, "8x16", 0));
 	vidconsole_put_string(con, test_string);
 	ut_asserteq(466, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
 }
@@ -247,19 +274,23 @@ static int dm_test_video_ansi(struct unit_test_state *uts)
 	video_clear(con->parent);
 	video_sync(con->parent, false);
 	ut_asserteq(46, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	/* test clear escape sequence: [2J */
 	vidconsole_put_string(con, "A\tB\tC"ANSI_ESC"[2J");
 	ut_asserteq(46, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	/* test set-cursor: [%d;%df */
 	vidconsole_put_string(con, "abc"ANSI_ESC"[2;2fab"ANSI_ESC"[4;4fcd");
 	ut_asserteq(143, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	/* test colors (30-37 fg color, 40-47 bg color) */
 	vidconsole_put_string(con, ANSI_ESC"[30;41mfoo"); /* black on red */
 	vidconsole_put_string(con, ANSI_ESC"[33;44mbar"); /* yellow on blue */
 	ut_asserteq(272, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
 }
@@ -292,11 +323,13 @@ static int check_vidconsole_output(struct unit_test_state *uts, int rot,
 	ut_assertok(uclass_get_device(UCLASS_VIDEO_CONSOLE, 0, &con));
 	ut_assertok(vidconsole_select_font(con, "8x16", 0));
 	ut_asserteq(46, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	/* Check display wrap */
 	for (i = 0; i < 120; i++)
 		vidconsole_put_char(con, 'A' + i % 50);
 	ut_asserteq(wrap_size, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	/* Check display scrolling */
 	for (i = 0; i < SCROLL_LINES; i++) {
@@ -304,11 +337,13 @@ static int check_vidconsole_output(struct unit_test_state *uts, int rot,
 		vidconsole_put_char(con, '\n');
 	}
 	ut_asserteq(scroll_size, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	/* If we scroll enough, the screen becomes blank again */
 	for (i = 0; i < SCROLL_LINES; i++)
 		vidconsole_put_char(con, '\n');
 	ut_asserteq(46, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
 }
@@ -383,6 +418,7 @@ static int dm_test_video_bmp(struct unit_test_state *uts)
 
 	ut_assertok(video_bmp_display(dev, addr, 0, 0, false));
 	ut_asserteq(1368, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
 }
@@ -402,6 +438,7 @@ static int dm_test_video_bmp8(struct unit_test_state *uts)
 
 	ut_assertok(video_bmp_display(dev, addr, 0, 0, false));
 	ut_asserteq(1247, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
 }
@@ -425,6 +462,7 @@ static int dm_test_video_bmp16(struct unit_test_state *uts)
 
 	ut_assertok(video_bmp_display(dev, dst, 0, 0, false));
 	ut_asserteq(3700, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
 }
@@ -448,6 +486,7 @@ static int dm_test_video_bmp24(struct unit_test_state *uts)
 
 	ut_assertok(video_bmp_display(dev, dst, 0, 0, false));
 	ut_asserteq(3656, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
 }
@@ -471,6 +510,7 @@ static int dm_test_video_bmp24_32(struct unit_test_state *uts)
 
 	ut_assertok(video_bmp_display(dev, dst, 0, 0, false));
 	ut_asserteq(6827, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
 }
@@ -489,6 +529,7 @@ static int dm_test_video_bmp32(struct unit_test_state *uts)
 
 	ut_assertok(video_bmp_display(dev, addr, 0, 0, false));
 	ut_asserteq(2024, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
 }
@@ -505,6 +546,7 @@ static int dm_test_video_bmp_comp(struct unit_test_state *uts)
 
 	ut_assertok(video_bmp_display(dev, addr, 0, 0, false));
 	ut_asserteq(1368, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
 }
@@ -524,6 +566,7 @@ static int dm_test_video_comp_bmp32(struct unit_test_state *uts)
 
 	ut_assertok(video_bmp_display(dev, addr, 0, 0, false));
 	ut_asserteq(2024, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
 }
@@ -543,6 +586,7 @@ static int dm_test_video_comp_bmp8(struct unit_test_state *uts)
 
 	ut_assertok(video_bmp_display(dev, addr, 0, 0, false));
 	ut_asserteq(1247, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
 }
@@ -558,6 +602,7 @@ static int dm_test_video_truetype(struct unit_test_state *uts)
 	ut_assertok(uclass_get_device(UCLASS_VIDEO_CONSOLE, 0, &con));
 	vidconsole_put_string(con, test_string);
 	ut_asserteq(12174, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
 }
@@ -579,6 +624,7 @@ static int dm_test_video_truetype_scroll(struct unit_test_state *uts)
 	ut_assertok(uclass_get_device(UCLASS_VIDEO_CONSOLE, 0, &con));
 	vidconsole_put_string(con, test_string);
 	ut_asserteq(34287, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
 }
@@ -600,6 +646,7 @@ static int dm_test_video_truetype_bs(struct unit_test_state *uts)
 	ut_assertok(uclass_get_device(UCLASS_VIDEO_CONSOLE, 0, &con));
 	vidconsole_put_string(con, test_string);
 	ut_asserteq(29471, compress_frame_buffer(uts, dev));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
 }
-- 
2.40.1



* [PATCH v5 02/13] video: test: Support checking copy frame buffer contents
  2023-08-21 13:50 [PATCH v5 00/13] Add video damage tracking Alper Nebi Yasak
  2023-08-21 13:50 ` [PATCH v5 01/13] video: test: Split copy frame buffer check into a function Alper Nebi Yasak
@ 2023-08-21 13:50 ` Alper Nebi Yasak
  2023-08-21 19:11   ` Simon Glass
  2023-08-21 13:51 ` [PATCH v5 03/13] video: test: Test partial updates of hardware frame buffer Alper Nebi Yasak
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 56+ messages in thread
From: Alper Nebi Yasak @ 2023-08-21 13:50 UTC
  To: u-boot
  Cc: Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Simon Glass,
	Matthias Brugger, u-boot-amlogic, Ilias Apalodimas,
	Neil Armstrong, Alper Nebi Yasak

The video tests have a helper function to generate a pseudo-digest of
frame buffer contents, but it only does so for the main one. A separate
check verifies that the copy frame buffer matches the main one. Neither
is enough to test whether only the modified regions are copied to the
copy frame buffer, since for that we will want the two to differ in very
specific ways.

Add a boolean argument to the existing helper function to indicate which
frame buffer we want to inspect, and update the existing callers.

Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
---

Changes in v5:
- Add patch "video: test: Support checking copy frame buffer contents"

 test/dm/video.c | 76 ++++++++++++++++++++++++++-----------------------
 1 file changed, 41 insertions(+), 35 deletions(-)

diff --git a/test/dm/video.c b/test/dm/video.c
index 641a6250100a..b9ff3da10c18 100644
--- a/test/dm/video.c
+++ b/test/dm/video.c
@@ -57,22 +57,28 @@ DM_TEST(dm_test_video_base, UT_TESTF_SCAN_PDATA | UT_TESTF_SCAN_FDT);
  *
  * @uts:	Test state
  * @dev:	Video device
+ * @use_copy:	Use copy frame buffer if available
  * Return: compressed size of the frame buffer, or -ve on error
  */
 static int compress_frame_buffer(struct unit_test_state *uts,
-				 struct udevice *dev)
+				 struct udevice *dev,
+				 bool use_copy)
 {
 	struct video_priv *priv = dev_get_uclass_priv(dev);
 	uint destlen;
 	void *dest;
 	int ret;
 
+	if (!IS_ENABLED(CONFIG_VIDEO_COPY))
+		use_copy = false;
+
 	destlen = priv->fb_size;
 	dest = malloc(priv->fb_size);
 	if (!dest)
 		return -ENOMEM;
 	ret = BZ2_bzBuffToBuffCompress(dest, &destlen,
-				       priv->fb, priv->fb_size,
+				       use_copy ? priv->copy_fb : priv->fb,
+				       priv->fb_size,
 				       3, 0, 0);
 	free(dest);
 	if (ret)
@@ -168,30 +174,30 @@ static int dm_test_video_text(struct unit_test_state *uts)
 	ut_assertok(video_get_nologo(uts, &dev));
 	ut_assertok(uclass_get_device(UCLASS_VIDEO_CONSOLE, 0, &con));
 	ut_assertok(vidconsole_select_font(con, "8x16", 0));
-	ut_asserteq(46, compress_frame_buffer(uts, dev));
+	ut_asserteq(46, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	ut_assertok(uclass_get_device(UCLASS_VIDEO_CONSOLE, 0, &con));
 	vidconsole_putc_xy(con, 0, 0, 'a');
-	ut_asserteq(79, compress_frame_buffer(uts, dev));
+	ut_asserteq(79, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	vidconsole_putc_xy(con, 0, 0, ' ');
-	ut_asserteq(46, compress_frame_buffer(uts, dev));
+	ut_asserteq(46, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	for (i = 0; i < 20; i++)
 		vidconsole_putc_xy(con, VID_TO_POS(i * 8), 0, ' ' + i);
-	ut_asserteq(273, compress_frame_buffer(uts, dev));
+	ut_asserteq(273, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	vidconsole_set_row(con, 0, WHITE);
-	ut_asserteq(46, compress_frame_buffer(uts, dev));
+	ut_asserteq(46, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	for (i = 0; i < 20; i++)
 		vidconsole_putc_xy(con, VID_TO_POS(i * 8), 0, ' ' + i);
-	ut_asserteq(273, compress_frame_buffer(uts, dev));
+	ut_asserteq(273, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
@@ -210,30 +216,30 @@ static int dm_test_video_text_12x22(struct unit_test_state *uts)
 	ut_assertok(video_get_nologo(uts, &dev));
 	ut_assertok(uclass_get_device(UCLASS_VIDEO_CONSOLE, 0, &con));
 	ut_assertok(vidconsole_select_font(con, "12x22", 0));
-	ut_asserteq(46, compress_frame_buffer(uts, dev));
+	ut_asserteq(46, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	ut_assertok(uclass_get_device(UCLASS_VIDEO_CONSOLE, 0, &con));
 	vidconsole_putc_xy(con, 0, 0, 'a');
-	ut_asserteq(89, compress_frame_buffer(uts, dev));
+	ut_asserteq(89, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	vidconsole_putc_xy(con, 0, 0, ' ');
-	ut_asserteq(46, compress_frame_buffer(uts, dev));
+	ut_asserteq(46, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	for (i = 0; i < 20; i++)
 		vidconsole_putc_xy(con, VID_TO_POS(i * 8), 0, ' ' + i);
-	ut_asserteq(363, compress_frame_buffer(uts, dev));
+	ut_asserteq(363, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	vidconsole_set_row(con, 0, WHITE);
-	ut_asserteq(46, compress_frame_buffer(uts, dev));
+	ut_asserteq(46, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	for (i = 0; i < 20; i++)
 		vidconsole_putc_xy(con, VID_TO_POS(i * 8), 0, ' ' + i);
-	ut_asserteq(363, compress_frame_buffer(uts, dev));
+	ut_asserteq(363, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
@@ -251,7 +257,7 @@ static int dm_test_video_chars(struct unit_test_state *uts)
 	ut_assertok(uclass_get_device(UCLASS_VIDEO_CONSOLE, 0, &con));
 	ut_assertok(vidconsole_select_font(con, "8x16", 0));
 	vidconsole_put_string(con, test_string);
-	ut_asserteq(466, compress_frame_buffer(uts, dev));
+	ut_asserteq(466, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
@@ -273,23 +279,23 @@ static int dm_test_video_ansi(struct unit_test_state *uts)
 	/* reference clear: */
 	video_clear(con->parent);
 	video_sync(con->parent, false);
-	ut_asserteq(46, compress_frame_buffer(uts, dev));
+	ut_asserteq(46, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	/* test clear escape sequence: [2J */
 	vidconsole_put_string(con, "A\tB\tC"ANSI_ESC"[2J");
-	ut_asserteq(46, compress_frame_buffer(uts, dev));
+	ut_asserteq(46, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	/* test set-cursor: [%d;%df */
 	vidconsole_put_string(con, "abc"ANSI_ESC"[2;2fab"ANSI_ESC"[4;4fcd");
-	ut_asserteq(143, compress_frame_buffer(uts, dev));
+	ut_asserteq(143, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	/* test colors (30-37 fg color, 40-47 bg color) */
 	vidconsole_put_string(con, ANSI_ESC"[30;41mfoo"); /* black on red */
 	vidconsole_put_string(con, ANSI_ESC"[33;44mbar"); /* yellow on blue */
-	ut_asserteq(272, compress_frame_buffer(uts, dev));
+	ut_asserteq(272, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
@@ -322,13 +328,13 @@ static int check_vidconsole_output(struct unit_test_state *uts, int rot,
 	ut_assertok(video_get_nologo(uts, &dev));
 	ut_assertok(uclass_get_device(UCLASS_VIDEO_CONSOLE, 0, &con));
 	ut_assertok(vidconsole_select_font(con, "8x16", 0));
-	ut_asserteq(46, compress_frame_buffer(uts, dev));
+	ut_asserteq(46, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	/* Check display wrap */
 	for (i = 0; i < 120; i++)
 		vidconsole_put_char(con, 'A' + i % 50);
-	ut_asserteq(wrap_size, compress_frame_buffer(uts, dev));
+	ut_asserteq(wrap_size, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	/* Check display scrolling */
@@ -336,13 +342,13 @@ static int check_vidconsole_output(struct unit_test_state *uts, int rot,
 		vidconsole_put_char(con, 'A' + i % 50);
 		vidconsole_put_char(con, '\n');
 	}
-	ut_asserteq(scroll_size, compress_frame_buffer(uts, dev));
+	ut_asserteq(scroll_size, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	/* If we scroll enough, the screen becomes blank again */
 	for (i = 0; i < SCROLL_LINES; i++)
 		vidconsole_put_char(con, '\n');
-	ut_asserteq(46, compress_frame_buffer(uts, dev));
+	ut_asserteq(46, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
@@ -417,7 +423,7 @@ static int dm_test_video_bmp(struct unit_test_state *uts)
 	ut_assertok(read_file(uts, "tools/logos/denx.bmp", &addr));
 
 	ut_assertok(video_bmp_display(dev, addr, 0, 0, false));
-	ut_asserteq(1368, compress_frame_buffer(uts, dev));
+	ut_asserteq(1368, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
@@ -437,7 +443,7 @@ static int dm_test_video_bmp8(struct unit_test_state *uts)
 	ut_assertok(read_file(uts, "tools/logos/denx.bmp", &addr));
 
 	ut_assertok(video_bmp_display(dev, addr, 0, 0, false));
-	ut_asserteq(1247, compress_frame_buffer(uts, dev));
+	ut_asserteq(1247, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
@@ -461,7 +467,7 @@ static int dm_test_video_bmp16(struct unit_test_state *uts)
 			   &src_len));
 
 	ut_assertok(video_bmp_display(dev, dst, 0, 0, false));
-	ut_asserteq(3700, compress_frame_buffer(uts, dev));
+	ut_asserteq(3700, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
@@ -485,7 +491,7 @@ static int dm_test_video_bmp24(struct unit_test_state *uts)
 			   &src_len));
 
 	ut_assertok(video_bmp_display(dev, dst, 0, 0, false));
-	ut_asserteq(3656, compress_frame_buffer(uts, dev));
+	ut_asserteq(3656, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
@@ -509,7 +515,7 @@ static int dm_test_video_bmp24_32(struct unit_test_state *uts)
 			   &src_len));
 
 	ut_assertok(video_bmp_display(dev, dst, 0, 0, false));
-	ut_asserteq(6827, compress_frame_buffer(uts, dev));
+	ut_asserteq(6827, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
@@ -528,7 +534,7 @@ static int dm_test_video_bmp32(struct unit_test_state *uts)
 	ut_assertok(read_file(uts, "tools/logos/denx.bmp", &addr));
 
 	ut_assertok(video_bmp_display(dev, addr, 0, 0, false));
-	ut_asserteq(2024, compress_frame_buffer(uts, dev));
+	ut_asserteq(2024, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
@@ -545,7 +551,7 @@ static int dm_test_video_bmp_comp(struct unit_test_state *uts)
 	ut_assertok(read_file(uts, "tools/logos/denx-comp.bmp", &addr));
 
 	ut_assertok(video_bmp_display(dev, addr, 0, 0, false));
-	ut_asserteq(1368, compress_frame_buffer(uts, dev));
+	ut_asserteq(1368, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
@@ -565,7 +571,7 @@ static int dm_test_video_comp_bmp32(struct unit_test_state *uts)
 	ut_assertok(read_file(uts, "tools/logos/denx.bmp", &addr));
 
 	ut_assertok(video_bmp_display(dev, addr, 0, 0, false));
-	ut_asserteq(2024, compress_frame_buffer(uts, dev));
+	ut_asserteq(2024, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
@@ -585,7 +591,7 @@ static int dm_test_video_comp_bmp8(struct unit_test_state *uts)
 	ut_assertok(read_file(uts, "tools/logos/denx.bmp", &addr));
 
 	ut_assertok(video_bmp_display(dev, addr, 0, 0, false));
-	ut_asserteq(1247, compress_frame_buffer(uts, dev));
+	ut_asserteq(1247, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
@@ -601,7 +607,7 @@ static int dm_test_video_truetype(struct unit_test_state *uts)
 	ut_assertok(video_get_nologo(uts, &dev));
 	ut_assertok(uclass_get_device(UCLASS_VIDEO_CONSOLE, 0, &con));
 	vidconsole_put_string(con, test_string);
-	ut_asserteq(12174, compress_frame_buffer(uts, dev));
+	ut_asserteq(12174, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
@@ -623,7 +629,7 @@ static int dm_test_video_truetype_scroll(struct unit_test_state *uts)
 	ut_assertok(video_get_nologo(uts, &dev));
 	ut_assertok(uclass_get_device(UCLASS_VIDEO_CONSOLE, 0, &con));
 	vidconsole_put_string(con, test_string);
-	ut_asserteq(34287, compress_frame_buffer(uts, dev));
+	ut_asserteq(34287, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
@@ -645,7 +651,7 @@ static int dm_test_video_truetype_bs(struct unit_test_state *uts)
 	ut_assertok(video_get_nologo(uts, &dev));
 	ut_assertok(uclass_get_device(UCLASS_VIDEO_CONSOLE, 0, &con));
 	vidconsole_put_string(con, test_string);
-	ut_asserteq(29471, compress_frame_buffer(uts, dev));
+	ut_asserteq(29471, compress_frame_buffer(uts, dev, false));
 	ut_assertok(check_copy_frame_buffer(uts, dev));
 
 	return 0;
-- 
2.40.1



* [PATCH v5 03/13] video: test: Test partial updates of hardware frame buffer
  2023-08-21 13:50 [PATCH v5 00/13] Add video damage tracking Alper Nebi Yasak
  2023-08-21 13:50 ` [PATCH v5 01/13] video: test: Split copy frame buffer check into a function Alper Nebi Yasak
  2023-08-21 13:50 ` [PATCH v5 02/13] video: test: Support checking copy frame buffer contents Alper Nebi Yasak
@ 2023-08-21 13:51 ` Alper Nebi Yasak
  2023-08-21 19:11   ` Simon Glass
  2023-08-21 13:51 ` [PATCH v5 04/13] dm: video: Add damage tracking API Alper Nebi Yasak
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 56+ messages in thread
From: Alper Nebi Yasak @ 2023-08-21 13:51 UTC (permalink / raw)
  To: u-boot
  Cc: Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Simon Glass,
	Matthias Brugger, u-boot-amlogic, Ilias Apalodimas,
	Neil Armstrong, Alper Nebi Yasak

With VIDEO_COPY enabled, only the modified parts of the frame buffer are
intended to be copied to the hardware. Add a test that checks this, by
overwriting contents we prepared without telling the video uclass and
then checking if the overwritten contents have been redrawn on the next
sync.

Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
---

Changes in v5:
- Add patch "video: test: Test partial updates of hardware frame buffer"

 test/dm/video.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/test/dm/video.c b/test/dm/video.c
index b9ff3da10c18..e4bd27a6b76f 100644
--- a/test/dm/video.c
+++ b/test/dm/video.c
@@ -657,3 +657,57 @@ static int dm_test_video_truetype_bs(struct unit_test_state *uts)
 	return 0;
 }
 DM_TEST(dm_test_video_truetype_bs, UT_TESTF_SCAN_PDATA | UT_TESTF_SCAN_FDT);
+
+/* Test partial rendering onto hardware frame buffer */
+static int dm_test_video_copy(struct unit_test_state *uts)
+{
+	struct sandbox_sdl_plat *plat;
+	struct video_uc_plat *uc_plat;
+	struct udevice *dev, *con;
+	struct video_priv *priv;
+	const char *test_string = "\n\tCriticism may not be agreeable, but it is necessary.\t";
+	ulong addr;
+
+	if (!IS_ENABLED(CONFIG_VIDEO_COPY))
+		return -EAGAIN;
+
+	ut_assertok(uclass_find_first_device(UCLASS_VIDEO, &dev));
+	ut_assertnonnull(dev);
+	uc_plat = dev_get_uclass_plat(dev);
+	uc_plat->hide_logo = true;
+	plat = dev_get_plat(dev);
+	plat->font_size = 32;
+	ut_assert(!device_active(dev));
+	ut_assertok(uclass_first_device_err(UCLASS_VIDEO, &dev));
+	ut_assertnonnull(dev);
+	priv = dev_get_uclass_priv(dev);
+
+	ut_assertok(read_file(uts, "tools/logos/denx.bmp", &addr));
+	ut_assertok(video_bmp_display(dev, addr, 0, 0, false));
+
+	ut_assertok(uclass_get_device(UCLASS_VIDEO_CONSOLE, 0, &con));
+	vidconsole_put_string(con, "\n\n\n\n\n");
+	vidconsole_put_string(con, test_string);
+	vidconsole_put_string(con, test_string);
+
+	ut_asserteq(6678, compress_frame_buffer(uts, dev, false));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
+
+	/*
+	 * Secretly clear the hardware frame buffer, but in a different
+	 * color (black) to see which parts will be overwritten.
+	 */
+	memset(priv->copy_fb, 0, priv->fb_size);
+
+	/*
+	 * We should have the full content on the main buffer, but only
+	 * the new content should have been copied to the copy buffer.
+	 */
+	vidconsole_put_string(con, test_string);
+	vidconsole_put_string(con, test_string);
+	ut_asserteq(7589, compress_frame_buffer(uts, dev, false));
+	ut_asserteq(5278, compress_frame_buffer(uts, dev, true));
+
+	return 0;
+}
+DM_TEST(dm_test_video_copy, UT_TESTF_SCAN_PDATA | UT_TESTF_SCAN_FDT);
-- 
2.40.1



* [PATCH v5 04/13] dm: video: Add damage tracking API
  2023-08-21 13:50 [PATCH v5 00/13] Add video damage tracking Alper Nebi Yasak
                   ` (2 preceding siblings ...)
  2023-08-21 13:51 ` [PATCH v5 03/13] video: test: Test partial updates of hardware frame buffer Alper Nebi Yasak
@ 2023-08-21 13:51 ` Alper Nebi Yasak
  2023-08-21 19:11   ` Simon Glass
  2023-08-21 13:51 ` [PATCH v5 05/13] dm: video: Add damage notification on display fills Alper Nebi Yasak
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 56+ messages in thread
From: Alper Nebi Yasak @ 2023-08-21 13:51 UTC (permalink / raw)
  To: u-boot
  Cc: Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Simon Glass,
	Matthias Brugger, u-boot-amlogic, Ilias Apalodimas,
	Neil Armstrong, Alper Nebi Yasak

From: Alexander Graf <agraf@csgraf.de>

We are going to introduce image damage tracking to speed up screen
refresh on large displays. This patch adds damage tracking for up to
one rectangle of the screen, which is typically enough to hold blt or
text print updates. Callers into this API and a reduced dcache flush
code path will follow in later patches.
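
For illustration, the single-rectangle tracking described above can be
sketched outside of U-Boot roughly as follows. This is only a sketch:
struct damage_box, damage_reset() and damage_add() are hypothetical
stand-ins for the damage fields of struct video_priv and for
video_damage(), and the screen size is passed in explicitly instead of
being read from the device.

```c
#include <assert.h>

/* Hypothetical stand-in for the damage fields in struct video_priv */
struct damage_box {
	int xstart;
	int ystart;
	int xend;
	int yend;
};

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

static int max_int(int a, int b)
{
	return a > b ? a : b;
}

/* Mark the box as empty: start at the far corner, end at the origin */
static void damage_reset(struct damage_box *d, int xsize, int ysize)
{
	d->xstart = xsize;
	d->ystart = ysize;
	d->xend = 0;
	d->yend = 0;
}

/* Grow the box so it also covers the given rectangle, clipped to the screen */
static void damage_add(struct damage_box *d, int xsize, int ysize,
		       int x, int y, int width, int height)
{
	int xend = x + width;
	int yend = y + height;

	assert(width >= 0 && height >= 0);

	/* Ignore rectangles that start beyond the screen */
	if (x > xsize || y > ysize)
		return;

	if (xend > xsize)
		xend = xsize;
	if (yend > ysize)
		yend = ysize;

	/* Span one rectangle across all old and new damage */
	d->xstart = min_int(x, d->xstart);
	d->ystart = min_int(y, d->ystart);
	d->xend = max_int(xend, d->xend);
	d->yend = max_int(yend, d->yend);
}
```

Because an empty box starts at (xsize, ysize) and ends at (0, 0), the
first damage_add() after a reset needs no special case: min() and max()
simply adopt the new rectangle.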

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reported-by: Da Xue <da@libre.computer>
[Alper: Use xstart/yend, document new fields, return void from
        video_damage(), declare priv, drop headers, use IS_ENABLED()]
Co-developed-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
---

Changes in v5:
- Use xstart, ystart, xend, yend as names for damage region
- Document damage struct and fields in struct video_priv comment
- Return void from video_damage()
- Fix undeclared priv error in video_sync()
- Drop unused headers from video-uclass.c
- Use IS_ENABLED() instead of CONFIG_IS_ENABLED()

Changes in v4:
- Move damage clear to patch "dm: video: Add damage tracking API"
- Simplify first damage logic
- Remove VIDEO_DAMAGE default for ARM

Changes in v3:
- Adapt to always assume DM is used

Changes in v2:
- Remove ifdefs

 drivers/video/Kconfig        | 13 ++++++++++++
 drivers/video/video-uclass.c | 41 +++++++++++++++++++++++++++++++++---
 include/video.h              | 32 ++++++++++++++++++++++++++--
 3 files changed, 81 insertions(+), 5 deletions(-)

diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
index fe43fbd7004a..97f494a1340b 100644
--- a/drivers/video/Kconfig
+++ b/drivers/video/Kconfig
@@ -92,6 +92,19 @@ config VIDEO_COPY
 	  To use this, your video driver must set @copy_base in
 	  struct video_uc_plat.
 
+config VIDEO_DAMAGE
+	bool "Enable damage tracking of frame buffer regions"
+	help
+	  On some machines (most ARM), the display frame buffer resides in
+	  RAM. To make the display controller pick up screen updates, we
+	  have to flush frame buffer contents from CPU caches into RAM which
+	  can be a slow operation.
+
+	  This feature adds damage tracking to collect information about regions
+	  that received updates. When we want to sync, we then only flush
+	  regions of the frame buffer that were modified before, speeding up
+	  screen refreshes significantly.
+
 config BACKLIGHT_PWM
 	bool "Generic PWM based Backlight Driver"
 	depends on BACKLIGHT && DM_PWM
diff --git a/drivers/video/video-uclass.c b/drivers/video/video-uclass.c
index 8f268fc4063f..447689581668 100644
--- a/drivers/video/video-uclass.c
+++ b/drivers/video/video-uclass.c
@@ -351,9 +351,39 @@ void video_set_default_colors(struct udevice *dev, bool invert)
 	priv->colour_bg = video_index_to_colour(priv, back);
 }
 
+/* Notify about changes in the frame buffer */
+void video_damage(struct udevice *vid, int x, int y, int width, int height)
+{
+	struct video_priv *priv = dev_get_uclass_priv(vid);
+	int xend = x + width;
+	int yend = y + height;
+
+	if (!IS_ENABLED(CONFIG_VIDEO_DAMAGE))
+		return;
+
+	if (x > priv->xsize)
+		return;
+
+	if (y > priv->ysize)
+		return;
+
+	if (xend > priv->xsize)
+		xend = priv->xsize;
+
+	if (yend > priv->ysize)
+		yend = priv->ysize;
+
+	/* Span a rectangle across all old and new damage */
+	priv->damage.xstart = min(x, priv->damage.xstart);
+	priv->damage.ystart = min(y, priv->damage.ystart);
+	priv->damage.xend = max(xend, priv->damage.xend);
+	priv->damage.yend = max(yend, priv->damage.yend);
+}
+
 /* Flush video activity to the caches */
 int video_sync(struct udevice *vid, bool force)
 {
+	struct video_priv *priv = dev_get_uclass_priv(vid);
 	struct video_ops *ops = video_get_ops(vid);
 	int ret;
 
@@ -369,15 +399,12 @@ int video_sync(struct udevice *vid, bool force)
 	 * out whether it exists? For now, ARM is safe.
 	 */
 #if defined(CONFIG_ARM) && !CONFIG_IS_ENABLED(SYS_DCACHE_OFF)
-	struct video_priv *priv = dev_get_uclass_priv(vid);
-
 	if (priv->flush_dcache) {
 		flush_dcache_range((ulong)priv->fb,
 				   ALIGN((ulong)priv->fb + priv->fb_size,
 					 CONFIG_SYS_CACHELINE_SIZE));
 	}
 #elif defined(CONFIG_VIDEO_SANDBOX_SDL)
-	struct video_priv *priv = dev_get_uclass_priv(vid);
 	static ulong last_sync;
 
 	if (force || get_timer(last_sync) > 100) {
@@ -385,6 +412,14 @@ int video_sync(struct udevice *vid, bool force)
 		last_sync = get_timer(0);
 	}
 #endif
+
+	if (IS_ENABLED(CONFIG_VIDEO_DAMAGE)) {
+		priv->damage.xstart = priv->xsize;
+		priv->damage.ystart = priv->ysize;
+		priv->damage.xend = 0;
+		priv->damage.yend = 0;
+	}
+
 	return 0;
 }
 
diff --git a/include/video.h b/include/video.h
index 66d109ca5da6..a522f33949e5 100644
--- a/include/video.h
+++ b/include/video.h
@@ -85,6 +85,11 @@ enum video_format {
  * @fb_size:	Frame buffer size
  * @copy_fb:	Copy of the frame buffer to keep up to date; see struct
  *		video_uc_plat
+ * @damage:	A bounding box of framebuffer regions updated since last sync
+ * @damage.xstart:	X start position in pixels from the left
+ * @damage.ystart:	Y start position in pixels from the top
+ * @damage.xend:	X end position in pixels from the left
+ * @damage.yend:	Y end position in pixels from the top
  * @line_length:	Length of each frame buffer line, in bytes. This can be
  *		set by the driver, but if not, the uclass will set it after
  *		probing
@@ -112,6 +117,12 @@ struct video_priv {
 	void *fb;
 	int fb_size;
 	void *copy_fb;
+	struct {
+		int xstart;
+		int ystart;
+		int xend;
+		int yend;
+	} damage;
 	int line_length;
 	u32 colour_fg;
 	u32 colour_bg;
@@ -254,8 +265,9 @@ int video_fill_part(struct udevice *dev, int xstart, int ystart, int xend,
  * @return: 0 on success, error code otherwise
  *
  * Some frame buffers are cached or have a secondary frame buffer. This
- * function syncs these up so that the current contents of the U-Boot frame
- * buffer are displayed to the user.
+ * function syncs the damaged parts of them up so that the current contents
+ * of the U-Boot frame buffer are displayed to the user. It clears the damage
+ * buffer.
  */
 int video_sync(struct udevice *vid, bool force);
 
@@ -375,6 +387,22 @@ static inline int video_sync_copy_all(struct udevice *dev)
 
 #endif
 
+/**
+ * video_damage() - Notify the video subsystem about screen updates.
+ *
+ * @vid:	Device whose frame buffer was updated
+ * @x:	        Upper left X coordinate of the damaged rectangle
+ * @y:	        Upper left Y coordinate of the damaged rectangle
+ * @width:	Width of the damaged rectangle
+ * @height:	Height of the damaged rectangle
+ *
+ * Some frame buffers are cached or have a secondary frame buffer. This
+ * function notifies the video subsystem about rectangles that were updated
+ * within the frame buffer. They may only get written to the screen on the
+ * next call to video_sync().
+ */
+void video_damage(struct udevice *vid, int x, int y, int width, int height);
+
 /**
  * video_is_active() - Test if one video device it active
  *
-- 
2.40.1



* [PATCH v5 05/13] dm: video: Add damage notification on display fills
  2023-08-21 13:50 [PATCH v5 00/13] Add video damage tracking Alper Nebi Yasak
                   ` (3 preceding siblings ...)
  2023-08-21 13:51 ` [PATCH v5 04/13] dm: video: Add damage tracking API Alper Nebi Yasak
@ 2023-08-21 13:51 ` Alper Nebi Yasak
  2023-08-21 19:11   ` Simon Glass
  2023-08-21 13:51 ` [PATCH v5 06/13] vidconsole: Add damage notifications to all vidconsole drivers Alper Nebi Yasak
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 56+ messages in thread
From: Alper Nebi Yasak @ 2023-08-21 13:51 UTC (permalink / raw)
  To: u-boot
  Cc: Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Simon Glass,
	Matthias Brugger, u-boot-amlogic, Ilias Apalodimas,
	Neil Armstrong, Alper Nebi Yasak

From: Alexander Graf <agraf@csgraf.de>

Let's report the video damage when we fill parts of the screen. This
way we can later lazily flush only relevant regions to hardware.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reported-by: Da Xue <da@libre.computer>
[Alper: Call video_damage() in video_fill_part(), edit commit message]
Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
---
Does video_fill_part() need a video_sync(dev, false) here?

Changes in v5:
- Call video_damage() also in video_fill_part()

 drivers/video/video-uclass.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/video/video-uclass.c b/drivers/video/video-uclass.c
index 447689581668..ebf409d839f0 100644
--- a/drivers/video/video-uclass.c
+++ b/drivers/video/video-uclass.c
@@ -203,6 +203,8 @@ int video_fill_part(struct udevice *dev, int xstart, int ystart, int xend,
 	if (ret)
 		return ret;
 
+	video_damage(dev, xstart, ystart, xend - xstart, yend - ystart);
+
 	return 0;
 }
 
@@ -249,6 +251,8 @@ int video_fill(struct udevice *dev, u32 colour)
 	if (ret)
 		return ret;
 
+	video_damage(dev, 0, 0, priv->xsize, priv->ysize);
+
 	return video_sync(dev, false);
 }
 
-- 
2.40.1



* [PATCH v5 06/13] vidconsole: Add damage notifications to all vidconsole drivers
  2023-08-21 13:50 [PATCH v5 00/13] Add video damage tracking Alper Nebi Yasak
                   ` (4 preceding siblings ...)
  2023-08-21 13:51 ` [PATCH v5 05/13] dm: video: Add damage notification on display fills Alper Nebi Yasak
@ 2023-08-21 13:51 ` Alper Nebi Yasak
  2023-08-21 19:11   ` Simon Glass
  2023-08-21 13:51 ` [PATCH v5 07/13] video: test: Test video damage tracking via vidconsole Alper Nebi Yasak
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 56+ messages in thread
From: Alper Nebi Yasak @ 2023-08-21 13:51 UTC (permalink / raw)
  To: u-boot
  Cc: Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Simon Glass,
	Matthias Brugger, u-boot-amlogic, Ilias Apalodimas,
	Neil Armstrong, Alper Nebi Yasak

From: Alexander Graf <agraf@csgraf.de>

Now that we have a damage tracking API, let's populate damage done by
vidconsole drivers. We try to declare as little memory as damaged as
possible.
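
As a rough summary of the row-clearing cases in this patch, the mapping
from a text row to its damaged rectangle for each console rotation can
be sketched as below. struct damage_rect and row_damage() are
hypothetical names used only for this illustration; the drivers pass
the four values straight to video_damage(), and the rotation numbers
follow the _1, _2 and _3 suffixes of the console_rotate.c functions.

```c
#include <assert.h>

/* Hypothetical rectangle type, only for this illustration */
struct damage_rect {
	int x;
	int y;
	int width;
	int height;
};

/*
 * Rectangle damaged by clearing one text row.
 * rot 0 is the normal console, rot 1..3 are the rotated consoles.
 */
static struct damage_rect row_damage(int rot, int row, int font_height,
				     int xsize, int ysize)
{
	struct damage_rect r = { 0, 0, 0, 0 };

	assert(rot >= 0 && rot <= 3);

	switch (rot) {
	case 0:
		/* Rows are horizontal stripes counted from the top */
		r.y = font_height * row;
		r.width = xsize;
		r.height = font_height;
		break;
	case 1:
		/* Rows are vertical stripes counted from the right edge */
		r.x = xsize - (row + 1) * font_height;
		r.width = font_height;
		r.height = ysize;
		break;
	case 2:
		/* Rows are horizontal stripes counted from the bottom */
		r.y = ysize - (row + 1) * font_height;
		r.width = xsize;
		r.height = font_height;
		break;
	case 3:
		/* Rows are vertical stripes counted from the left edge */
		r.x = row * font_height;
		r.width = font_height;
		r.height = ysize;
		break;
	}

	return r;
}
```

The move_rows variants follow the same pattern with count rows instead
of one, anchored at rowdst.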

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reported-by: Da Xue <da@libre.computer>
[Alper: Rebase for met->baseline, fontdata->height/width, make rotated
        console_putc_xy() damages pass tests, edit patch message]
Co-developed-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
---

Changes in v5:
- Use met->baseline instead of priv->baseline
- Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
- Update console_rotate.c video_damage() calls to pass video tests
- Remove mention about not having minimal damage for console_rotate.c

Changes in v2:
- Fix ranges in truetype target
- Limit rotate to necessary damage

 drivers/video/console_normal.c   | 18 +++++++++++
 drivers/video/console_rotate.c   | 54 ++++++++++++++++++++++++++++++++
 drivers/video/console_truetype.c | 21 +++++++++++++
 drivers/video/video-uclass.c     |  1 +
 4 files changed, 94 insertions(+)

diff --git a/drivers/video/console_normal.c b/drivers/video/console_normal.c
index 413c7abee9e1..a19ce6a2bc11 100644
--- a/drivers/video/console_normal.c
+++ b/drivers/video/console_normal.c
@@ -39,6 +39,12 @@ static int console_set_row(struct udevice *dev, uint row, int clr)
 	if (ret)
 		return ret;
 
+	video_damage(dev->parent,
+		     0,
+		     fontdata->height * row,
+		     vid_priv->xsize,
+		     fontdata->height);
+
 	return 0;
 }
 
@@ -60,6 +66,12 @@ static int console_move_rows(struct udevice *dev, uint rowdst,
 	if (ret)
 		return ret;
 
+	video_damage(dev->parent,
+		     0,
+		     fontdata->height * rowdst,
+		     vid_priv->xsize,
+		     fontdata->height * count);
+
 	return 0;
 }
 
@@ -90,6 +102,12 @@ static int console_putc_xy(struct udevice *dev, uint x_frac, uint y, char ch)
 	if (ret)
 		return ret;
 
+	video_damage(dev->parent,
+		     x,
+		     y,
+		     fontdata->width,
+		     fontdata->height);
+
 	ret = vidconsole_sync_copy(dev, start, line);
 	if (ret)
 		return ret;
diff --git a/drivers/video/console_rotate.c b/drivers/video/console_rotate.c
index 65358a1c6e74..6c3e7c1bb8dc 100644
--- a/drivers/video/console_rotate.c
+++ b/drivers/video/console_rotate.c
@@ -36,6 +36,12 @@ static int console_set_row_1(struct udevice *dev, uint row, int clr)
 	if (ret)
 		return ret;
 
+	video_damage(dev->parent,
+		     vid_priv->xsize - ((row + 1) * fontdata->height),
+		     0,
+		     fontdata->height,
+		     vid_priv->ysize);
+
 	return 0;
 }
 
@@ -64,6 +70,12 @@ static int console_move_rows_1(struct udevice *dev, uint rowdst, uint rowsrc,
 		dst += vid_priv->line_length;
 	}
 
+	video_damage(dev->parent,
+		     vid_priv->xsize - ((rowdst + count) * fontdata->height),
+		     0,
+		     count * fontdata->height,
+		     vid_priv->ysize);
+
 	return 0;
 }
 
@@ -96,6 +108,12 @@ static int console_putc_xy_1(struct udevice *dev, uint x_frac, uint y, char ch)
 	if (ret)
 		return ret;
 
+	video_damage(dev->parent,
+		     vid_priv->xsize - y - fontdata->height,
+		     linenum - 1,
+		     fontdata->height,
+		     fontdata->width);
+
 	return VID_TO_POS(fontdata->width);
 }
 
@@ -121,6 +139,12 @@ static int console_set_row_2(struct udevice *dev, uint row, int clr)
 	if (ret)
 		return ret;
 
+	video_damage(dev->parent,
+		     0,
+		     vid_priv->ysize - (row + 1) * fontdata->height,
+		     vid_priv->xsize,
+		     fontdata->height);
+
 	return 0;
 }
 
@@ -142,6 +166,12 @@ static int console_move_rows_2(struct udevice *dev, uint rowdst, uint rowsrc,
 	vidconsole_memmove(dev, dst, src,
 			   fontdata->height * vid_priv->line_length * count);
 
+	video_damage(dev->parent,
+		     0,
+		     vid_priv->ysize - (rowdst + count) * fontdata->height,
+		     vid_priv->xsize,
+		     count * fontdata->height);
+
 	return 0;
 }
 
@@ -174,6 +204,12 @@ static int console_putc_xy_2(struct udevice *dev, uint x_frac, uint y, char ch)
 	if (ret)
 		return ret;
 
+	video_damage(dev->parent,
+		     x - fontdata->width + 1,
+		     linenum - fontdata->height + 1,
+		     fontdata->width,
+		     fontdata->height);
+
 	return VID_TO_POS(fontdata->width);
 }
 
@@ -198,6 +234,12 @@ static int console_set_row_3(struct udevice *dev, uint row, int clr)
 	if (ret)
 		return ret;
 
+	video_damage(dev->parent,
+		     row * fontdata->height,
+		     0,
+		     fontdata->height,
+		     vid_priv->ysize);
+
 	return 0;
 }
 
@@ -224,6 +266,12 @@ static int console_move_rows_3(struct udevice *dev, uint rowdst, uint rowsrc,
 		dst += vid_priv->line_length;
 	}
 
+	video_damage(dev->parent,
+		     rowdst * fontdata->height,
+		     0,
+		     count * fontdata->height,
+		     vid_priv->ysize);
+
 	return 0;
 }
 
@@ -255,6 +303,12 @@ static int console_putc_xy_3(struct udevice *dev, uint x_frac, uint y, char ch)
 	if (ret)
 		return ret;
 
+	video_damage(dev->parent,
+		     y,
+		     linenum - fontdata->width + 1,
+		     fontdata->height,
+		     fontdata->width);
+
 	return VID_TO_POS(fontdata->width);
 }
 
diff --git a/drivers/video/console_truetype.c b/drivers/video/console_truetype.c
index 0f9bb49e44f7..0adbf9cc3d67 100644
--- a/drivers/video/console_truetype.c
+++ b/drivers/video/console_truetype.c
@@ -178,6 +178,7 @@ struct console_tt_priv {
 static int console_truetype_set_row(struct udevice *dev, uint row, int clr)
 {
 	struct video_priv *vid_priv = dev_get_uclass_priv(dev->parent);
+	struct vidconsole_priv *vc_priv = dev_get_uclass_priv(dev);
 	struct console_tt_priv *priv = dev_get_priv(dev);
 	struct console_tt_metrics *met = priv->cur_met;
 	void *end, *line;
@@ -221,6 +222,12 @@ static int console_truetype_set_row(struct udevice *dev, uint row, int clr)
 	if (ret)
 		return ret;
 
+	video_damage(dev->parent,
+		     0,
+		     vc_priv->y_charsize * row,
+		     vid_priv->xsize,
+		     vc_priv->y_charsize);
+
 	return 0;
 }
 
@@ -228,6 +235,7 @@ static int console_truetype_move_rows(struct udevice *dev, uint rowdst,
 				     uint rowsrc, uint count)
 {
 	struct video_priv *vid_priv = dev_get_uclass_priv(dev->parent);
+	struct vidconsole_priv *vc_priv = dev_get_uclass_priv(dev);
 	struct console_tt_priv *priv = dev_get_priv(dev);
 	struct console_tt_metrics *met = priv->cur_met;
 	void *dst;
@@ -246,6 +254,12 @@ static int console_truetype_move_rows(struct udevice *dev, uint rowdst,
 	for (i = 0; i < priv->pos_ptr; i++)
 		priv->pos[i].ypos -= diff;
 
+	video_damage(dev->parent,
+		     0,
+		     vc_priv->y_charsize * rowdst,
+		     vid_priv->xsize,
+		     vc_priv->y_charsize * count);
+
 	return 0;
 }
 
@@ -403,6 +417,13 @@ static int console_truetype_putc_xy(struct udevice *dev, uint x, uint y,
 
 		line += vid_priv->line_length;
 	}
+
+	video_damage(dev->parent,
+		     VID_TO_PIXEL(x) + xoff,
+		     y + met->baseline + yoff,
+		     width,
+		     height);
+
 	ret = vidconsole_sync_copy(dev, start, line);
 	if (ret)
 		return ret;
diff --git a/drivers/video/video-uclass.c b/drivers/video/video-uclass.c
index ebf409d839f0..8bfcbc88dda7 100644
--- a/drivers/video/video-uclass.c
+++ b/drivers/video/video-uclass.c
@@ -199,6 +199,7 @@ int video_fill_part(struct udevice *dev, int xstart, int ystart, int xend,
 		}
 		line += priv->line_length;
 	}
+
 	ret = video_sync_copy(dev, start, line);
 	if (ret)
 		return ret;
-- 
2.40.1



* [PATCH v5 07/13] video: test: Test video damage tracking via vidconsole
  2023-08-21 13:50 [PATCH v5 00/13] Add video damage tracking Alper Nebi Yasak
                   ` (5 preceding siblings ...)
  2023-08-21 13:51 ` [PATCH v5 06/13] vidconsole: Add damage notifications to all vidconsole drivers Alper Nebi Yasak
@ 2023-08-21 13:51 ` Alper Nebi Yasak
  2023-08-21 19:11   ` Simon Glass
  2023-08-21 13:51 ` [PATCH v5 08/13] video: Add damage notification on bmp display Alper Nebi Yasak
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 56+ messages in thread
From: Alper Nebi Yasak @ 2023-08-21 13:51 UTC (permalink / raw)
  To: u-boot
  Cc: Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Simon Glass,
	Matthias Brugger, u-boot-amlogic, Ilias Apalodimas,
	Neil Armstrong, Alper Nebi Yasak

With VIDEO_DAMAGE, the video uclass tracks updated regions of the frame
buffer in order to avoid unnecessary work during a video sync. Enable
the config in sandbox and add a test for it, by printing strings at a
few locations and checking the tracked region.

Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
---
This is hard to test because most things issue video syncs that process
and reset the damaged region.

Changes in v5:
- Add patch "video: test: Test video damage tracking via vidconsole"

 configs/sandbox_defconfig |  1 +
 test/dm/video.c           | 56 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/configs/sandbox_defconfig b/configs/sandbox_defconfig
index 259f31f26cee..51b820f13121 100644
--- a/configs/sandbox_defconfig
+++ b/configs/sandbox_defconfig
@@ -307,6 +307,7 @@ CONFIG_USB_ETH_CDC=y
 CONFIG_VIDEO=y
 CONFIG_VIDEO_FONT_SUN12X22=y
 CONFIG_VIDEO_COPY=y
+CONFIG_VIDEO_DAMAGE=y
 CONFIG_CONSOLE_ROTATION=y
 CONFIG_CONSOLE_TRUETYPE=y
 CONFIG_CONSOLE_TRUETYPE_CANTORAONE=y
diff --git a/test/dm/video.c b/test/dm/video.c
index e4bd27a6b76f..8c7d9800a42e 100644
--- a/test/dm/video.c
+++ b/test/dm/video.c
@@ -711,3 +711,59 @@ static int dm_test_video_copy(struct unit_test_state *uts)
 	return 0;
 }
 DM_TEST(dm_test_video_copy, UT_TESTF_SCAN_PDATA | UT_TESTF_SCAN_FDT);
+
+/* Test video damage tracking */
+static int dm_test_video_damage(struct unit_test_state *uts)
+{
+	struct sandbox_sdl_plat *plat;
+	struct udevice *dev, *con;
+	struct video_priv *priv;
+	const char *test_string_1 = "Criticism may not be agreeable, ";
+	const char *test_string_2 = "but it is necessary.";
+	const char *test_string_3 = "It fulfils the same function as pain in the human body.";
+
+	if (!IS_ENABLED(CONFIG_VIDEO_DAMAGE))
+		return -EAGAIN;
+
+	ut_assertok(uclass_find_device(UCLASS_VIDEO, 0, &dev));
+	ut_assert(!device_active(dev));
+	plat = dev_get_plat(dev);
+	plat->font_size = 32;
+
+	ut_assertok(video_get_nologo(uts, &dev));
+	ut_assertok(uclass_get_device(UCLASS_VIDEO_CONSOLE, 0, &con));
+	priv = dev_get_uclass_priv(dev);
+
+	vidconsole_position_cursor(con, 14, 10);
+	vidconsole_put_string(con, test_string_2);
+	ut_asserteq(449, priv->damage.xstart);
+	ut_asserteq(325, priv->damage.ystart);
+	ut_asserteq(661, priv->damage.xend);
+	ut_asserteq(350, priv->damage.yend);
+
+	vidconsole_position_cursor(con, 7, 5);
+	vidconsole_put_string(con, test_string_1);
+	ut_asserteq(225, priv->damage.xstart);
+	ut_asserteq(164, priv->damage.ystart);
+	ut_asserteq(661, priv->damage.xend);
+	ut_asserteq(350, priv->damage.yend);
+
+	vidconsole_position_cursor(con, 21, 15);
+	vidconsole_put_string(con, test_string_3);
+	ut_asserteq(225, priv->damage.xstart);
+	ut_asserteq(164, priv->damage.ystart);
+	ut_asserteq(1280, priv->damage.xend);
+	ut_asserteq(510, priv->damage.yend);
+
+	video_sync(dev, false);
+	ut_asserteq(priv->xsize, priv->damage.xstart);
+	ut_asserteq(priv->ysize, priv->damage.ystart);
+	ut_asserteq(0, priv->damage.xend);
+	ut_asserteq(0, priv->damage.yend);
+
+	ut_asserteq(7339, compress_frame_buffer(uts, dev, false));
+	ut_assertok(check_copy_frame_buffer(uts, dev));
+
+	return 0;
+}
+DM_TEST(dm_test_video_damage, UT_TESTF_SCAN_PDATA | UT_TESTF_SCAN_FDT);
-- 
2.40.1



* [PATCH v5 08/13] video: Add damage notification on bmp display
  2023-08-21 13:50 [PATCH v5 00/13] Add video damage tracking Alper Nebi Yasak
                   ` (6 preceding siblings ...)
  2023-08-21 13:51 ` [PATCH v5 07/13] video: test: Test video damage tracking via vidconsole Alper Nebi Yasak
@ 2023-08-21 13:51 ` Alper Nebi Yasak
  2023-08-21 13:51 ` [PATCH v5 09/13] efi_loader: GOP: Add damage notification on BLT Alper Nebi Yasak
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 56+ messages in thread
From: Alper Nebi Yasak @ 2023-08-21 13:51 UTC (permalink / raw)
  To: u-boot
  Cc: Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Simon Glass,
	Matthias Brugger, u-boot-amlogic, Ilias Apalodimas,
	Neil Armstrong, Alper Nebi Yasak

From: Alexander Graf <agraf@csgraf.de>

Let's report the video damage when we draw a bitmap on the screen. This
way we can later lazily flush only relevant regions to hardware.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reported-by: Da Xue <da@libre.computer>
Reviewed-by: Simon Glass <sjg@chromium.org>
Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
---

(no changes since v1)

 drivers/video/video_bmp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/video/video_bmp.c b/drivers/video/video_bmp.c
index 45f003c8251a..10943b9ca19f 100644
--- a/drivers/video/video_bmp.c
+++ b/drivers/video/video_bmp.c
@@ -460,6 +460,8 @@ int video_bmp_display(struct udevice *dev, ulong bmp_image, int x, int y,
 		break;
 	};
 
+	video_damage(dev, x, y, width, height);
+
 	/* Find the position of the top left of the image in the framebuffer */
 	fb = (uchar *)(priv->fb + y * priv->line_length + x * bpix / 8);
 	ret = video_sync_copy(dev, start, fb);
-- 
2.40.1



* [PATCH v5 09/13] efi_loader: GOP: Add damage notification on BLT
  2023-08-21 13:50 [PATCH v5 00/13] Add video damage tracking Alper Nebi Yasak
                   ` (7 preceding siblings ...)
  2023-08-21 13:51 ` [PATCH v5 08/13] video: Add damage notification on bmp display Alper Nebi Yasak
@ 2023-08-21 13:51 ` Alper Nebi Yasak
  2023-08-21 19:11   ` Simon Glass
  2023-08-21 13:51 ` [PATCH v5 10/13] video: Only dcache flush damaged lines Alper Nebi Yasak
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 56+ messages in thread
From: Alper Nebi Yasak @ 2023-08-21 13:51 UTC (permalink / raw)
  To: u-boot
  Cc: Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Simon Glass,
	Matthias Brugger, u-boot-amlogic, Ilias Apalodimas,
	Neil Armstrong, Alper Nebi Yasak

From: Alexander Graf <agraf@csgraf.de>

Now that we have a damage tracking API, let's populate damage done by
UEFI payloads when they BLT data onto the screen.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reported-by: Da Xue <da@libre.computer>
Reviewed-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
[Alper: Add struct comment for new member]
Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
---

Changes in v5:
- Document new vdev field in struct efi_gop_obj comment

Changes in v4:
- Skip damage on EfiBltVideoToBltBuffer

Changes in v3:
- Adapt to always assume DM is used

Changes in v2:
- Remove ifdefs from gop

 lib/efi_loader/efi_gop.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/lib/efi_loader/efi_gop.c b/lib/efi_loader/efi_gop.c
index 778b693f983a..db6535e080c4 100644
--- a/lib/efi_loader/efi_gop.c
+++ b/lib/efi_loader/efi_gop.c
@@ -24,6 +24,7 @@ static const efi_guid_t efi_gop_guid = EFI_GRAPHICS_OUTPUT_PROTOCOL_GUID;
  * @ops:	graphical output protocol interface
  * @info:	graphical output mode information
  * @mode:	graphical output mode
+ * @vdev:	backing video device
  * @bpix:	bits per pixel
  * @fb:		frame buffer
  */
@@ -32,6 +33,7 @@ struct efi_gop_obj {
 	struct efi_gop ops;
 	struct efi_gop_mode_info info;
 	struct efi_gop_mode mode;
+	struct udevice *vdev;
 	/* Fields we only have access to during init */
 	u32 bpix;
 	void *fb;
@@ -120,6 +122,7 @@ static __always_inline efi_status_t gop_blt_int(struct efi_gop *this,
 	u32 *fb32 = gopobj->fb;
 	u16 *fb16 = gopobj->fb;
 	struct efi_gop_pixel *buffer = __builtin_assume_aligned(bufferp, 4);
+	bool blt_to_video = (operation != EFI_BLT_VIDEO_TO_BLT_BUFFER);
 
 	if (delta) {
 		/* Check for 4 byte alignment */
@@ -243,6 +246,9 @@ static __always_inline efi_status_t gop_blt_int(struct efi_gop *this,
 		dlineoff += dwidth;
 	}
 
+	if (blt_to_video)
+		video_damage(gopobj->vdev, dx, dy, width, height);
+
 	return EFI_SUCCESS;
 }
 
@@ -548,6 +554,7 @@ efi_status_t efi_gop_register(void)
 	gopobj->info.pixels_per_scanline = col;
 	gopobj->bpix = bpix;
 	gopobj->fb = fb;
+	gopobj->vdev = vdev;
 
 	return EFI_SUCCESS;
 }
-- 
2.40.1



* [PATCH v5 10/13] video: Only dcache flush damaged lines
  2023-08-21 13:50 [PATCH v5 00/13] Add video damage tracking Alper Nebi Yasak
                   ` (8 preceding siblings ...)
  2023-08-21 13:51 ` [PATCH v5 09/13] efi_loader: GOP: Add damage notification on BLT Alper Nebi Yasak
@ 2023-08-21 13:51 ` Alper Nebi Yasak
  2023-08-21 19:11   ` Simon Glass
  2023-08-21 13:51 ` [PATCH v5 11/13] video: Use VIDEO_DAMAGE for VIDEO_COPY Alper Nebi Yasak
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 56+ messages in thread
From: Alper Nebi Yasak @ 2023-08-21 13:51 UTC (permalink / raw)
  To: u-boot
  Cc: Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Simon Glass,
	Matthias Brugger, u-boot-amlogic, Ilias Apalodimas,
	Neil Armstrong, Alper Nebi Yasak

From: Alexander Graf <agraf@csgraf.de>

Now that we have a damage area that tells us which parts of the frame
buffer actually need updating, let's only dcache flush those on
video_sync() calls. With this optimization in place, frame buffer updates -
especially on large screens such as 4k displays - speed up significantly.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reported-by: Da Xue <da@libre.computer>
[Alper: Use damage.xstart/yend, IS_ENABLED()]
Co-developed-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
---

Changes in v5:
- Use xstart, ystart, xend, yend as names for damage region
- Use IS_ENABLED() instead of CONFIG_IS_ENABLED()

Changes in v2:
- Fix dcache range; we were flushing too much before
- Remove ifdefs
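
To make the per-line range computation concrete, here is a minimal
standalone sketch of the arithmetic, assuming a 64-byte cache line. It
is not the U-Boot implementation: the sketch_* names and
SKETCH_CACHELINE are hypothetical, and the functions only compute the
ranges instead of calling a real cache flush routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; stands in for CONFIG_SYS_CACHELINE_SIZE. */
#define SKETCH_CACHELINE 64u

/* Damage rectangle in pixels; xend and yend are exclusive. */
struct sketch_damage {
	int xstart, ystart, xend, yend;
};

/* Byte range [start, end) that would be handed to a cache flush. */
struct sketch_range {
	uintptr_t start, end;
};

static uintptr_t sketch_align_down(uintptr_t v, uintptr_t a)
{
	return v & ~(a - 1);
}

static uintptr_t sketch_align_up(uintptr_t v, uintptr_t a)
{
	return (v + a - 1) & ~(a - 1);
}

/* Cache-line aligned range covering the damaged pixels of row y. */
struct sketch_range sketch_line_range(uintptr_t fb, size_t line_length,
				      int bytes_per_pixel,
				      const struct sketch_damage *d, int y)
{
	struct sketch_range r;
	uintptr_t row = fb + (uintptr_t)y * line_length;

	assert(y >= d->ystart && y < d->yend);

	r.start = sketch_align_down(row + (uintptr_t)d->xstart * bytes_per_pixel,
				    SKETCH_CACHELINE);
	r.end = sketch_align_up(row + (uintptr_t)d->xend * bytes_per_pixel,
				SKETCH_CACHELINE);

	return r;
}

/* Total number of bytes flushed when walking all damaged rows. */
size_t sketch_flush_bytes(uintptr_t fb, size_t line_length,
			  int bytes_per_pixel, const struct sketch_damage *d)
{
	size_t total = 0;
	int y;

	for (y = d->ystart; y < d->yend; y++) {
		struct sketch_range r = sketch_line_range(fb, line_length,
							  bytes_per_pixel, d, y);

		total += r.end - r.start;
	}

	return total;
}
```

For a made-up 1920x1080 frame buffer at 4 bytes per pixel (7680 bytes per
line, 8294400 bytes in total) located at 0x1000, a damaged 8x16 pixel
character cell at x=10, y=5 works out to 128 bytes per row and 2048 bytes
overall, since each 32-byte span is widened to the cache lines it touches.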

 drivers/video/video-uclass.c | 41 +++++++++++++++++++++++++++++++-----
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/drivers/video/video-uclass.c b/drivers/video/video-uclass.c
index 8bfcbc88dda7..a50220bcc684 100644
--- a/drivers/video/video-uclass.c
+++ b/drivers/video/video-uclass.c
@@ -385,6 +385,41 @@ void video_damage(struct udevice *vid, int x, int y, int width, int height)
 	priv->damage.yend = max(yend, priv->damage.yend);
 }
 
+#if defined(CONFIG_ARM) && !CONFIG_IS_ENABLED(SYS_DCACHE_OFF)
+static void video_flush_dcache(struct udevice *vid)
+{
+	struct video_priv *priv = dev_get_uclass_priv(vid);
+
+	if (!priv->flush_dcache)
+		return;
+
+	if (!IS_ENABLED(CONFIG_VIDEO_DAMAGE)) {
+		flush_dcache_range((ulong)priv->fb,
+				   ALIGN((ulong)priv->fb + priv->fb_size,
+					 CONFIG_SYS_CACHELINE_SIZE));
+
+		return;
+	}
+
+	if (priv->damage.xend && priv->damage.yend) {
+		int lstart = priv->damage.xstart * VNBYTES(priv->bpix);
+		int lend = priv->damage.xend * VNBYTES(priv->bpix);
+		int y;
+
+		for (y = priv->damage.ystart; y < priv->damage.yend; y++) {
+			ulong fb = (ulong)priv->fb;
+			ulong start = fb + (y * priv->line_length) + lstart;
+			ulong end = start + lend - lstart;
+
+			start = ALIGN_DOWN(start, CONFIG_SYS_CACHELINE_SIZE);
+			end = ALIGN(end, CONFIG_SYS_CACHELINE_SIZE);
+
+			flush_dcache_range(start, end);
+		}
+	}
+}
+#endif
+
 /* Flush video activity to the caches */
 int video_sync(struct udevice *vid, bool force)
 {
@@ -404,11 +439,7 @@ int video_sync(struct udevice *vid, bool force)
 	 * out whether it exists? For now, ARM is safe.
 	 */
 #if defined(CONFIG_ARM) && !CONFIG_IS_ENABLED(SYS_DCACHE_OFF)
-	if (priv->flush_dcache) {
-		flush_dcache_range((ulong)priv->fb,
-				   ALIGN((ulong)priv->fb + priv->fb_size,
-					 CONFIG_SYS_CACHELINE_SIZE));
-	}
+	video_flush_dcache(vid);
 #elif defined(CONFIG_VIDEO_SANDBOX_SDL)
 	static ulong last_sync;
 
-- 
2.40.1



* [PATCH v5 11/13] video: Use VIDEO_DAMAGE for VIDEO_COPY
  2023-08-21 13:50 [PATCH v5 00/13] Add video damage tracking Alper Nebi Yasak
                   ` (9 preceding siblings ...)
  2023-08-21 13:51 ` [PATCH v5 10/13] video: Only dcache flush damaged lines Alper Nebi Yasak
@ 2023-08-21 13:51 ` Alper Nebi Yasak
  2023-08-21 19:11   ` Simon Glass
  2023-08-21 13:51 ` [PATCH v5 12/13] video: Always compile cache flushing code Alper Nebi Yasak
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 56+ messages in thread
From: Alper Nebi Yasak @ 2023-08-21 13:51 UTC (permalink / raw)
  To: u-boot
  Cc: Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Simon Glass,
	Matthias Brugger, u-boot-amlogic, Ilias Apalodimas,
	Neil Armstrong, Alper Nebi Yasak

From: Alexander Graf <agraf@csgraf.de>

CONFIG_VIDEO_COPY implemented a range-based copying mechanism: If we
print a single character, it will always copy the full range of bytes
from the top left corner of the character to the lower right onto the
uncached frame buffer. Since that range is contiguous in memory, it
includes nearly the full width of every line the character touches.

Since we now have proper damage tracking, let's make use of that to reduce
the amount of data we need to copy. With this patch applied, we will only
copy the tiny rectangle surrounding characters when we print them,
speeding up the video console.

After this, changes to the main frame buffer are not immediately copied
to the copy frame buffer, but postponed until the next video device
sync. So issue an explicit sync before inspecting the copy frame buffer
contents for the video tests.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
[Alper: Rebase for fontdata->height/w, fill_part(), fix memmove(dev),
        drop from defconfig, use damage.xstart/yend, use IS_ENABLED(),
        call video_sync() before copy_fb check, update video_copy test]
Co-developed-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
---

Changes in v5:
- Remove video_sync_copy() also from video_fill(), video_fill_part()
- Fix memmove() calls by removing the extra dev argument
- Call video_sync() before checking copy_fb in video tests
- Use xstart, ystart, xend, yend as names for damage region
- Use met->baseline instead of priv->baseline
- Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
- Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
- Drop VIDEO_DAMAGE from sandbox defconfig added in a new patch
- Update dm_test_video_copy test added in a new patch

Changes in v3:
- Make VIDEO_COPY always select VIDEO_DAMAGE

Changes in v2:
- Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
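
The difference between the old range-based copy and the rectangle-based
copy described above can be illustrated with a small standalone sketch.
It assumes two plain byte arrays in place of the cached and hardware
frame buffers; sketch_damage and sketch_flush_copy are hypothetical names
and this is not the code from the diff below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Damage rectangle in pixels; xend and yend are exclusive. */
struct sketch_damage {
	int xstart, ystart, xend, yend;
};

/*
 * Copy only the damaged rectangle from fb to copy_fb, one row at a time.
 * Returns the number of bytes copied, 0 when there is no copy buffer or
 * no damage recorded.
 */
size_t sketch_flush_copy(unsigned char *copy_fb, const unsigned char *fb,
			 size_t line_length, int bytes_per_pixel,
			 const struct sketch_damage *d)
{
	size_t lstart, len, copied = 0;
	int y;

	if (!copy_fb || !d->xend || !d->yend)
		return 0;

	assert(d->xstart <= d->xend && d->ystart <= d->yend);

	lstart = (size_t)d->xstart * bytes_per_pixel;
	len = (size_t)(d->xend - d->xstart) * bytes_per_pixel;

	for (y = d->ystart; y < d->yend; y++) {
		size_t offset = (size_t)y * line_length + lstart;

		memcpy(copy_fb + offset, fb + offset, len);
		copied += len;
	}

	return copied;
}
```

Because the copy only happens when this routine runs, anything that
compares the two buffers has to trigger a sync first, which is why the
video tests gain an explicit video_sync() call.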

 configs/sandbox_defconfig         |  1 -
 drivers/video/Kconfig             |  5 ++
 drivers/video/console_normal.c    | 13 +----
 drivers/video/console_rotate.c    | 44 +++-----------
 drivers/video/console_truetype.c  | 16 +----
 drivers/video/vidconsole-uclass.c | 16 -----
 drivers/video/video-uclass.c      | 97 ++++++++-----------------------
 drivers/video/video_bmp.c         |  7 ---
 include/video.h                   | 37 ------------
 include/video_console.h           | 52 -----------------
 test/dm/video.c                   |  3 +-
 11 files changed, 43 insertions(+), 248 deletions(-)

diff --git a/configs/sandbox_defconfig b/configs/sandbox_defconfig
index 51b820f13121..259f31f26cee 100644
--- a/configs/sandbox_defconfig
+++ b/configs/sandbox_defconfig
@@ -307,7 +307,6 @@ CONFIG_USB_ETH_CDC=y
 CONFIG_VIDEO=y
 CONFIG_VIDEO_FONT_SUN12X22=y
 CONFIG_VIDEO_COPY=y
-CONFIG_VIDEO_DAMAGE=y
 CONFIG_CONSOLE_ROTATION=y
 CONFIG_CONSOLE_TRUETYPE=y
 CONFIG_CONSOLE_TRUETYPE_CANTORAONE=y
diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
index 97f494a1340b..b3fbd9d7d9ca 100644
--- a/drivers/video/Kconfig
+++ b/drivers/video/Kconfig
@@ -83,11 +83,14 @@ config VIDEO_PCI_DEFAULT_FB_SIZE
 
 config VIDEO_COPY
 	bool "Enable copying the frame buffer to a hardware copy"
+	select VIDEO_DAMAGE
 	help
 	  On some machines (e.g. x86), reading from the frame buffer is very
 	  slow because it is uncached. To improve performance, this feature
 	  allows the frame buffer to be kept in cached memory (allocated by
 	  U-Boot) and then copied to the hardware frame-buffer as needed.
+	  It uses the VIDEO_DAMAGE feature to keep track of regions to copy
+	  and will only copy actually touched regions.
 
 	  To use this, your video driver must set @copy_base in
 	  struct video_uc_plat.
@@ -105,6 +108,8 @@ config VIDEO_DAMAGE
 	  regions of the frame buffer that were modified before, speeding up
 	  screen refreshes significantly.
 
+	  It is also used by VIDEO_COPY to identify which regions changed.
+
 config BACKLIGHT_PWM
 	bool "Generic PWM based Backlight Driver"
 	depends on BACKLIGHT && DM_PWM
diff --git a/drivers/video/console_normal.c b/drivers/video/console_normal.c
index a19ce6a2bc11..c44aa09473a3 100644
--- a/drivers/video/console_normal.c
+++ b/drivers/video/console_normal.c
@@ -35,10 +35,6 @@ static int console_set_row(struct udevice *dev, uint row, int clr)
 		fill_pixel_and_goto_next(&dst, clr, pbytes, pbytes);
 	end = dst;
 
-	ret = vidconsole_sync_copy(dev, line, end);
-	if (ret)
-		return ret;
-
 	video_damage(dev->parent,
 		     0,
 		     fontdata->height * row,
@@ -57,14 +53,11 @@ static int console_move_rows(struct udevice *dev, uint rowdst,
 	void *dst;
 	void *src;
 	int size;
-	int ret;
 
 	dst = vid_priv->fb + rowdst * fontdata->height * vid_priv->line_length;
 	src = vid_priv->fb + rowsrc * fontdata->height * vid_priv->line_length;
 	size = fontdata->height * vid_priv->line_length * count;
-	ret = vidconsole_memmove(dev, dst, src, size);
-	if (ret)
-		return ret;
+	memmove(dst, src, size);
 
 	video_damage(dev->parent,
 		     0,
@@ -108,10 +101,6 @@ static int console_putc_xy(struct udevice *dev, uint x_frac, uint y, char ch)
 		     fontdata->width,
 		     fontdata->height);
 
-	ret = vidconsole_sync_copy(dev, start, line);
-	if (ret)
-		return ret;
-
 	return VID_TO_POS(fontdata->width);
 }
 
diff --git a/drivers/video/console_rotate.c b/drivers/video/console_rotate.c
index 6c3e7c1bb8dc..6e9067d1c7fb 100644
--- a/drivers/video/console_rotate.c
+++ b/drivers/video/console_rotate.c
@@ -21,7 +21,6 @@ static int console_set_row_1(struct udevice *dev, uint row, int clr)
 	int pbytes = VNBYTES(vid_priv->bpix);
 	void *start, *dst, *line;
 	int i, j;
-	int ret;
 
 	start = vid_priv->fb + vid_priv->line_length -
 		(row + 1) * fontdata->height * pbytes;
@@ -32,9 +31,6 @@ static int console_set_row_1(struct udevice *dev, uint row, int clr)
 			fill_pixel_and_goto_next(&dst, clr, pbytes, pbytes);
 		line += vid_priv->line_length;
 	}
-	ret = vidconsole_sync_copy(dev, start, line);
-	if (ret)
-		return ret;
 
 	video_damage(dev->parent,
 		     vid_priv->xsize - ((row + 1) * fontdata->height),
@@ -54,7 +50,7 @@ static int console_move_rows_1(struct udevice *dev, uint rowdst, uint rowsrc,
 	int pbytes = VNBYTES(vid_priv->bpix);
 	void *dst;
 	void *src;
-	int j, ret;
+	int j;
 
 	dst = vid_priv->fb + vid_priv->line_length -
 		(rowdst + count) * fontdata->height * pbytes;
@@ -62,10 +58,7 @@ static int console_move_rows_1(struct udevice *dev, uint rowdst, uint rowsrc,
 		(rowsrc + count) * fontdata->height * pbytes;
 
 	for (j = 0; j < vid_priv->ysize; j++) {
-		ret = vidconsole_memmove(dev, dst, src,
-					fontdata->height * pbytes * count);
-		if (ret)
-			return ret;
+		memmove(dst, src, fontdata->height * pbytes * count);
 		src += vid_priv->line_length;
 		dst += vid_priv->line_length;
 	}
@@ -104,10 +97,6 @@ static int console_putc_xy_1(struct udevice *dev, uint x_frac, uint y, char ch)
 		return ret;
 
 	/* We draw backwards from 'start, so account for the first line */
-	ret = vidconsole_sync_copy(dev, start - vid_priv->line_length, line);
-	if (ret)
-		return ret;
-
 	video_damage(dev->parent,
 		     vid_priv->xsize - y - fontdata->height,
 		     linenum - 1,
@@ -125,7 +114,7 @@ static int console_set_row_2(struct udevice *dev, uint row, int clr)
 	struct video_fontdata *fontdata = priv->fontdata;
 	void *start, *line, *dst, *end;
 	int pixels = fontdata->height * vid_priv->xsize;
-	int i, ret;
+	int i;
 	int pbytes = VNBYTES(vid_priv->bpix);
 
 	start = vid_priv->fb + vid_priv->ysize * vid_priv->line_length -
@@ -135,9 +124,6 @@ static int console_set_row_2(struct udevice *dev, uint row, int clr)
 	for (i = 0; i < pixels; i++)
 		fill_pixel_and_goto_next(&dst, clr, pbytes, pbytes);
 	end = dst;
-	ret = vidconsole_sync_copy(dev, start, end);
-	if (ret)
-		return ret;
 
 	video_damage(dev->parent,
 		     0,
@@ -163,8 +149,7 @@ static int console_move_rows_2(struct udevice *dev, uint rowdst, uint rowsrc,
 		vid_priv->line_length;
 	src = end - (rowsrc + count) * fontdata->height *
 		vid_priv->line_length;
-	vidconsole_memmove(dev, dst, src,
-			   fontdata->height * vid_priv->line_length * count);
+	memmove(dst, src, fontdata->height * vid_priv->line_length * count);
 
 	video_damage(dev->parent,
 		     0,
@@ -199,11 +184,6 @@ static int console_putc_xy_2(struct udevice *dev, uint x_frac, uint y, char ch)
 	if (ret)
 		return ret;
 
-	/* Add 4 bytes to allow for the first pixel writen */
-	ret = vidconsole_sync_copy(dev, start + 4, line);
-	if (ret)
-		return ret;
-
 	video_damage(dev->parent,
 		     x - fontdata->width + 1,
 		     linenum - fontdata->height + 1,
@@ -220,7 +200,7 @@ static int console_set_row_3(struct udevice *dev, uint row, int clr)
 	struct video_fontdata *fontdata = priv->fontdata;
 	int pbytes = VNBYTES(vid_priv->bpix);
 	void *start, *dst, *line;
-	int i, j, ret;
+	int i, j;
 
 	start = vid_priv->fb + row * fontdata->height * pbytes;
 	line = start;
@@ -230,9 +210,6 @@ static int console_set_row_3(struct udevice *dev, uint row, int clr)
 			fill_pixel_and_goto_next(&dst, clr, pbytes, pbytes);
 		line += vid_priv->line_length;
 	}
-	ret = vidconsole_sync_copy(dev, start, line);
-	if (ret)
-		return ret;
 
 	video_damage(dev->parent,
 		     row * fontdata->height,
@@ -252,16 +229,13 @@ static int console_move_rows_3(struct udevice *dev, uint rowdst, uint rowsrc,
 	int pbytes = VNBYTES(vid_priv->bpix);
 	void *dst;
 	void *src;
-	int j, ret;
+	int j;
 
 	dst = vid_priv->fb + rowdst * fontdata->height * pbytes;
 	src = vid_priv->fb + rowsrc * fontdata->height * pbytes;
 
 	for (j = 0; j < vid_priv->ysize; j++) {
-		ret = vidconsole_memmove(dev, dst, src,
-					fontdata->height * pbytes * count);
-		if (ret)
-			return ret;
+		memmove(dst, src, fontdata->height * pbytes * count);
 		src += vid_priv->line_length;
 		dst += vid_priv->line_length;
 	}
@@ -296,10 +270,6 @@ static int console_putc_xy_3(struct udevice *dev, uint x_frac, uint y, char ch)
 	line = start;
 
 	ret = fill_char_horizontally(pfont, &line, vid_priv, fontdata, NORMAL_DIRECTION);
-	if (ret)
-		return ret;
-	/* Add a line to allow for the first pixels writen */
-	ret = vidconsole_sync_copy(dev, start + vid_priv->line_length, line);
 	if (ret)
 		return ret;
 
diff --git a/drivers/video/console_truetype.c b/drivers/video/console_truetype.c
index 0adbf9cc3d67..07bb0af71311 100644
--- a/drivers/video/console_truetype.c
+++ b/drivers/video/console_truetype.c
@@ -182,7 +182,6 @@ static int console_truetype_set_row(struct udevice *dev, uint row, int clr)
 	struct console_tt_priv *priv = dev_get_priv(dev);
 	struct console_tt_metrics *met = priv->cur_met;
 	void *end, *line;
-	int ret;
 
 	line = vid_priv->fb + row * met->font_size * vid_priv->line_length;
 	end = line + met->font_size * vid_priv->line_length;
@@ -218,9 +217,6 @@ static int console_truetype_set_row(struct udevice *dev, uint row, int clr)
 	default:
 		return -ENOSYS;
 	}
-	ret = vidconsole_sync_copy(dev, line, end);
-	if (ret)
-		return ret;
 
 	video_damage(dev->parent,
 		     0,
@@ -240,14 +236,11 @@ static int console_truetype_move_rows(struct udevice *dev, uint rowdst,
 	struct console_tt_metrics *met = priv->cur_met;
 	void *dst;
 	void *src;
-	int i, diff, ret;
+	int i, diff;
 
 	dst = vid_priv->fb + rowdst * met->font_size * vid_priv->line_length;
 	src = vid_priv->fb + rowsrc * met->font_size * vid_priv->line_length;
-	ret = vidconsole_memmove(dev, dst, src, met->font_size *
-				 vid_priv->line_length * count);
-	if (ret)
-		return ret;
+	memmove(dst, src, met->font_size * vid_priv->line_length * count);
 
 	/* Scroll up our position history */
 	diff = (rowsrc - rowdst) * met->font_size;
@@ -280,7 +273,7 @@ static int console_truetype_putc_xy(struct udevice *dev, uint x, uint y,
 	u8 *bits, *data;
 	int advance;
 	void *start, *end, *line;
-	int row, ret;
+	int row;
 
 	/* First get some basic metrics about this character */
 	stbtt_GetCodepointHMetrics(font, ch, &advance, &lsb);
@@ -424,9 +417,6 @@ static int console_truetype_putc_xy(struct udevice *dev, uint x, uint y,
 		     width,
 		     height);
 
-	ret = vidconsole_sync_copy(dev, start, line);
-	if (ret)
-		return ret;
 	free(data);
 
 	return width_frac;
diff --git a/drivers/video/vidconsole-uclass.c b/drivers/video/vidconsole-uclass.c
index 05f930478096..27a1e8ec3e49 100644
--- a/drivers/video/vidconsole-uclass.c
+++ b/drivers/video/vidconsole-uclass.c
@@ -682,22 +682,6 @@ UCLASS_DRIVER(vidconsole) = {
 	.per_device_auto	= sizeof(struct vidconsole_priv),
 };
 
-#ifdef CONFIG_VIDEO_COPY
-int vidconsole_sync_copy(struct udevice *dev, void *from, void *to)
-{
-	struct udevice *vid = dev_get_parent(dev);
-
-	return video_sync_copy(vid, from, to);
-}
-
-int vidconsole_memmove(struct udevice *dev, void *dst, const void *src,
-		       int size)
-{
-	memmove(dst, src, size);
-	return vidconsole_sync_copy(dev, dst, dst + size);
-}
-#endif
-
 int vidconsole_clear_and_reset(struct udevice *dev)
 {
 	int ret;
diff --git a/drivers/video/video-uclass.c b/drivers/video/video-uclass.c
index a50220bcc684..c79499252a22 100644
--- a/drivers/video/video-uclass.c
+++ b/drivers/video/video-uclass.c
@@ -160,7 +160,7 @@ int video_fill_part(struct udevice *dev, int xstart, int ystart, int xend,
 	struct video_priv *priv = dev_get_uclass_priv(dev);
 	void *start, *line;
 	int pixels = xend - xstart;
-	int row, i, ret;
+	int row, i;
 
 	start = priv->fb + ystart * priv->line_length;
 	start += xstart * VNBYTES(priv->bpix);
@@ -200,10 +200,6 @@ int video_fill_part(struct udevice *dev, int xstart, int ystart, int xend,
 		line += priv->line_length;
 	}
 
-	ret = video_sync_copy(dev, start, line);
-	if (ret)
-		return ret;
-
 	video_damage(dev, xstart, ystart, xend - xstart, yend - ystart);
 
 	return 0;
@@ -223,7 +219,6 @@ int video_reserve_from_bloblist(struct video_handoff *ho)
 int video_fill(struct udevice *dev, u32 colour)
 {
 	struct video_priv *priv = dev_get_uclass_priv(dev);
-	int ret;
 
 	switch (priv->bpix) {
 	case VIDEO_BPP16:
@@ -248,9 +243,6 @@ int video_fill(struct udevice *dev, u32 colour)
 		memset(priv->fb, colour, priv->fb_size);
 		break;
 	}
-	ret = video_sync_copy(dev, priv->fb, priv->fb + priv->fb_size);
-	if (ret)
-		return ret;
 
 	video_damage(dev, 0, 0, priv->xsize, priv->ysize);
 
@@ -420,6 +412,27 @@ static void video_flush_dcache(struct udevice *vid)
 }
 #endif
 
+static void video_flush_copy(struct udevice *vid)
+{
+	struct video_priv *priv = dev_get_uclass_priv(vid);
+
+	if (!priv->copy_fb)
+		return;
+
+	if (priv->damage.xend && priv->damage.yend) {
+		int lstart = priv->damage.xstart * VNBYTES(priv->bpix);
+		int lend = priv->damage.xend * VNBYTES(priv->bpix);
+		int y;
+
+		for (y = priv->damage.ystart; y < priv->damage.yend; y++) {
+			ulong offset = (y * priv->line_length) + lstart;
+			ulong len = lend - lstart;
+
+			memcpy(priv->copy_fb + offset, priv->fb + offset, len);
+		}
+	}
+}
+
 /* Flush video activity to the caches */
 int video_sync(struct udevice *vid, bool force)
 {
@@ -427,6 +440,9 @@ int video_sync(struct udevice *vid, bool force)
 	struct video_ops *ops = video_get_ops(vid);
 	int ret;
 
+	if (IS_ENABLED(CONFIG_VIDEO_COPY))
+		video_flush_copy(vid);
+
 	if (ops && ops->video_sync) {
 		ret = ops->video_sync(vid);
 		if (ret)
@@ -503,69 +519,6 @@ int video_get_ysize(struct udevice *dev)
 	return priv->ysize;
 }
 
-#ifdef CONFIG_VIDEO_COPY
-int video_sync_copy(struct udevice *dev, void *from, void *to)
-{
-	struct video_priv *priv = dev_get_uclass_priv(dev);
-
-	if (priv->copy_fb) {
-		long offset, size;
-
-		/* Find the offset of the first byte to copy */
-		if ((ulong)to > (ulong)from) {
-			size = to - from;
-			offset = from - priv->fb;
-		} else {
-			size = from - to;
-			offset = to - priv->fb;
-		}
-
-		/*
-		 * Allow a bit of leeway for valid requests somewhere near the
-		 * frame buffer
-		 */
-		if (offset < -priv->fb_size || offset > 2 * priv->fb_size) {
-#ifdef DEBUG
-			char str[120];
-
-			snprintf(str, sizeof(str),
-				 "[** FAULT sync_copy fb=%p, from=%p, to=%p, offset=%lx]",
-				 priv->fb, from, to, offset);
-			console_puts_select_stderr(true, str);
-#endif
-			return -EFAULT;
-		}
-
-		/*
-		 * Silently crop the memcpy. This allows callers to avoid doing
-		 * this themselves. It is common for the end pointer to go a
-		 * few lines after the end of the frame buffer, since most of
-		 * the update algorithms terminate a line after their last write
-		 */
-		if (offset + size > priv->fb_size) {
-			size = priv->fb_size - offset;
-		} else if (offset < 0) {
-			size += offset;
-			offset = 0;
-		}
-
-		memcpy(priv->copy_fb + offset, priv->fb + offset, size);
-	}
-
-	return 0;
-}
-
-int video_sync_copy_all(struct udevice *dev)
-{
-	struct video_priv *priv = dev_get_uclass_priv(dev);
-
-	video_sync_copy(dev, priv->fb, priv->fb + priv->fb_size);
-
-	return 0;
-}
-
-#endif
-
 #define SPLASH_DECL(_name) \
 	extern u8 __splash_ ## _name ## _begin[]; \
 	extern u8 __splash_ ## _name ## _end[]
diff --git a/drivers/video/video_bmp.c b/drivers/video/video_bmp.c
index 10943b9ca19f..da2bbe864a03 100644
--- a/drivers/video/video_bmp.c
+++ b/drivers/video/video_bmp.c
@@ -268,7 +268,6 @@ int video_bmp_display(struct udevice *dev, ulong bmp_image, int x, int y,
 	enum video_format eformat;
 	struct bmp_color_table_entry *palette;
 	int hdr_size;
-	int ret;
 
 	if (!bmp || !(bmp->header.signature[0] == 'B' &&
 	    bmp->header.signature[1] == 'M')) {
@@ -462,11 +461,5 @@ int video_bmp_display(struct udevice *dev, ulong bmp_image, int x, int y,
 
 	video_damage(dev, x, y, width, height);
 
-	/* Find the position of the top left of the image in the framebuffer */
-	fb = (uchar *)(priv->fb + y * priv->line_length + x * bpix / 8);
-	ret = video_sync_copy(dev, start, fb);
-	if (ret)
-		return log_ret(ret);
-
 	return video_sync(dev, false);
 }
diff --git a/include/video.h b/include/video.h
index a522f33949e5..42e57b44188d 100644
--- a/include/video.h
+++ b/include/video.h
@@ -350,43 +350,6 @@ void video_set_default_colors(struct udevice *dev, bool invert);
  */
 int video_default_font_height(struct udevice *dev);
 
-#ifdef CONFIG_VIDEO_COPY
-/**
- * vidconsole_sync_copy() - Sync back to the copy framebuffer
- *
- * This ensures that the copy framebuffer has the same data as the framebuffer
- * for a particular region. It should be called after the framebuffer is updated
- *
- * @from and @to can be in either order. The region between them is synced.
- *
- * @dev: Vidconsole device being updated
- * @from: Start/end address within the framebuffer (->fb)
- * @to: Other address within the frame buffer
- * Return: 0 if OK, -EFAULT if the start address is before the start of the
- *	frame buffer start
- */
-int video_sync_copy(struct udevice *dev, void *from, void *to);
-
-/**
- * video_sync_copy_all() - Sync the entire framebuffer to the copy
- *
- * @dev: Vidconsole device being updated
- * Return: 0 (always)
- */
-int video_sync_copy_all(struct udevice *dev);
-#else
-static inline int video_sync_copy(struct udevice *dev, void *from, void *to)
-{
-	return 0;
-}
-
-static inline int video_sync_copy_all(struct udevice *dev)
-{
-	return 0;
-}
-
-#endif
-
 /**
  * video_damage() - Notify the video subsystem about screen updates.
  *
diff --git a/include/video_console.h b/include/video_console.h
index 2694e44f6ecf..caadeb878989 100644
--- a/include/video_console.h
+++ b/include/video_console.h
@@ -404,56 +404,4 @@ void vidconsole_list_fonts(struct udevice *dev);
  */
 int vidconsole_get_font_size(struct udevice *dev, const char **name, uint *sizep);
 
-#ifdef CONFIG_VIDEO_COPY
-/**
- * vidconsole_sync_copy() - Sync back to the copy framebuffer
- *
- * This ensures that the copy framebuffer has the same data as the framebuffer
- * for a particular region. It should be called after the framebuffer is updated
- *
- * @from and @to can be in either order. The region between them is synced.
- *
- * @dev: Vidconsole device being updated
- * @from: Start/end address within the framebuffer (->fb)
- * @to: Other address within the frame buffer
- * Return: 0 if OK, -EFAULT if the start address is before the start of the
- *	frame buffer start
- */
-int vidconsole_sync_copy(struct udevice *dev, void *from, void *to);
-
-/**
- * vidconsole_memmove() - Perform a memmove() within the frame buffer
- *
- * This handles a memmove(), e.g. for scrolling. It also updates the copy
- * framebuffer.
- *
- * @dev: Vidconsole device being updated
- * @dst: Destination address within the framebuffer (->fb)
- * @src: Source address within the framebuffer (->fb)
- * @size: Number of bytes to transfer
- * Return: 0 if OK, -EFAULT if the start address is before the start of the
- *	frame buffer start
- */
-int vidconsole_memmove(struct udevice *dev, void *dst, const void *src,
-		       int size);
-#else
-
-#include <string.h>
-
-static inline int vidconsole_sync_copy(struct udevice *dev, void *from,
-				       void *to)
-{
-	return 0;
-}
-
-static inline int vidconsole_memmove(struct udevice *dev, void *dst,
-				     const void *src, int size)
-{
-	memmove(dst, src, size);
-
-	return 0;
-}
-
-#endif
-
 #endif
diff --git a/test/dm/video.c b/test/dm/video.c
index 8c7d9800a42e..4c3bcd26e94f 100644
--- a/test/dm/video.c
+++ b/test/dm/video.c
@@ -106,6 +106,7 @@ static int check_copy_frame_buffer(struct unit_test_state *uts,
 	if (!IS_ENABLED(CONFIG_VIDEO_COPY))
 		return 0;
 
+	video_sync(dev, false);
 	ut_assertf(!memcmp(priv->fb, priv->copy_fb, priv->fb_size),
 		   "Copy framebuffer does not match fb");
 
@@ -706,7 +707,7 @@ static int dm_test_video_copy(struct unit_test_state *uts)
 	vidconsole_put_string(con, test_string);
 	vidconsole_put_string(con, test_string);
 	ut_asserteq(7589, compress_frame_buffer(uts, dev, false));
-	ut_asserteq(5278, compress_frame_buffer(uts, dev, true));
+	ut_asserteq(4127, compress_frame_buffer(uts, dev, true));
 
 	return 0;
 }
-- 
2.40.1



* [PATCH v5 12/13] video: Always compile cache flushing code
  2023-08-21 13:50 [PATCH v5 00/13] Add video damage tracking Alper Nebi Yasak
                   ` (10 preceding siblings ...)
  2023-08-21 13:51 ` [PATCH v5 11/13] video: Use VIDEO_DAMAGE for VIDEO_COPY Alper Nebi Yasak
@ 2023-08-21 13:51 ` Alper Nebi Yasak
  2023-08-21 13:51 ` [PATCH v5 13/13] video: Enable VIDEO_DAMAGE for drivers that need it Alper Nebi Yasak
  2023-08-21 19:11 ` [PATCH v5 00/13] Add video damage tracking Simon Glass
  13 siblings, 0 replies; 56+ messages in thread
From: Alper Nebi Yasak @ 2023-08-21 13:51 UTC (permalink / raw)
  To: u-boot
  Cc: Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Simon Glass,
	Matthias Brugger, u-boot-amlogic, Ilias Apalodimas,
	Neil Armstrong, Alper Nebi Yasak

From: Alexander Graf <agraf@csgraf.de>

The dcache flushing code path was conditional on the ARM &&
!SYS_DCACHE_OFF config options. However, dcaches exist on other platforms
as well and may need flushing if their driver requires it.

Simplify the compile logic and always enable the dcache flush logic in
the video core. That way, drivers can always rely on it to call the arch
specific callbacks.

This will slightly increase code size for non-ARM platforms with
CONFIG_VIDEO=y.

Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Simon Glass <sjg@chromium.org>
Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
---

(no changes since v4)

Changes in v4:
- Add patch "video: Always compile cache flushing code"
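
The general pattern of replacing a preprocessor guard with an ordinary
early return on a compile-time constant can be sketched as follows. This
is a simplified illustration under stated assumptions: SKETCH_DCACHE_OFF
is a hypothetical stand-in for the real config test, and
sketch_arch_flush() stands in for the architecture's flush routine by
counting calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Kconfig option; 0 means the dcache is on. */
#define SKETCH_DCACHE_OFF 0

/* Illustrative per-device state. */
struct sketch_priv {
	bool flush_dcache;	/* driver asked for cache flushes */
	int flush_count;	/* number of flushes performed so far */
};

/* Hypothetical arch hook; here it only records that it was called. */
void sketch_arch_flush(struct sketch_priv *priv)
{
	priv->flush_count++;
}

void sketch_flush_dcache(struct sketch_priv *priv)
{
	assert(priv != NULL);

	/*
	 * Constant condition: the code below is always parsed and
	 * type-checked, and the compiler can discard it when the
	 * option is set.
	 */
	if (SKETCH_DCACHE_OFF)
		return;

	if (!priv->flush_dcache)
		return;

	sketch_arch_flush(priv);
}
```

The trade-off is the one the commit message names: the function is built
on every platform, so it costs a little code size where it used to be
compiled out.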

 drivers/video/video-uclass.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/video/video-uclass.c b/drivers/video/video-uclass.c
index c79499252a22..3f9ddaadd15d 100644
--- a/drivers/video/video-uclass.c
+++ b/drivers/video/video-uclass.c
@@ -377,11 +377,13 @@ void video_damage(struct udevice *vid, int x, int y, int width, int height)
 	priv->damage.yend = max(yend, priv->damage.yend);
 }
 
-#if defined(CONFIG_ARM) && !CONFIG_IS_ENABLED(SYS_DCACHE_OFF)
 static void video_flush_dcache(struct udevice *vid)
 {
 	struct video_priv *priv = dev_get_uclass_priv(vid);
 
+	if (CONFIG_IS_ENABLED(SYS_DCACHE_OFF))
+		return;
+
 	if (!priv->flush_dcache)
 		return;
 
@@ -410,7 +412,6 @@ static void video_flush_dcache(struct udevice *vid)
 		}
 	}
 }
-#endif
 
 static void video_flush_copy(struct udevice *vid)
 {
@@ -449,14 +450,9 @@ int video_sync(struct udevice *vid, bool force)
 			return ret;
 	}
 
-	/*
-	 * flush_dcache_range() is declared in common.h but it seems that some
-	 * architectures do not actually implement it. Is there a way to find
-	 * out whether it exists? For now, ARM is safe.
-	 */
-#if defined(CONFIG_ARM) && !CONFIG_IS_ENABLED(SYS_DCACHE_OFF)
 	video_flush_dcache(vid);
-#elif defined(CONFIG_VIDEO_SANDBOX_SDL)
+
+#if defined(CONFIG_VIDEO_SANDBOX_SDL)
 	static ulong last_sync;
 
 	if (force || get_timer(last_sync) > 100) {
-- 
2.40.1



* [PATCH v5 13/13] video: Enable VIDEO_DAMAGE for drivers that need it
  2023-08-21 13:50 [PATCH v5 00/13] Add video damage tracking Alper Nebi Yasak
                   ` (11 preceding siblings ...)
  2023-08-21 13:51 ` [PATCH v5 12/13] video: Always compile cache flushing code Alper Nebi Yasak
@ 2023-08-21 13:51 ` Alper Nebi Yasak
  2023-08-21 19:11   ` Simon Glass
  2023-08-21 19:11 ` [PATCH v5 00/13] Add video damage tracking Simon Glass
  13 siblings, 1 reply; 56+ messages in thread
From: Alper Nebi Yasak @ 2023-08-21 13:51 UTC (permalink / raw)
  To: u-boot
  Cc: Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Simon Glass,
	Matthias Brugger, u-boot-amlogic, Ilias Apalodimas,
	Neil Armstrong, Alper Nebi Yasak

From: Alexander Graf <agraf@csgraf.de>

Some drivers call video_set_flush_dcache() to indicate that they want to
have the dcache flushed for the frame buffer. These drivers benefit from
our new video damage control, because we can reduce the amount of memory
that gets flushed significantly.

This patch enables video damage control for all device drivers that call
video_set_flush_dcache() to make sure they benefit from it.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
[Alper: Add to VIDEO_TIDSS, imply instead of select]
Co-developed-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
---

Changes in v5:
- Imply VIDEO_DAMAGE for video drivers instead of selecting it
- Imply VIDEO_DAMAGE also for VIDEO_TIDSS

Changes in v4:
- Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"

 arch/arm/mach-omap2/omap3/Kconfig | 1 +
 arch/arm/mach-sunxi/Kconfig       | 1 +
 drivers/video/Kconfig             | 8 ++++++++
 drivers/video/exynos/Kconfig      | 1 +
 drivers/video/imx/Kconfig         | 1 +
 drivers/video/meson/Kconfig       | 1 +
 drivers/video/rockchip/Kconfig    | 1 +
 drivers/video/stm32/Kconfig       | 1 +
 drivers/video/tegra20/Kconfig     | 1 +
 drivers/video/tidss/Kconfig       | 1 +
 10 files changed, 17 insertions(+)

diff --git a/arch/arm/mach-omap2/omap3/Kconfig b/arch/arm/mach-omap2/omap3/Kconfig
index 671e4791c67f..fd858f7b50f2 100644
--- a/arch/arm/mach-omap2/omap3/Kconfig
+++ b/arch/arm/mach-omap2/omap3/Kconfig
@@ -113,6 +113,7 @@ config TARGET_NOKIA_RX51
 	select CMDLINE_TAG
 	select INITRD_TAG
 	select REVISION_TAG
+	imply VIDEO_DAMAGE
 
 config TARGET_TAO3530
 	bool "TAO3530"
diff --git a/arch/arm/mach-sunxi/Kconfig b/arch/arm/mach-sunxi/Kconfig
index 9d5df2c10273..fb4478ea32e8 100644
--- a/arch/arm/mach-sunxi/Kconfig
+++ b/arch/arm/mach-sunxi/Kconfig
@@ -813,6 +813,7 @@ config VIDEO_SUNXI
 	depends on !SUN50I_GEN_H6
 	select VIDEO
 	select DISPLAY
+	imply VIDEO_DAMAGE
 	imply VIDEO_DT_SIMPLEFB
 	default y
 	---help---
diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
index b3fbd9d7d9ca..185dbb1f8390 100644
--- a/drivers/video/Kconfig
+++ b/drivers/video/Kconfig
@@ -499,6 +499,7 @@ config VIDEO_LCD_ANX9804
 
 config ATMEL_LCD
 	bool "Atmel LCD panel support"
+	imply VIDEO_DAMAGE
 	depends on ARCH_AT91
 
 config ATMEL_LCD_BGR555
@@ -508,6 +509,7 @@ config ATMEL_LCD_BGR555
 
 config VIDEO_BCM2835
 	bool "Display support for BCM2835"
+	imply VIDEO_DAMAGE
 	help
 	  The graphics processor already sets up the display so this driver
 	  simply checks the resolution and then sets up the frame buffer with
@@ -654,6 +656,7 @@ source "drivers/video/meson/Kconfig"
 
 config VIDEO_MVEBU
 	bool "Armada XP LCD controller"
+	imply VIDEO_DAMAGE
 	---help---
 	Support for the LCD controller integrated in the Marvell
 	Armada XP SoC.
@@ -688,6 +691,7 @@ config NXP_TDA19988
 
 config ATMEL_HLCD
 	bool "Enable ATMEL video support using HLCDC"
+	imply VIDEO_DAMAGE
 	help
 	   HLCDC supports video output to an attached LCD panel.
 
@@ -764,6 +768,7 @@ source "drivers/video/tidss/Kconfig"
 
 config VIDEO_TEGRA124
 	bool "Enable video support on Tegra124"
+	imply VIDEO_DAMAGE
 	help
 	   Tegra124 supports many video output options including eDP and
 	   HDMI. At present only eDP is supported by U-Boot. This option
@@ -778,6 +783,7 @@ source "drivers/video/imx/Kconfig"
 
 config VIDEO_MXS
 	bool "Enable video support on i.MX28/i.MX6UL/i.MX7 SoCs"
+	imply VIDEO_DAMAGE
 	help
 	  Enable framebuffer driver for i.MX28/i.MX6UL/i.MX7 processors
 
@@ -840,6 +846,7 @@ config VIDEO_DW_MIPI_DSI
 
 config VIDEO_SIMPLE
 	bool "Simple display driver for preconfigured display"
+	imply VIDEO_DAMAGE
 	help
 	  Enables a simple generic display driver which utilizes the
 	  simple-framebuffer devicetree bindings.
@@ -858,6 +865,7 @@ config VIDEO_DT_SIMPLEFB
 
 config VIDEO_MCDE_SIMPLE
 	bool "Simple driver for ST-Ericsson MCDE with preconfigured display"
+	imply VIDEO_DAMAGE
 	help
 	  Enables a simple display driver for ST-Ericsson MCDE
 	  (Multichannel Display Engine), which reads the configuration from
diff --git a/drivers/video/exynos/Kconfig b/drivers/video/exynos/Kconfig
index 599d19d5ecc2..a2cf752aac03 100644
--- a/drivers/video/exynos/Kconfig
+++ b/drivers/video/exynos/Kconfig
@@ -12,6 +12,7 @@ config EXYNOS_DP
 
 config EXYNOS_FB
 	bool "Exynos FIMD support"
+	imply VIDEO_DAMAGE
 
 config EXYNOS_MIPI_DSIM
 	bool "Exynos MIPI DSI support"
diff --git a/drivers/video/imx/Kconfig b/drivers/video/imx/Kconfig
index 34e8b640595b..5db3e5c0499e 100644
--- a/drivers/video/imx/Kconfig
+++ b/drivers/video/imx/Kconfig
@@ -2,6 +2,7 @@
 config VIDEO_IPUV3
 	bool "i.MX IPUv3 Core video support"
 	depends on VIDEO && (MX5 || MX6)
+	imply VIDEO_DAMAGE
 	help
 	  This enables framebuffer driver for i.MX processors working
 	  on the IPUv3(Image Processing Unit) internal graphic processor.
diff --git a/drivers/video/meson/Kconfig b/drivers/video/meson/Kconfig
index 3c2d72d019b8..fcf486ca0a3a 100644
--- a/drivers/video/meson/Kconfig
+++ b/drivers/video/meson/Kconfig
@@ -8,5 +8,6 @@ config VIDEO_MESON
 	bool "Enable Amlogic Meson video support"
 	depends on VIDEO
 	select DISPLAY
+	imply VIDEO_DAMAGE
 	help
 	  Enable Amlogic Meson Video Processing Unit video support.
diff --git a/drivers/video/rockchip/Kconfig b/drivers/video/rockchip/Kconfig
index 01804dcb1cc8..0f4550a29e38 100644
--- a/drivers/video/rockchip/Kconfig
+++ b/drivers/video/rockchip/Kconfig
@@ -11,6 +11,7 @@
 menuconfig VIDEO_ROCKCHIP
 	bool "Enable Rockchip Video Support"
 	depends on VIDEO
+	imply VIDEO_DAMAGE
 	help
 	  Rockchip SoCs provide video output capabilities for High-Definition
 	  Multimedia Interface (HDMI), Low-voltage Differential Signalling
diff --git a/drivers/video/stm32/Kconfig b/drivers/video/stm32/Kconfig
index 48066063e4c5..c354c402c288 100644
--- a/drivers/video/stm32/Kconfig
+++ b/drivers/video/stm32/Kconfig
@@ -8,6 +8,7 @@
 menuconfig VIDEO_STM32
 	bool "Enable STM32 video support"
 	depends on VIDEO
+	imply VIDEO_DAMAGE
 	help
 	  STM32 supports many video output options including RGB and
 	  DSI. This option enables these supports which can be used on
diff --git a/drivers/video/tegra20/Kconfig b/drivers/video/tegra20/Kconfig
index f5c4843e1191..2232b0b3ff53 100644
--- a/drivers/video/tegra20/Kconfig
+++ b/drivers/video/tegra20/Kconfig
@@ -1,6 +1,7 @@
 config VIDEO_TEGRA20
 	bool "Enable Display Controller support on Tegra20 and Tegra 30"
 	depends on OF_CONTROL
+	imply VIDEO_DAMAGE
 	help
 	   T20/T30 support video output to an attached LCD panel as well as
 	   other options such as HDMI. Only the LCD is supported in U-Boot.
diff --git a/drivers/video/tidss/Kconfig b/drivers/video/tidss/Kconfig
index 95086f3a5d66..3291b3ceb8d5 100644
--- a/drivers/video/tidss/Kconfig
+++ b/drivers/video/tidss/Kconfig
@@ -11,6 +11,7 @@
 menuconfig VIDEO_TIDSS
 	bool "Enable TIDSS video support"
 	depends on VIDEO
+	imply VIDEO_DAMAGE
 	help
 	  TIDSS supports  video output options LVDS and
 	  DPI . This option enables these supports which can be used on
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 01/13] video: test: Split copy frame buffer check into a function
  2023-08-21 13:50 ` [PATCH v5 01/13] video: test: Split copy frame buffer check into a function Alper Nebi Yasak
@ 2023-08-21 19:11   ` Simon Glass
  0 siblings, 0 replies; 56+ messages in thread
From: Simon Glass @ 2023-08-21 19:11 UTC (permalink / raw)
  To: Alper Nebi Yasak
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>
> While checking frame buffer contents, the video tests also check if the
> copy frame buffer contents match the main frame buffer. To test if only
> the modified regions are updated after a sync, we will need to create
> situations where the two are mismatched. Split this check into another
> function that we can skip calling, since we won't want it to error on
> those mismatched cases.
>
> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> ---
>
> Changes in v5:
> - Add patch "video: test: Split copy frame buffer check into a function"
>
>  test/dm/video.c | 69 +++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 58 insertions(+), 11 deletions(-)
>

Reviewed-by: Simon Glass <sjg@chromium.org>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 02/13] video: test: Support checking copy frame buffer contents
  2023-08-21 13:50 ` [PATCH v5 02/13] video: test: Support checking copy frame buffer contents Alper Nebi Yasak
@ 2023-08-21 19:11   ` Simon Glass
  0 siblings, 0 replies; 56+ messages in thread
From: Simon Glass @ 2023-08-21 19:11 UTC (permalink / raw)
  To: Alper Nebi Yasak
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>
> The video tests have a helper function to generate a pseudo-digest of
> frame buffer contents, but it only does so for the main one. There is
> another check that the copy frame buffer is the same as that. But
> neither is enough to test if only the modified regions are copied to the
> copy frame buffer, since we will want the two to be different in very
> specific ways.
>
> Add a boolean argument to the existing helper function to indicate which
> frame buffer we want to inspect, and update the existing callers.
>
> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> ---
>
> Changes in v5:
> - Add patch "video: test: Support checking copy frame buffer contents"
>
>  test/dm/video.c | 76 ++++++++++++++++++++++++++-----------------------
>  1 file changed, 41 insertions(+), 35 deletions(-)
>

Reviewed-by: Simon Glass <sjg@chromium.org>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 03/13] video: test: Test partial updates of hardware frame buffer
  2023-08-21 13:51 ` [PATCH v5 03/13] video: test: Test partial updates of hardware frame buffer Alper Nebi Yasak
@ 2023-08-21 19:11   ` Simon Glass
  0 siblings, 0 replies; 56+ messages in thread
From: Simon Glass @ 2023-08-21 19:11 UTC (permalink / raw)
  To: Alper Nebi Yasak
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>
> With VIDEO_COPY enabled, only the modified parts of the frame buffer are
> intended to be copied to the hardware. Add a test that checks this, by
> overwriting contents we prepared without telling the video uclass and
> then checking if the overwritten contents have been redrawn on the next
> sync.
>
> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> ---
>
> Changes in v5:
> - Add patch "video: test: Test partial updates of hardware frame buffer"
>
>  test/dm/video.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 54 insertions(+)

Reviewed-by: Simon Glass <sjg@chromium.org>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 05/13] dm: video: Add damage notification on display fills
  2023-08-21 13:51 ` [PATCH v5 05/13] dm: video: Add damage notification on display fills Alper Nebi Yasak
@ 2023-08-21 19:11   ` Simon Glass
  0 siblings, 0 replies; 56+ messages in thread
From: Simon Glass @ 2023-08-21 19:11 UTC (permalink / raw)
  To: Alper Nebi Yasak
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>
> From: Alexander Graf <agraf@csgraf.de>
>
> Let's report the video damage when we fill parts of the screen. This
> way we can later lazily flush only relevant regions to hardware.
>
> Signed-off-by: Alexander Graf <agraf@csgraf.de>
> Reported-by: Da Xue <da@libre.computer>
> [Alper: Call video_damage() in video_fill_part(), edit commit message]
> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> ---
> Does video_fill_part() need a video_sync(dev, false) here?
>
> Changes in v5:
> - Call video_damage() also in video_fill_part()
>
>  drivers/video/video-uclass.c | 4 ++++
>  1 file changed, 4 insertions(+)
>

Reviewed-by: Simon Glass <sjg@chromium.org>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 04/13] dm: video: Add damage tracking API
  2023-08-21 13:51 ` [PATCH v5 04/13] dm: video: Add damage tracking API Alper Nebi Yasak
@ 2023-08-21 19:11   ` Simon Glass
  2023-08-30 19:15     ` Alper Nebi Yasak
  0 siblings, 1 reply; 56+ messages in thread
From: Simon Glass @ 2023-08-21 19:11 UTC (permalink / raw)
  To: Alper Nebi Yasak
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>
> From: Alexander Graf <agraf@csgraf.de>
>
> We are going to introduce image damage tracking to speed up screen
> refresh on large displays. This patch adds damage tracking for up to
> one rectangle of the screen which is typically enough to hold blt or
> text print updates. Callers into this API and a reduced dcache flush
> code path will follow in later patches.
>
> Signed-off-by: Alexander Graf <agraf@csgraf.de>
> Reported-by: Da Xue <da@libre.computer>
> [Alper: Use xstart/yend, document new fields, return void from
>         video_damage(), declare priv, drop headers, use IS_ENABLED()]
> Co-developed-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> ---
>
> Changes in v5:
> - Use xstart, ystart, xend, yend as names for damage region
> - Document damage struct and fields in struct video_priv comment
> - Return void from video_damage()
> - Fix undeclared priv error in video_sync()
> - Drop unused headers from video-uclass.c
> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>
> Changes in v4:
> - Move damage clear to patch "dm: video: Add damage tracking API"
> - Simplify first damage logic
> - Remove VIDEO_DAMAGE default for ARM
>
> Changes in v3:
> - Adapt to always assume DM is used
>
> Changes in v2:
> - Remove ifdefs
>
>  drivers/video/Kconfig        | 13 ++++++++++++
>  drivers/video/video-uclass.c | 41 +++++++++++++++++++++++++++++++++---
>  include/video.h              | 32 ++++++++++++++++++++++++++--
>  3 files changed, 81 insertions(+), 5 deletions(-)
>

Reviewed-by: Simon Glass <sjg@chromium.org>

But I suggest an empty static inline in the case where the feature is disabled?
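
For illustration only, here is a minimal standalone sketch of single-rectangle
damage tracking together with an empty static inline for the disabled case.
It is not the code from the patch: the names sketch_damage, sketch_damage_add()
and SKETCH_DAMAGE_ENABLED are hypothetical, and resetting the region to
"start = screen size, end = 0" so that plain min/max merging works is an
assumption about how the first damage could be handled.

```c
#include <assert.h>

/* Hypothetical switch standing in for a Kconfig option */
#define SKETCH_DAMAGE_ENABLED 1

/* Hypothetical damage region: start is inclusive, end is exclusive */
struct sketch_damage {
	int xstart, ystart;
	int xend, yend;
};

static int sketch_min(int a, int b) { return a < b ? a : b; }
static int sketch_max(int a, int b) { return a > b ? a : b; }

/* Mark the region as empty so the next add becomes the whole region */
static void sketch_damage_reset(struct sketch_damage *d, int xsize, int ysize)
{
	d->xstart = xsize;
	d->ystart = ysize;
	d->xend = 0;
	d->yend = 0;
}

/* An empty region has no width or no height */
static int sketch_damage_is_empty(const struct sketch_damage *d)
{
	return d->xend <= d->xstart || d->yend <= d->ystart;
}

#if SKETCH_DAMAGE_ENABLED
/* Grow the tracked rectangle so that it also covers (x, y, w, h) */
static void sketch_damage_add(struct sketch_damage *d, int x, int y,
			      int w, int h)
{
	assert(w >= 0 && h >= 0);

	d->xstart = sketch_min(d->xstart, x);
	d->ystart = sketch_min(d->ystart, y);
	d->xend = sketch_max(d->xend, x + w);
	d->yend = sketch_max(d->yend, y + h);
}
#else
/* Feature disabled: callers still compile, the call becomes a no-op */
static inline void sketch_damage_add(struct sketch_damage *d, int x, int y,
				     int w, int h)
{
}
#endif
```

With a 1920x1080 reset followed by adds of (100, 200, 8, 16) and
(300, 180, 8, 16), the tracked region becomes (100, 180) to (308, 216).
Keeping a single rectangle means two distant updates merge into one larger
region, which trades some extra flushing for very cheap bookkeeping.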

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 06/13] vidconsole: Add damage notifications to all vidconsole drivers
  2023-08-21 13:51 ` [PATCH v5 06/13] vidconsole: Add damage notifications to all vidconsole drivers Alper Nebi Yasak
@ 2023-08-21 19:11   ` Simon Glass
  0 siblings, 0 replies; 56+ messages in thread
From: Simon Glass @ 2023-08-21 19:11 UTC (permalink / raw)
  To: Alper Nebi Yasak
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>
> From: Alexander Graf <agraf@csgraf.de>
>
> Now that we have a damage tracking API, let's populate damage done by
> vidconsole drivers. We try to declare as little memory as damaged as
> possible.
>
> Signed-off-by: Alexander Graf <agraf@csgraf.de>
> Reported-by: Da Xue <da@libre.computer>
> [Alper: Rebase for met->baseline, fontdata->height/width, make rotated
>         console_putc_xy() damages pass tests, edit patch message]
> Co-developed-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> ---
>
> Changes in v5:
> - Use met->baseline instead of priv->baseline
> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
> - Update console_rotate.c video_damage() calls to pass video tests
> - Remove mention about not having minimal damage for console_rotate.c
>
> Changes in v2:
> - Fix ranges in truetype target
> - Limit rotate to necessary damage
>
>  drivers/video/console_normal.c   | 18 +++++++++++
>  drivers/video/console_rotate.c   | 54 ++++++++++++++++++++++++++++++++
>  drivers/video/console_truetype.c | 21 +++++++++++++
>  drivers/video/video-uclass.c     |  1 +
>  4 files changed, 94 insertions(+)
>

Reviewed-by: Simon Glass <sjg@chromium.org>

Suggest dropping the change to the final file

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 11/13] video: Use VIDEO_DAMAGE for VIDEO_COPY
  2023-08-21 13:51 ` [PATCH v5 11/13] video: Use VIDEO_DAMAGE for VIDEO_COPY Alper Nebi Yasak
@ 2023-08-21 19:11   ` Simon Glass
  2023-08-21 20:06     ` Alexander Graf
  0 siblings, 1 reply; 56+ messages in thread
From: Simon Glass @ 2023-08-21 19:11 UTC (permalink / raw)
  To: Alper Nebi Yasak
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

Hi Alper,

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>
> From: Alexander Graf <agraf@csgraf.de>
>
> CONFIG_VIDEO_COPY implemented a range-based copying mechanism: If we
> print a single character, it will always copy the full range of bytes
> from the top left corner of the character to the lower right onto the
> uncached frame buffer. This includes pretty much the full line contents
> of the printed character.
>
> Since we now have proper damage tracking, let's make use of that to reduce
> the amount of data we need to copy. With this patch applied, we will only
> copy the tiny rectangle surrounding characters when we print them,
> speeding up the video console.

I suppose for rotated consoles it copies whole lines, but otherwise it
does a lot of small copies?
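
To make the question concrete, here is a minimal standalone sketch of copying
only a damaged rectangle from the cached frame buffer to the copy frame
buffer. It is an assumption about the general technique, not the code from
this patch; sketch_copy_damage() and its parameters are hypothetical names.
Under this approach each damaged row needs its own memcpy(), because the rows
of a rectangle are not contiguous in memory unless the rectangle spans the
full line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the rectangle [xstart, xend) x [ystart, yend) from fb to copy_fb.
 * Both buffers are assumed to share the same line_length (bytes per line).
 * Returns the number of bytes copied.
 */
static size_t sketch_copy_damage(unsigned char *copy_fb,
				 const unsigned char *fb,
				 size_t line_length, size_t bytes_per_pixel,
				 int xstart, int ystart, int xend, int yend)
{
	size_t row_off, row_len, copied = 0;
	int y;

	/* Nothing was damaged since the last sync */
	if (xend <= xstart || yend <= ystart)
		return 0;

	row_off = (size_t)xstart * bytes_per_pixel;
	row_len = (size_t)(xend - xstart) * bytes_per_pixel;
	assert(row_off + row_len <= line_length);

	/* One small copy per damaged row */
	for (y = ystart; y < yend; y++) {
		size_t off = (size_t)y * line_length + row_off;

		memcpy(copy_fb + off, fb + off, row_len);
		copied += row_len;
	}

	return copied;
}
```

For an 8-pixel-wide, 32-bit frame buffer and a damage region of (2, 1) to
(5, 3), this copies two rows of 12 bytes each and leaves everything outside
the rectangle untouched.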

>
> After this, changes to the main frame buffer are not immediately copied
> to the copy frame buffer, but postponed until the next video device
> sync. So issue an explicit sync before inspecting the copy frame buffer
> contents for the video tests.

So how does the sync get done in this case?

>
> Signed-off-by: Alexander Graf <agraf@csgraf.de>
> [Alper: Rebase for fontdata->height/w, fill_part(), fix memmove(dev),
>         drop from defconfig, use damage.xstart/yend, use IS_ENABLED(),
>         call video_sync() before copy_fb check, update video_copy test]
> Co-developed-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> ---
>
> Changes in v5:
> - Remove video_sync_copy() also from video_fill(), video_fill_part()
> - Fix memmove() calls by removing the extra dev argument
> - Call video_sync() before checking copy_fb in video tests
> - Use xstart, ystart, xend, yend as names for damage region
> - Use met->baseline instead of priv->baseline
> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
> - Use xstart, ystart, xend, yend as names for damage region
> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
> - Drop VIDEO_DAMAGE from sandbox defconfig added in a new patch
> - Update dm_test_video_copy test added in a new patch
>
> Changes in v3:
> - Make VIDEO_COPY always select VIDEO_DAMAGE
>
> Changes in v2:
> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
>
>  configs/sandbox_defconfig         |  1 -
>  drivers/video/Kconfig             |  5 ++
>  drivers/video/console_normal.c    | 13 +----
>  drivers/video/console_rotate.c    | 44 +++-----------
>  drivers/video/console_truetype.c  | 16 +----
>  drivers/video/vidconsole-uclass.c | 16 -----
>  drivers/video/video-uclass.c      | 97 ++++++++-----------------------
>  drivers/video/video_bmp.c         |  7 ---
>  include/video.h                   | 37 ------------
>  include/video_console.h           | 52 -----------------
>  test/dm/video.c                   |  3 +-
>  11 files changed, 43 insertions(+), 248 deletions(-)
>
> diff --git a/configs/sandbox_defconfig b/configs/sandbox_defconfig
> index 51b820f13121..259f31f26cee 100644
> --- a/configs/sandbox_defconfig
> +++ b/configs/sandbox_defconfig
> @@ -307,7 +307,6 @@ CONFIG_USB_ETH_CDC=y
>  CONFIG_VIDEO=y
>  CONFIG_VIDEO_FONT_SUN12X22=y
>  CONFIG_VIDEO_COPY=y
> -CONFIG_VIDEO_DAMAGE=y
>  CONFIG_CONSOLE_ROTATION=y
>  CONFIG_CONSOLE_TRUETYPE=y
>  CONFIG_CONSOLE_TRUETYPE_CANTORAONE=y
> diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
> index 97f494a1340b..b3fbd9d7d9ca 100644
> --- a/drivers/video/Kconfig
> +++ b/drivers/video/Kconfig
> @@ -83,11 +83,14 @@ config VIDEO_PCI_DEFAULT_FB_SIZE
>
>  config VIDEO_COPY
>         bool "Enable copying the frame buffer to a hardware copy"
> +       select VIDEO_DAMAGE
>         help
>           On some machines (e.g. x86), reading from the frame buffer is very
>           slow because it is uncached. To improve performance, this feature
>           allows the frame buffer to be kept in cached memory (allocated by
>           U-Boot) and then copied to the hardware frame-buffer as needed.
> +         It uses the VIDEO_DAMAGE feature to keep track of regions to copy
> +         and will only copy actually touched regions.
>
>           To use this, your video driver must set @copy_base in
>           struct video_uc_plat.
> @@ -105,6 +108,8 @@ config VIDEO_DAMAGE
>           regions of the frame buffer that were modified before, speeding up
>           screen refreshes significantly.
>
> +         It is also used by VIDEO_COPY to identify which regions changed.
> +
>  config BACKLIGHT_PWM
>         bool "Generic PWM based Backlight Driver"
>         depends on BACKLIGHT && DM_PWM
> diff --git a/drivers/video/console_normal.c b/drivers/video/console_normal.c
> index a19ce6a2bc11..c44aa09473a3 100644
> --- a/drivers/video/console_normal.c
> +++ b/drivers/video/console_normal.c
> @@ -35,10 +35,6 @@ static int console_set_row(struct udevice *dev, uint row, int clr)
>                 fill_pixel_and_goto_next(&dst, clr, pbytes, pbytes);
>         end = dst;
>
> -       ret = vidconsole_sync_copy(dev, line, end);
> -       if (ret)
> -               return ret;
> -
>         video_damage(dev->parent,
>                      0,
>                      fontdata->height * row,
> @@ -57,14 +53,11 @@ static int console_move_rows(struct udevice *dev, uint rowdst,
>         void *dst;
>         void *src;
>         int size;
> -       int ret;
>
>         dst = vid_priv->fb + rowdst * fontdata->height * vid_priv->line_length;
>         src = vid_priv->fb + rowsrc * fontdata->height * vid_priv->line_length;
>         size = fontdata->height * vid_priv->line_length * count;
> -       ret = vidconsole_memmove(dev, dst, src, size);
> -       if (ret)
> -               return ret;
> +       memmove(dst, src, size);

Why are you making that change?

Regards,
Simon

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 13/13] video: Enable VIDEO_DAMAGE for drivers that need it
  2023-08-21 13:51 ` [PATCH v5 13/13] video: Enable VIDEO_DAMAGE for drivers that need it Alper Nebi Yasak
@ 2023-08-21 19:11   ` Simon Glass
  0 siblings, 0 replies; 56+ messages in thread
From: Simon Glass @ 2023-08-21 19:11 UTC (permalink / raw)
  To: Alper Nebi Yasak
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>
> From: Alexander Graf <agraf@csgraf.de>
>
> Some drivers call video_set_flush_dcache() to indicate that they want to
> have the dcache flushed for the frame buffer. These drivers benefit from
> our new video damage control, because we can reduce the amount of memory
> that gets flushed significantly.
>
> This patch enables video damage control for all device drivers that call
> video_set_flush_dcache() to make sure they benefit from it.
>
> Signed-off-by: Alexander Graf <agraf@csgraf.de>
> [Alper: Add to VIDEO_TIDSS, imply instead of select]
> Co-developed-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> ---
>
> Changes in v5:
> - Imply VIDEO_DAMAGE for video drivers instead of selecting it
> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
>
> Changes in v4:
> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"
>
>  arch/arm/mach-omap2/omap3/Kconfig | 1 +
>  arch/arm/mach-sunxi/Kconfig       | 1 +
>  drivers/video/Kconfig             | 8 ++++++++
>  drivers/video/exynos/Kconfig      | 1 +
>  drivers/video/imx/Kconfig         | 1 +
>  drivers/video/meson/Kconfig       | 1 +
>  drivers/video/rockchip/Kconfig    | 1 +
>  drivers/video/stm32/Kconfig       | 1 +
>  drivers/video/tegra20/Kconfig     | 1 +
>  drivers/video/tidss/Kconfig       | 1 +
>  10 files changed, 17 insertions(+)

Reviewed-by: Simon Glass <sjg@chromium.org>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 07/13] video: test: Test video damage tracking via vidconsole
  2023-08-21 13:51 ` [PATCH v5 07/13] video: test: Test video damage tracking via vidconsole Alper Nebi Yasak
@ 2023-08-21 19:11   ` Simon Glass
  0 siblings, 0 replies; 56+ messages in thread
From: Simon Glass @ 2023-08-21 19:11 UTC (permalink / raw)
  To: Alper Nebi Yasak
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>
> With VIDEO_DAMAGE, the video uclass tracks updated regions of the frame
> buffer in order to avoid unnecessary work during a video sync. Enable
> the config in sandbox and add a test for it, by printing strings at a
> few locations and checking the tracked region.
>
> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> ---
> This is hard to test because most things issue video syncs that process
> and reset the damaged region.
>
> Changes in v5:
> - Add patch "video: test: Test video damage tracking via vidconsole"
>
>  configs/sandbox_defconfig |  1 +
>  test/dm/video.c           | 56 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 57 insertions(+)

Reviewed-by: Simon Glass <sjg@chromium.org>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 09/13] efi_loader: GOP: Add damage notification on BLT
  2023-08-21 13:51 ` [PATCH v5 09/13] efi_loader: GOP: Add damage notification on BLT Alper Nebi Yasak
@ 2023-08-21 19:11   ` Simon Glass
  0 siblings, 0 replies; 56+ messages in thread
From: Simon Glass @ 2023-08-21 19:11 UTC (permalink / raw)
  To: Alper Nebi Yasak
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>
> From: Alexander Graf <agraf@csgraf.de>
>
> Now that we have a damage tracking API, let's populate damage done by
> UEFI payloads when they BLT data onto the screen.
>
> Signed-off-by: Alexander Graf <agraf@csgraf.de>
> Reported-by: Da Xue <da@libre.computer>
> Reviewed-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
> [Alper: Add struct comment for new member]
> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> ---
>
> Changes in v5:
> - Document new vdev field in struct efi_gop_obj comment
>
> Changes in v4:
> - Skip damage on EfiBltVideoToBltBuffer
>
> Changes in v3:
> - Adapt to always assume DM is used
>
> Changes in v2:
> - Remove ifdefs from gop
>
>  lib/efi_loader/efi_gop.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>

Reviewed-by: Simon Glass <sjg@chromium.org>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 10/13] video: Only dcache flush damaged lines
  2023-08-21 13:51 ` [PATCH v5 10/13] video: Only dcache flush damaged lines Alper Nebi Yasak
@ 2023-08-21 19:11   ` Simon Glass
  2023-08-21 19:59     ` Alexander Graf
  0 siblings, 1 reply; 56+ messages in thread
From: Simon Glass @ 2023-08-21 19:11 UTC (permalink / raw)
  To: Alper Nebi Yasak
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

Hi Alper,

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>
> From: Alexander Graf <agraf@csgraf.de>
>
> Now that we have a damage area that tells us which parts of the frame buffer
> actually need updating, let's only dcache flush those on video_sync()
> calls. With this optimization in place, frame buffer updates - especially
> on large screens such as 4k displays - speed up significantly.
>
> Signed-off-by: Alexander Graf <agraf@csgraf.de>
> Reported-by: Da Xue <da@libre.computer>
> [Alper: Use damage.xstart/yend, IS_ENABLED()]
> Co-developed-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> ---
>
> Changes in v5:
> - Use xstart, ystart, xend, yend as names for damage region
> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>
> Changes in v2:
> - Fix dcache range; we were flushing too much before
> - Remove ifdefs
>
>  drivers/video/video-uclass.c | 41 +++++++++++++++++++++++++++++++-----
>  1 file changed, 36 insertions(+), 5 deletions(-)

This is a little strange, since flushing the whole cache will only
write out data that was actually written (to the display). Is there a
benefit to this patch, in terms of performance?
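
For reference while discussing this, here is a minimal standalone sketch of
how a per-row flush range could be derived from a damage region. It is not
the code from the patch: sketch_row_flush_range() and struct sketch_range are
hypothetical names, and rounding the range outwards to cache-line boundaries
is an assumption about what a range-based dcache flush typically expects.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical address range: start inclusive, end exclusive */
struct sketch_range {
	uintptr_t start;
	uintptr_t end;
};

/*
 * Compute the cache-line-aligned byte range covering pixels
 * [xstart, xend) of row y in a frame buffer starting at fb.
 * cacheline must be a power of two.
 */
static struct sketch_range sketch_row_flush_range(uintptr_t fb,
						  uint32_t line_length,
						  uint32_t bytes_per_pixel,
						  int y, int xstart, int xend,
						  uint32_t cacheline)
{
	struct sketch_range r;
	uintptr_t mask = (uintptr_t)cacheline - 1;
	uintptr_t row = fb + (uintptr_t)y * line_length;

	assert(cacheline && (cacheline & (cacheline - 1)) == 0);
	assert(xstart >= 0 && xstart <= xend);

	/* Round the start down and the end up to whole cache lines */
	r.start = (row + (uintptr_t)xstart * bytes_per_pixel) & ~mask;
	r.end = (row + (uintptr_t)xend * bytes_per_pixel + mask) & ~mask;

	return r;
}
```

A caller would loop over the damaged rows and pass each range to the
platform's range-based flush. The cost of such a loop scales with the damaged
area rather than with the size of the whole frame buffer, whether or not the
lines in that range are dirty.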

Regards,
Simon

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 00/13] Add video damage tracking
  2023-08-21 13:50 [PATCH v5 00/13] Add video damage tracking Alper Nebi Yasak
                   ` (12 preceding siblings ...)
  2023-08-21 13:51 ` [PATCH v5 13/13] video: Enable VIDEO_DAMAGE for drivers that need it Alper Nebi Yasak
@ 2023-08-21 19:11 ` Simon Glass
  2023-08-21 19:33   ` Alexander Graf
  13 siblings, 1 reply; 56+ messages in thread
From: Simon Glass @ 2023-08-21 19:11 UTC (permalink / raw)
  To: Alper Nebi Yasak
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

Hi Alper,

On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>
> This is a rebase of Alexander Graf's video damage tracking series, with
> some tests and other changes. The original cover letter is as follows:
>
> > This patch set speeds up graphics output on ARM by a factor of 60.
> >
> > On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
> > but need it accessible by the display controller which reads directly
> > from a later point of consistency. Hence, we flush the frame buffer to
> > DRAM on every change. The full frame buffer.

It should not, see below.

> >
> > Unfortunately, with the advent of 4k displays, we are seeing frame buffers
> > that can take a while to flush out. This was reported by Da Xue with grub,
> > which happily print 1000s of spaces on the screen to draw a menu. Every
> > printed space triggers a cache flush.

That is a bug somewhere in EFI.

> >
> > This patch set implements the easiest mitigation against this problem:
> > Damage tracking. We remember the lowest common denominator region that was
> > touched since the last video_sync() call and only flush that. The most
> > typical writer to the frame buffer is the video console, which always
> > writes rectangles of characters on the screen and syncs afterwards.
> >
> > With this patch set applied, we reduce drawing a large grub menu (with
> > serial console attached for size information) on an RK3399-ROC system
> > at 1440p from 55 seconds to less than 1 second.
> >
> > Version 2 also implements VIDEO_COPY using this mechanism, reducing its
> > overhead compared to before as well. So even x86 systems should be faster
> > with this now :).
> >
> >
> > Alternatives considered:
> >
> >   1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
> >      so often. We are missing timers to do this generically.
> >
> >   2) Double buffering - We could try to identify whether anything changed
> >      at all and only draw to the FB if it did. That would require
> >      maintaining a second buffer that we need to scan.
> >
> >   3) Text buffer - Maintain a buffer of all text printed on the screen with
> >      respective location. Don't write if the old and new character are
> >      identical. This would limit applicability to text only and is an
> >      optimization on top of this patch set.
> >
> >   4) Hash screen lines - Create a hash (sha256?) over every line when it
> >      changes. Only flush when it does. I'm not sure if this would waste
> >      more time, memory and cache than the current approach. It would make
> >      full screen updates much more expensive.

5) Fix the bug mentioned above?

>
> Changes in v5:
> - Add patch "video: test: Split copy frame buffer check into a function"
> - Add patch "video: test: Support checking copy frame buffer contents"
> - Add patch "video: test: Test partial updates of hardware frame buffer"
> - Use xstart, ystart, xend, yend as names for damage region
> - Document damage struct and fields in struct video_priv comment
> - Return void from video_damage()
> - Fix undeclared priv error in video_sync()
> - Drop unused headers from video-uclass.c
> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
> - Call video_damage() also in video_fill_part()
> - Use met->baseline instead of priv->baseline
> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
> - Update console_rotate.c video_damage() calls to pass video tests
> - Remove mention about not having minimal damage for console_rotate.c
> - Add patch "video: test: Test video damage tracking via vidconsole"
> - Document new vdev field in struct efi_gop_obj comment
> - Remove video_sync_copy() also from video_fill(), video_fill_part()
> - Fix memmove() calls by removing the extra dev argument
> - Call video_sync() before checking copy_fb in video tests
> - Imply VIDEO_DAMAGE for video drivers instead of selecting it
> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
>
> v4: https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/
>
> Changes in v4:
> - Move damage clear to patch "dm: video: Add damage tracking API"
> - Simplify first damage logic
> - Remove VIDEO_DAMAGE default for ARM
> - Skip damage on EfiBltVideoToBltBuffer
> - Add patch "video: Always compile cache flushing code"
> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"
>
> v3: https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/
>
> Changes in v3:
> - Adapt to always assume DM is used
> - Adapt to always assume DM is used
> - Make VIDEO_COPY always select VIDEO_DAMAGE
>
> v2: https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/
>
> Changes in v2:
> - Remove ifdefs
> - Fix ranges in truetype target
> - Limit rotate to necessary damage
> - Remove ifdefs from gop
> - Fix dcache range; we were flushing too much before
> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
>
> v1: https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/
>
> Alexander Graf (9):
>   dm: video: Add damage tracking API
>   dm: video: Add damage notification on display fills
>   vidconsole: Add damage notifications to all vidconsole drivers
>   video: Add damage notification on bmp display
>   efi_loader: GOP: Add damage notification on BLT
>   video: Only dcache flush damaged lines
>   video: Use VIDEO_DAMAGE for VIDEO_COPY
>   video: Always compile cache flushing code
>   video: Enable VIDEO_DAMAGE for drivers that need it
>
> Alper Nebi Yasak (4):
>   video: test: Split copy frame buffer check into a function
>   video: test: Support checking copy frame buffer contents
>   video: test: Test partial updates of hardware frame buffer
>   video: test: Test video damage tracking via vidconsole
>
>  arch/arm/mach-omap2/omap3/Kconfig |   1 +
>  arch/arm/mach-sunxi/Kconfig       |   1 +
>  drivers/video/Kconfig             |  26 +++
>  drivers/video/console_normal.c    |  27 ++--
>  drivers/video/console_rotate.c    |  94 +++++++----
>  drivers/video/console_truetype.c  |  37 +++--
>  drivers/video/exynos/Kconfig      |   1 +
>  drivers/video/imx/Kconfig         |   1 +
>  drivers/video/meson/Kconfig       |   1 +
>  drivers/video/rockchip/Kconfig    |   1 +
>  drivers/video/stm32/Kconfig       |   1 +
>  drivers/video/tegra20/Kconfig     |   1 +
>  drivers/video/tidss/Kconfig       |   1 +
>  drivers/video/vidconsole-uclass.c |  16 --
>  drivers/video/video-uclass.c      | 190 ++++++++++++----------
>  drivers/video/video_bmp.c         |   7 +-
>  include/video.h                   |  59 +++----
>  include/video_console.h           |  52 ------
>  lib/efi_loader/efi_gop.c          |   7 +
>  test/dm/video.c                   | 256 ++++++++++++++++++++++++------
>  20 files changed, 483 insertions(+), 297 deletions(-)

It is good to see this tidied up into something that can be applied!

I am unsure what is going on with the EFI performance, though. It
should not flush the cache after every character, only after a new
line. Is there something wrong in here? If so, we should fix that bug
first and it should be patch 1 of this series.

Regards,
Simon


* Re: [PATCH v5 00/13] Add video damage tracking
  2023-08-21 19:11 ` [PATCH v5 00/13] Add video damage tracking Simon Glass
@ 2023-08-21 19:33   ` Alexander Graf
  2023-08-21 19:57     ` Simon Glass
  0 siblings, 1 reply; 56+ messages in thread
From: Alexander Graf @ 2023-08-21 19:33 UTC (permalink / raw)
  To: Simon Glass, Alper Nebi Yasak
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Philipp Tomsich, Andrew Davis, Da Xue, Heinrich Schuchardt,
	Patrice Chotard, Patrick Delaunay, Derald Woods,
	Anatolij Gustschin, uboot-stm32, Matthias Brugger, u-boot-amlogic,
	Ilias Apalodimas, Neil Armstrong


On 21.08.23 21:11, Simon Glass wrote:
> Hi Alper,
>
> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>> This is a rebase of Alexander Graf's video damage tracking series, with
>> some tests and other changes. The original cover letter is as follows:
>>
>>> This patch set speeds up graphics output on ARM by a factor of 60x.
>>>
>>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
>>> but need it accessible by the display controller which reads directly
>>> from a later point of consistency. Hence, we flush the frame buffer to
>>> DRAM on every change. The full frame buffer.
> It should not, see below.
>
>>> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
>>> that can take a while to flush out. This was reported by Da Xue with grub,
>>> which happily print 1000s of spaces on the screen to draw a menu. Every
>>> printed space triggers a cache flush.
> That is a bug somewhere in EFI.


Unfortunately not :). You may call it a bug in grub: It literally prints 
over space characters for every character in its menu that it wants 
cleared. On every text screen draw.

This wouldn't be a big issue if we only flushed the rectangle that gets
modified. But without this patch set, we're flushing the full DRAM 
buffer on every u-boot text console character write, which means for 
every character (as that's the only API UEFI has).

As a nice side effect, we speed up the normal U-Boot text console as 
well with this patch set, because even "normal" text prints that write 
for example a single line of text on the screen today flush the full 
frame buffer to DRAM.
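
To put rough numbers on the difference, here is a minimal sketch of the
arithmetic. It is not code from the series: the function names are
hypothetical, and the example values assume a 2560x1440 frame buffer at
4 bytes per pixel and an 8x16 pixel glyph, neither of which is stated in
this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes covered by a flush of the whole frame buffer. */
static size_t full_flush_bytes(size_t line_length, size_t ysize)
{
	return line_length * ysize;
}

/*
 * Bytes covered when only the damaged rectangle is flushed, one row at a
 * time, before each row is rounded out to cache-line boundaries.
 * xend and yend are exclusive.
 */
static size_t damage_flush_bytes(size_t bytes_per_pixel,
				 size_t xstart, size_t ystart,
				 size_t xend, size_t yend)
{
	assert(xend >= xstart && yend >= ystart);

	return (xend - xstart) * bytes_per_pixel * (yend - ystart);
}
```

Under those assumptions, full_flush_bytes(2560 * 4, 1440) gives
14745600 bytes per character written, while
damage_flush_bytes(4, 0, 0, 8, 16) gives 512 bytes. Cache-line rounding
increases the second figure somewhat, and the measured speedup depends
on much more than the byte count, so this only illustrates the order of
magnitude involved.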


>
>>> This patch set implements the easiest mitigation against this problem:
>>> Damage tracking. We remember the lowest common denominator region that was
>>> touched since the last video_sync() call and only flush that. The most
>>> typical writer to the frame buffer is the video console, which always
>>> writes rectangles of characters on the screen and syncs afterwards.
>>>
>>> With this patch set applied, we reduce drawing a large grub menu (with
>>> serial console attached for size information) on an RK3399-ROC system
>>> at 1440p from 55 seconds to less than 1 second.
>>>
>>> Version 2 also implements VIDEO_COPY using this mechanism, reducing its
>>> overhead compared to before as well. So even x86 systems should be faster
>>> with this now :).
>>>
>>>
>>> Alternatives considered:
>>>
>>>    1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
>>>       so often. We are missing timers to do this generically.
>>>
>>>    2) Double buffering - We could try to identify whether anything changed
>>>       at all and only draw to the FB if it did. That would require
>>>       maintaining a second buffer that we need to scan.
>>>
>>>    3) Text buffer - Maintain a buffer of all text printed on the screen with
>>>       respective location. Don't write if the old and new character are
>>>       identical. This would limit applicability to text only and is an
>>>       optimization on top of this patch set.
>>>
>>>    4) Hash screen lines - Create a hash (sha256?) over every line when it
>>>       changes. Only flush when it does. I'm not sure if this would waste
>>>       more time, memory and cache than the current approach. It would make
>>>       full screen updates much more expensive.
> 5) Fix the bug mentioned above?
>
>> Changes in v5:
>> - Add patch "video: test: Split copy frame buffer check into a function"
>> - Add patch "video: test: Support checking copy frame buffer contents"
>> - Add patch "video: test: Test partial updates of hardware frame buffer"
>> - Use xstart, ystart, xend, yend as names for damage region
>> - Document damage struct and fields in struct video_priv comment
>> - Return void from video_damage()
>> - Fix undeclared priv error in video_sync()
>> - Drop unused headers from video-uclass.c
>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>> - Call video_damage() also in video_fill_part()
>> - Use met->baseline instead of priv->baseline
>> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
>> - Update console_rotate.c video_damage() calls to pass video tests
>> - Remove mention about not having minimal damage for console_rotate.c
>> - Add patch "video: test: Test video damage tracking via vidconsole"
>> - Document new vdev field in struct efi_gop_obj comment
>> - Remove video_sync_copy() also from video_fill(), video_fill_part()
>> - Fix memmove() calls by removing the extra dev argument
>> - Call video_sync() before checking copy_fb in video tests
>> - Imply VIDEO_DAMAGE for video drivers instead of selecting it
>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
>>
>> v4: https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/
>>
>> Changes in v4:
>> - Move damage clear to patch "dm: video: Add damage tracking API"
>> - Simplify first damage logic
>> - Remove VIDEO_DAMAGE default for ARM
>> - Skip damage on EfiBltVideoToBltBuffer
>> - Add patch "video: Always compile cache flushing code"
>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"
>>
>> v3: https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/
>>
>> Changes in v3:
>> - Adapt to always assume DM is used
>> - Adapt to always assume DM is used
>> - Make VIDEO_COPY always select VIDEO_DAMAGE
>>
>> v2: https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/
>>
>> Changes in v2:
>> - Remove ifdefs
>> - Fix ranges in truetype target
>> - Limit rotate to necessary damage
>> - Remove ifdefs from gop
>> - Fix dcache range; we were flushing too much before
>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
>>
>> v1: https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/
>>
>> Alexander Graf (9):
>>    dm: video: Add damage tracking API
>>    dm: video: Add damage notification on display fills
>>    vidconsole: Add damage notifications to all vidconsole drivers
>>    video: Add damage notification on bmp display
>>    efi_loader: GOP: Add damage notification on BLT
>>    video: Only dcache flush damaged lines
>>    video: Use VIDEO_DAMAGE for VIDEO_COPY
>>    video: Always compile cache flushing code
>>    video: Enable VIDEO_DAMAGE for drivers that need it
>>
>> Alper Nebi Yasak (4):
>>    video: test: Split copy frame buffer check into a function
>>    video: test: Support checking copy frame buffer contents
>>    video: test: Test partial updates of hardware frame buffer
>>    video: test: Test video damage tracking via vidconsole
>>
>>   arch/arm/mach-omap2/omap3/Kconfig |   1 +
>>   arch/arm/mach-sunxi/Kconfig       |   1 +
>>   drivers/video/Kconfig             |  26 +++
>>   drivers/video/console_normal.c    |  27 ++--
>>   drivers/video/console_rotate.c    |  94 +++++++----
>>   drivers/video/console_truetype.c  |  37 +++--
>>   drivers/video/exynos/Kconfig      |   1 +
>>   drivers/video/imx/Kconfig         |   1 +
>>   drivers/video/meson/Kconfig       |   1 +
>>   drivers/video/rockchip/Kconfig    |   1 +
>>   drivers/video/stm32/Kconfig       |   1 +
>>   drivers/video/tegra20/Kconfig     |   1 +
>>   drivers/video/tidss/Kconfig       |   1 +
>>   drivers/video/vidconsole-uclass.c |  16 --
>>   drivers/video/video-uclass.c      | 190 ++++++++++++----------
>>   drivers/video/video_bmp.c         |   7 +-
>>   include/video.h                   |  59 +++----
>>   include/video_console.h           |  52 ------
>>   lib/efi_loader/efi_gop.c          |   7 +
>>   test/dm/video.c                   | 256 ++++++++++++++++++++++++------
>>   20 files changed, 483 insertions(+), 297 deletions(-)
> It is good to see this tidied up into something that can be applied!
>
> I am unsure what is going on with the EFI performance, though. It
> should not flush the cache after every character, only after a new
> line. Is there something wrong in here? If so, we should fix that bug
> first and it should be patch 1 of this series.


Before I came up with this series, I was trying to identify the UEFI bug 
in question as well, because intuition told me surely this is a bug in 
UEFI :). Turns out it really isn't this time around.


Alex




* Re: [PATCH v5 00/13] Add video damage tracking
  2023-08-21 19:33   ` Alexander Graf
@ 2023-08-21 19:57     ` Simon Glass
  2023-08-21 20:20       ` Alexander Graf
  0 siblings, 1 reply; 56+ messages in thread
From: Simon Glass @ 2023-08-21 19:57 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Alper Nebi Yasak, u-boot, Kever Yang, Jagan Teki, Andre Przywara,
	Svyatoslav Ryhel, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

Hi Alex,

On Mon, 21 Aug 2023 at 13:33, Alexander Graf <agraf@csgraf.de> wrote:
>
>
> On 21.08.23 21:11, Simon Glass wrote:
> > Hi Alper,
> >
> > On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
> >> This is a rebase of Alexander Graf's video damage tracking series, with
> >> some tests and other changes. The original cover letter is as follows:
> >>
> >>> This patch set speeds up graphics output on ARM by a factor of 60x.
> >>>
> >>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
> >>> but need it accessible by the display controller which reads directly
> >>> from a later point of consistency. Hence, we flush the frame buffer to
> >>> DRAM on every change. The full frame buffer.
> > It should not, see below.
> >
> >>> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
> >>> that can take a while to flush out. This was reported by Da Xue with grub,
> >>> which happily print 1000s of spaces on the screen to draw a menu. Every
> >>> printed space triggers a cache flush.
> > That is a bug somewhere in EFI.
>
>
> Unfortunately not :). You may call it a bug in grub: It literally prints
> over space characters for every character in its menu that it wants
> cleared. On every text screen draw.
>
> This wouldn't be a big issue if we only flushed the rectangle that gets
> modified. But without this patch set, we're flushing the full DRAM
> buffer on every u-boot text console character write, which means for
> every character (as that's the only API UEFI has).
>
> As a nice side effect, we speed up the normal U-Boot text console as
> well with this patch set, because even "normal" text prints that write
> for example a single line of text on the screen today flush the full
> frame buffer to DRAM.

No, I mean that it is a bug that U-Boot (apparently) flushes the cache
after every character. It doesn't do that for normal character output
and I don't think it makes sense to do it for EFI either.

>
>
> >
> >>> This patch set implements the easiest mitigation against this problem:
> >>> Damage tracking. We remember the lowest common denominator region that was
> >>> touched since the last video_sync() call and only flush that. The most
> >>> typical writer to the frame buffer is the video console, which always
> >>> writes rectangles of characters on the screen and syncs afterwards.
> >>>
> >>> With this patch set applied, we reduce drawing a large grub menu (with
> >>> serial console attached for size information) on an RK3399-ROC system
> >>> at 1440p from 55 seconds to less than 1 second.
> >>>
> >>> Version 2 also implements VIDEO_COPY using this mechanism, reducing its
> >>> overhead compared to before as well. So even x86 systems should be faster
> >>> with this now :).
> >>>
> >>>
> >>> Alternatives considered:
> >>>
> >>>    1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
> >>>       so often. We are missing timers to do this generically.
> >>>
> >>>    2) Double buffering - We could try to identify whether anything changed
> >>>       at all and only draw to the FB if it did. That would require
> >>>       maintaining a second buffer that we need to scan.
> >>>
> >>>    3) Text buffer - Maintain a buffer of all text printed on the screen with
> >>>       respective location. Don't write if the old and new character are
> >>>       identical. This would limit applicability to text only and is an
> >>>       optimization on top of this patch set.
> >>>
> >>>    4) Hash screen lines - Create a hash (sha256?) over every line when it
> >>>       changes. Only flush when it does. I'm not sure if this would waste
> >>>       more time, memory and cache than the current approach. It would make
> >>>       full screen updates much more expensive.
> > 5) Fix the bug mentioned above?
> >
> >> Changes in v5:
> >> - Add patch "video: test: Split copy frame buffer check into a function"
> >> - Add patch "video: test: Support checking copy frame buffer contents"
> >> - Add patch "video: test: Test partial updates of hardware frame buffer"
> >> - Use xstart, ystart, xend, yend as names for damage region
> >> - Document damage struct and fields in struct video_priv comment
> >> - Return void from video_damage()
> >> - Fix undeclared priv error in video_sync()
> >> - Drop unused headers from video-uclass.c
> >> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
> >> - Call video_damage() also in video_fill_part()
> >> - Use met->baseline instead of priv->baseline
> >> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
> >> - Update console_rotate.c video_damage() calls to pass video tests
> >> - Remove mention about not having minimal damage for console_rotate.c
> >> - Add patch "video: test: Test video damage tracking via vidconsole"
> >> - Document new vdev field in struct efi_gop_obj comment
> >> - Remove video_sync_copy() also from video_fill(), video_fill_part()
> >> - Fix memmove() calls by removing the extra dev argument
> >> - Call video_sync() before checking copy_fb in video tests
> >> - Imply VIDEO_DAMAGE for video drivers instead of selecting it
> >> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
> >>
> >> v4: https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/
> >>
> >> Changes in v4:
> >> - Move damage clear to patch "dm: video: Add damage tracking API"
> >> - Simplify first damage logic
> >> - Remove VIDEO_DAMAGE default for ARM
> >> - Skip damage on EfiBltVideoToBltBuffer
> >> - Add patch "video: Always compile cache flushing code"
> >> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"
> >>
> >> v3: https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/
> >>
> >> Changes in v3:
> >> - Adapt to always assume DM is used
> >> - Adapt to always assume DM is used
> >> - Make VIDEO_COPY always select VIDEO_DAMAGE
> >>
> >> v2: https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/
> >>
> >> Changes in v2:
> >> - Remove ifdefs
> >> - Fix ranges in truetype target
> >> - Limit rotate to necessary damage
> >> - Remove ifdefs from gop
> >> - Fix dcache range; we were flushing too much before
> >> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
> >>
> >> v1: https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/
> >>
> >> Alexander Graf (9):
> >>    dm: video: Add damage tracking API
> >>    dm: video: Add damage notification on display fills
> >>    vidconsole: Add damage notifications to all vidconsole drivers
> >>    video: Add damage notification on bmp display
> >>    efi_loader: GOP: Add damage notification on BLT
> >>    video: Only dcache flush damaged lines
> >>    video: Use VIDEO_DAMAGE for VIDEO_COPY
> >>    video: Always compile cache flushing code
> >>    video: Enable VIDEO_DAMAGE for drivers that need it
> >>
> >> Alper Nebi Yasak (4):
> >>    video: test: Split copy frame buffer check into a function
> >>    video: test: Support checking copy frame buffer contents
> >>    video: test: Test partial updates of hardware frame buffer
> >>    video: test: Test video damage tracking via vidconsole
> >>
> >>   arch/arm/mach-omap2/omap3/Kconfig |   1 +
> >>   arch/arm/mach-sunxi/Kconfig       |   1 +
> >>   drivers/video/Kconfig             |  26 +++
> >>   drivers/video/console_normal.c    |  27 ++--
> >>   drivers/video/console_rotate.c    |  94 +++++++----
> >>   drivers/video/console_truetype.c  |  37 +++--
> >>   drivers/video/exynos/Kconfig      |   1 +
> >>   drivers/video/imx/Kconfig         |   1 +
> >>   drivers/video/meson/Kconfig       |   1 +
> >>   drivers/video/rockchip/Kconfig    |   1 +
> >>   drivers/video/stm32/Kconfig       |   1 +
> >>   drivers/video/tegra20/Kconfig     |   1 +
> >>   drivers/video/tidss/Kconfig       |   1 +
> >>   drivers/video/vidconsole-uclass.c |  16 --
> >>   drivers/video/video-uclass.c      | 190 ++++++++++++----------
> >>   drivers/video/video_bmp.c         |   7 +-
> >>   include/video.h                   |  59 +++----
> >>   include/video_console.h           |  52 ------
> >>   lib/efi_loader/efi_gop.c          |   7 +
> >>   test/dm/video.c                   | 256 ++++++++++++++++++++++++------
> >>   20 files changed, 483 insertions(+), 297 deletions(-)
> > It is good to see this tidied up into something that can be applied!
> >
> > I am unsure what is going on with the EFI performance, though. It
> > should not flush the cache after every character, only after a new
> > line. Is there something wrong in here? If so, we should fix that bug
> > first and it should be patch 1 of this series.
>
>
> Before I came up with this series, I was trying to identify the UEFI bug
> in question as well, because intuition told me surely this is a bug in
> UEFI :). Turns out it really isn't this time around.

I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI
implementation. Where did you look for the bug?

Regards,
Simon


* Re: [PATCH v5 10/13] video: Only dcache flush damaged lines
  2023-08-21 19:11   ` Simon Glass
@ 2023-08-21 19:59     ` Alexander Graf
  2023-08-21 22:10       ` Simon Glass
  0 siblings, 1 reply; 56+ messages in thread
From: Alexander Graf @ 2023-08-21 19:59 UTC (permalink / raw)
  To: Simon Glass, Alper Nebi Yasak
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Philipp Tomsich, Andrew Davis, Da Xue, Heinrich Schuchardt,
	Patrice Chotard, Patrick Delaunay, Derald Woods,
	Anatolij Gustschin, uboot-stm32, Matthias Brugger, u-boot-amlogic,
	Ilias Apalodimas, Neil Armstrong


On 21.08.23 21:11, Simon Glass wrote:
> Hi Alper,
>
> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>> From: Alexander Graf <agraf@csgraf.de>
>>
>> Now that we have a damage area tells us which parts of the frame buffer
>> actually need updating, let's only dcache flush those on video_sync()
>> calls. With this optimization in place, frame buffer updates - especially
>> on large screen such as 4k displays - speed up significantly.
>>
>> Signed-off-by: Alexander Graf <agraf@csgraf.de>
>> Reported-by: Da Xue <da@libre.computer>
>> [Alper: Use damage.xstart/yend, IS_ENABLED()]
>> Co-developed-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
>> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
>> ---
>>
>> Changes in v5:
>> - Use xstart, ystart, xend, yend as names for damage region
>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>>
>> Changes in v2:
>> - Fix dcache range; we were flushing too much before
>> - Remove ifdefs
>>
>>   drivers/video/video-uclass.c | 41 +++++++++++++++++++++++++++++++-----
>>   1 file changed, 36 insertions(+), 5 deletions(-)
> This is a little strange, since flushing the whole cache will only
> actually write out data that was actually written (to the display). Is
> there a benefit to this patch, in terms of performance?


I'm happy to see you go through the same thought process I went through 
when writing these: "This surely can't be the problem, can it?". The 
answer is "simple" in hindsight:

Have a look at the ARMv8 cache flush function. It does the only "safe" 
thing you can expect it to do: Clean+Invalidate to POC because we use it 
for multiple things, clearing modified code among others:

ENTRY(__asm_flush_dcache_range)
         mrs     x3, ctr_el0
         ubfx    x3, x3, #16, #4
         mov     x2, #4
         lsl     x2, x2, x3              /* cache line size */

         /* x2 <- minimal cache line size in cache system */
         sub     x3, x2, #1
         bic     x0, x0, x3
1:      dc      civac, x0       /* clean & invalidate data or unified cache */
         add     x0, x0, x2
         cmp     x0, x1
         b.lo    1b
         dsb     sy
         ret
ENDPROC(__asm_flush_dcache_range)
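
For readers less used to AArch64 assembly, the loop above can be
modelled in plain C roughly as follows. This is only a sketch for
counting how many "dc civac" operations a given range costs. The
function names are made up for illustration, and no cache maintenance
is actually performed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode the smallest data cache line size from a CTR_EL0 value.
 * Bits [19:16] (DminLine) hold log2 of the line size in 4-byte words,
 * which is what the ubfx/mov/lsl sequence computes.
 */
static uint64_t dcache_line_size(uint64_t ctr_el0)
{
	uint64_t dminline = (ctr_el0 >> 16) & 0xf;

	return 4ull << dminline;
}

/*
 * Count the iterations of the flush loop for the range [start, end).
 * Like the assembly, the start is aligned down to a line boundary and
 * the loop body runs at least once.
 */
static uint64_t civac_ops(uint64_t start, uint64_t end, uint64_t line)
{
	uint64_t addr, n = 0;

	/* The mask trick below only works for power-of-two line sizes. */
	assert(line != 0 && (line & (line - 1)) == 0);

	addr = start & ~(line - 1);
	do {
		n++;		/* stands in for "dc civac, x0" */
		addr += line;
	} while (addr < end);

	return n;
}
```

With 64-byte lines, a 32-byte row of pixels costs one or two operations
depending on its alignment, whereas a range spanning the whole frame
buffer costs one operation per cache line in the buffer.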


Looking at the "dc civac" call, we find this documentation page from 
ARM: 
https://developer.arm.com/documentation/ddi0601/2022-03/AArch64-Instructions/DC-CIVAC--Data-or-unified-Cache-line-Clean-and-Invalidate-by-VA-to-PoC

This says we write any dirty data in this cache line out to the POC
and then invalidate (remove) the cache line, also up to the POC. That
means when you look at a typical SBC, this will be either L2 or system
level cache. Every read afterwards needs to go and pull it all the way
back to L1 to modify it (or not) on the next character write and then
flush it again.

Even worse: Because of the invalidate, we may even evict it from caches 
that the display controller uses to read the frame buffer. So depending 
on the SoC's cache topology and implementation, we may force the display 
controller to refetch the full FB content on its next screen refresh cycle.

I faintly remember that I tried to experiment with CVAC instead to only 
flush without invalidating. I don't fully remember the results anymore 
though. I believe CVAC just behaved identically to CIVAC on the A53
platform I was working on. And then I looked at Cortex-A53 errata like 
[1] and just accepted that doing anything but restricting the flushing 
range is a waste of time :)


Alex


[1] 
https://patchwork.kernel.org/project/xen-devel/patch/1462466065-30212-14-git-send-email-julien.grall@arm.com/




* Re: [PATCH v5 11/13] video: Use VIDEO_DAMAGE for VIDEO_COPY
  2023-08-21 19:11   ` Simon Glass
@ 2023-08-21 20:06     ` Alexander Graf
  2023-08-30 19:07       ` Alper Nebi Yasak
  0 siblings, 1 reply; 56+ messages in thread
From: Alexander Graf @ 2023-08-21 20:06 UTC (permalink / raw)
  To: Simon Glass, Alper Nebi Yasak
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Philipp Tomsich, Andrew Davis, Da Xue, Heinrich Schuchardt,
	Patrice Chotard, Patrick Delaunay, Derald Woods,
	Anatolij Gustschin, uboot-stm32, Matthias Brugger, u-boot-amlogic,
	Ilias Apalodimas, Neil Armstrong


On 21.08.23 21:11, Simon Glass wrote:
> Hi Alper,
>
> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>> From: Alexander Graf <agraf@csgraf.de>
>>
>> CONFIG_VIDEO_COPY implemented a range-based copying mechanism: If we
>> print a single character, it will always copy the full range of bytes
>> from the top left corner of the character to the lower right onto the
>> uncached frame buffer. This includes pretty much the full line contents
>> of the printed character.
>>
>> Since we now have proper damage tracking, let's make use of that to reduce
>> the amount of data we need to copy. With this patch applied, we will only
>> copy the tiny rectangle surrounding characters when we print them,
>> speeding up the video console.
> I suppose for rotated consoles it copies whole lines, but otherwise it
> does a lot of small copies?


I tried to keep the code as simple as possible and only track an "upper 
left" and "lower right" corner of modifications. So sync will always 
copy/flush a single rectangle.
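
As a rough illustration of that idea, a single bounding rectangle can be
grown like this. This is a standalone sketch, not the code from the
series: struct damage_rect and the helper names are hypothetical, and
only the xstart/ystart/xend/yend field names are taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the damage fields kept in struct video_priv. */
struct damage_rect {
	int xstart, ystart;	/* upper left corner, inclusive */
	int xend, yend;		/* lower right corner, exclusive */
};

/* A rectangle with a zero xend or yend means "nothing damaged yet". */
static bool damage_is_empty(const struct damage_rect *d)
{
	return !(d->xend && d->yend);
}

/* Grow the rectangle so that it also covers the given area. */
static void damage_add(struct damage_rect *d, int x, int y,
		       int width, int height)
{
	assert(x >= 0 && y >= 0 && width >= 0 && height >= 0);

	if (!width || !height)
		return;

	if (damage_is_empty(d)) {
		d->xstart = x;
		d->ystart = y;
		d->xend = x + width;
		d->yend = y + height;
		return;
	}

	if (x < d->xstart)
		d->xstart = x;
	if (y < d->ystart)
		d->ystart = y;
	if (x + width > d->xend)
		d->xend = x + width;
	if (y + height > d->yend)
		d->yend = y + height;
}

/* Reset after a sync so the next write starts a fresh rectangle. */
static void damage_clear(struct damage_rect *d)
{
	d->xstart = d->ystart = d->xend = d->yend = 0;
}
```

The trade-off is that two small writes far apart merge into one large
rectangle, so the sync may touch pixels that never changed. In exchange
the bookkeeping stays constant-size and trivially cheap.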


>
>> After this, changes to the main frame buffer are not immediately copied
>> to the copy frame buffer, but postponed until the next video device
>> sync. So issue an explicit sync before inspecting the copy frame buffer
>> contents for the video tests.
> So how does the sync get done in this case?


It gets called as part of video_sync():

+static void video_flush_copy(struct udevice *vid)
+{
+	struct video_priv *priv = dev_get_uclass_priv(vid);
+
+	if (!priv->copy_fb)
+		return;
+
+	if (priv->damage.xend && priv->damage.yend) {
+		int lstart = priv->damage.xstart * VNBYTES(priv->bpix);
+		int lend = priv->damage.xend * VNBYTES(priv->bpix);
+		int y;
+
+		for (y = priv->damage.ystart; y < priv->damage.yend; y++) {
+			ulong offset = (y * priv->line_length) + lstart;
+			ulong len = lend - lstart;
+
+			memcpy(priv->copy_fb + offset, priv->fb + offset, len);
+		}
+	}
+}
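
To see the offset arithmetic in isolation, the same per-row copy can be
tried outside U-Boot with plain buffers. The sketch below assumes a
hypothetical struct fb_view in place of struct video_priv and takes the
bytes per pixel directly instead of using VNBYTES(). It returns the
number of bytes copied so the effect is easy to check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of the two frame buffers. */
struct fb_view {
	unsigned char *fb;	/* cached frame buffer we draw into */
	unsigned char *copy_fb;	/* copy frame buffer, may be NULL */
	size_t line_length;	/* bytes per row, including padding */
	size_t bytes_per_pixel;
};

/*
 * Copy only the damaged rectangle from fb to copy_fb, row by row.
 * xend and yend are exclusive; a zero xend or yend means no damage.
 */
static size_t copy_damaged(const struct fb_view *v, int xstart, int ystart,
			   int xend, int yend)
{
	size_t lstart, len, copied = 0;
	int y;

	assert(v && v->fb);

	if (!v->copy_fb || !(xend && yend))
		return 0;

	lstart = (size_t)xstart * v->bytes_per_pixel;
	len = (size_t)(xend - xstart) * v->bytes_per_pixel;

	for (y = ystart; y < yend; y++) {
		size_t offset = (size_t)y * v->line_length + lstart;

		memcpy(v->copy_fb + offset, v->fb + offset, len);
		copied += len;
	}

	return copied;
}
```

For a 4x4 buffer at one byte per pixel, a damage rectangle from (1, 1)
to (3, 3) copies two bytes on each of rows 1 and 2 and leaves every
other byte of the copy buffer untouched.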


>
>> Signed-off-by: Alexander Graf <agraf@csgraf.de>
>> [Alper: Rebase for fontdata->height/w, fill_part(), fix memmove(dev),
>>          drop from defconfig, use damage.xstart/yend, use IS_ENABLED(),
>>          call video_sync() before copy_fb check, update video_copy test]
>> Co-developed-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
>> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
>> ---
>>
>> Changes in v5:
>> - Remove video_sync_copy() also from video_fill(), video_fill_part()
>> - Fix memmove() calls by removing the extra dev argument
>> - Call video_sync() before checking copy_fb in video tests
>> - Use xstart, ystart, xend, yend as names for damage region
>> - Use met->baseline instead of priv->baseline
>> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>> - Drop VIDEO_DAMAGE from sandbox defconfig added in a new patch
>> - Update dm_test_video_copy test added in a new patch
>>
>> Changes in v3:
>> - Make VIDEO_COPY always select VIDEO_DAMAGE
>>
>> Changes in v2:
>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
>>
>>   configs/sandbox_defconfig         |  1 -
>>   drivers/video/Kconfig             |  5 ++
>>   drivers/video/console_normal.c    | 13 +----
>>   drivers/video/console_rotate.c    | 44 +++-----------
>>   drivers/video/console_truetype.c  | 16 +----
>>   drivers/video/vidconsole-uclass.c | 16 -----
>>   drivers/video/video-uclass.c      | 97 ++++++++-----------------------
>>   drivers/video/video_bmp.c         |  7 ---
>>   include/video.h                   | 37 ------------
>>   include/video_console.h           | 52 -----------------
>>   test/dm/video.c                   |  3 +-
>>   11 files changed, 43 insertions(+), 248 deletions(-)
>>
>> diff --git a/configs/sandbox_defconfig b/configs/sandbox_defconfig
>> index 51b820f13121..259f31f26cee 100644
>> --- a/configs/sandbox_defconfig
>> +++ b/configs/sandbox_defconfig
>> @@ -307,7 +307,6 @@ CONFIG_USB_ETH_CDC=y
>>   CONFIG_VIDEO=y
>>   CONFIG_VIDEO_FONT_SUN12X22=y
>>   CONFIG_VIDEO_COPY=y
>> -CONFIG_VIDEO_DAMAGE=y
>>   CONFIG_CONSOLE_ROTATION=y
>>   CONFIG_CONSOLE_TRUETYPE=y
>>   CONFIG_CONSOLE_TRUETYPE_CANTORAONE=y
>> diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
>> index 97f494a1340b..b3fbd9d7d9ca 100644
>> --- a/drivers/video/Kconfig
>> +++ b/drivers/video/Kconfig
>> @@ -83,11 +83,14 @@ config VIDEO_PCI_DEFAULT_FB_SIZE
>>
>>   config VIDEO_COPY
>>          bool "Enable copying the frame buffer to a hardware copy"
>> +       select VIDEO_DAMAGE
>>          help
>>            On some machines (e.g. x86), reading from the frame buffer is very
>>            slow because it is uncached. To improve performance, this feature
>>            allows the frame buffer to be kept in cached memory (allocated by
>>            U-Boot) and then copied to the hardware frame-buffer as needed.
>> +         It uses the VIDEO_DAMAGE feature to keep track of regions to copy
>> +         and will only copy actually touched regions.
>>
>>            To use this, your video driver must set @copy_base in
>>            struct video_uc_plat.
>> @@ -105,6 +108,8 @@ config VIDEO_DAMAGE
>>            regions of the frame buffer that were modified before, speeding up
>>            screen refreshes significantly.
>>
>> +         It is also used by VIDEO_COPY to identify which regions changed.
>> +
>>   config BACKLIGHT_PWM
>>          bool "Generic PWM based Backlight Driver"
>>          depends on BACKLIGHT && DM_PWM
>> diff --git a/drivers/video/console_normal.c b/drivers/video/console_normal.c
>> index a19ce6a2bc11..c44aa09473a3 100644
>> --- a/drivers/video/console_normal.c
>> +++ b/drivers/video/console_normal.c
>> @@ -35,10 +35,6 @@ static int console_set_row(struct udevice *dev, uint row, int clr)
>>                  fill_pixel_and_goto_next(&dst, clr, pbytes, pbytes);
>>          end = dst;
>>
>> -       ret = vidconsole_sync_copy(dev, line, end);
>> -       if (ret)
>> -               return ret;
>> -
>>          video_damage(dev->parent,
>>                       0,
>>                       fontdata->height * row,
>> @@ -57,14 +53,11 @@ static int console_move_rows(struct udevice *dev, uint rowdst,
>>          void *dst;
>>          void *src;
>>          int size;
>> -       int ret;
>>
>>          dst = vid_priv->fb + rowdst * fontdata->height * vid_priv->line_length;
>>          src = vid_priv->fb + rowsrc * fontdata->height * vid_priv->line_length;
>>          size = fontdata->height * vid_priv->line_length * count;
>> -       ret = vidconsole_memmove(dev, dst, src, size);
>> -       if (ret)
>> -               return ret;
>> +       memmove(dst, src, size);
> Why are you making that change?


There is no point in keeping a special vidconsole_memmove() around 
anymore, since we don't actually need to call vidconsole_sync_copy() 
after the move. The damage call that we introduced to all call sites in 
combination with a video_sync() call takes over the job of the sync copy.


Alex




* Re: [PATCH v5 00/13] Add video damage tracking
  2023-08-21 19:57     ` Simon Glass
@ 2023-08-21 20:20       ` Alexander Graf
  2023-08-21 22:10         ` Simon Glass
  0 siblings, 1 reply; 56+ messages in thread
From: Alexander Graf @ 2023-08-21 20:20 UTC (permalink / raw)
  To: Simon Glass
  Cc: Alper Nebi Yasak, u-boot, Kever Yang, Jagan Teki, Andre Przywara,
	Svyatoslav Ryhel, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong


On 21.08.23 21:57, Simon Glass wrote:
> Hi Alex,
>
> On Mon, 21 Aug 2023 at 13:33, Alexander Graf <agraf@csgraf.de> wrote:
>>
>> On 21.08.23 21:11, Simon Glass wrote:
>>> Hi Alper,
>>>
>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>>>> This is a rebase of Alexander Graf's video damage tracking series, with
>>>> some tests and other changes. The original cover letter is as follows:
>>>>
>>>>> This patch set speeds up graphics output on ARM by a factor of 60x.
>>>>>
>>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
>>>>> but need it accessible by the display controller which reads directly
>>>>> from a later point of consistency. Hence, we flush the frame buffer to
>>>>> DRAM on every change. The full frame buffer.
>>> It should not, see below.
>>>
>>>>> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
>>>>> that can take a while to flush out. This was reported by Da Xue with grub,
>>>>> which happily print 1000s of spaces on the screen to draw a menu. Every
>>>>> printed space triggers a cache flush.
>>> That is a bug somewhere in EFI.
>>
>> Unfortunately not :). You may call it a bug in grub: It literally prints
>> over space characters for every character in its menu that it wants
>> cleared. On every text screen draw.
>>
>> This wouldn't be a big issue if we only flush the rectangle that gets
>> modified. But without this patch set, we're flushing the full DRAM
>> buffer on every u-boot text console character write, which means for
>> every character (as that's the only API UEFI has).
>>
>> As a nice side effect, we speed up the normal U-Boot text console as
>> well with this patch set, because even "normal" text prints that write
>> for example a single line of text on the screen today flush the full
>> frame buffer to DRAM.
> No, I mean that it is a bug that U-Boot (apparently) flushes the cache
> after every character. It doesn't do that for normal character output
> and I don't think it makes sense to do it for EFI either.


I see. Let's trace the calls:

efi_cout_output_string()
-> fputs()
-> vidconsole_puts()
-> video_sync()
-> flush_dcache_range()

Unfortunately grub abstracts character backends down to the "print
character" level, so it calls UEFI's sophisticated "output_string"
callback with a single character at a time, which means we do a full
dcache flush for every character that we print:

https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
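
As rough arithmetic on what one sync per character costs, the sketch below compares the bytes covered by a full-frame flush with those of a glyph-sized rectangle. The numbers are assumptions for illustration only: a 2560x1440 frame buffer at 4 bytes per pixel and an 8x16 font, with cache line rounding ignored. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes covered when the whole frame buffer is flushed */
static size_t full_flush_bytes(size_t xsize, size_t ysize,
			       size_t bytes_per_pixel)
{
	assert(bytes_per_pixel > 0);

	return xsize * ysize * bytes_per_pixel;
}

/* Bytes covered when only one glyph-sized rectangle is flushed */
static size_t glyph_flush_bytes(size_t glyph_w, size_t glyph_h,
				size_t bytes_per_pixel)
{
	assert(bytes_per_pixel > 0);

	return glyph_w * glyph_h * bytes_per_pixel;
}
```

Under these assumptions, full_flush_bytes(2560, 1440, 4) is 14745600 bytes per printed character, while glyph_flush_bytes(8, 16, 4) is 512 bytes.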


>
>>
>>>>> This patch set implements the easiest mitigation against this problem:
>>>>> Damage tracking. We remember the lowest common denominator region that was
>>>>> touched since the last video_sync() call and only flush that. The most
>>>>> typical writer to the frame buffer is the video console, which always
>>>>> writes rectangles of characters on the screen and syncs afterwards.
>>>>>
>>>>> With this patch set applied, we reduce drawing a large grub menu (with
>>>>> serial console attached for size information) on an RK3399-ROC system
>>>>> at 1440p from 55 seconds to less than 1 second.
>>>>>
>>>>> Version 2 also implements VIDEO_COPY using this mechanism, reducing its
>>>>> overhead compared to before as well. So even x86 systems should be faster
>>>>> with this now :).
>>>>>
>>>>>
>>>>> Alternatives considered:
>>>>>
>>>>>     1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
>>>>>        so often. We are missing timers to do this generically.
>>>>>
>>>>>     2) Double buffering - We could try to identify whether anything changed
>>>>>        at all and only draw to the FB if it did. That would require
>>>>>        maintaining a second buffer that we need to scan.
>>>>>
>>>>>     3) Text buffer - Maintain a buffer of all text printed on the screen with
>>>>>        respective location. Don't write if the old and new character are
>>>>>        identical. This would limit applicability to text only and is an
>>>>>        optimization on top of this patch set.
>>>>>
>>>>>     4) Hash screen lines - Create a hash (sha256?) over every line when it
>>>>>        changes. Only flush when it does. I'm not sure if this would waste
>>>>>        more time, memory and cache than the current approach. It would make
>>>>>        full screen updates much more expensive.
>>> 5) Fix the bug mentioned above?
>>>
>>>> Changes in v5:
>>>> - Add patch "video: test: Split copy frame buffer check into a function"
>>>> - Add patch "video: test: Support checking copy frame buffer contents"
>>>> - Add patch "video: test: Test partial updates of hardware frame buffer"
>>>> - Use xstart, ystart, xend, yend as names for damage region
>>>> - Document damage struct and fields in struct video_priv comment
>>>> - Return void from video_damage()
>>>> - Fix undeclared priv error in video_sync()
>>>> - Drop unused headers from video-uclass.c
>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>>>> - Call video_damage() also in video_fill_part()
>>>> - Use met->baseline instead of priv->baseline
>>>> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
>>>> - Update console_rotate.c video_damage() calls to pass video tests
>>>> - Remove mention about not having minimal damage for console_rotate.c
>>>> - Add patch "video: test: Test video damage tracking via vidconsole"
>>>> - Document new vdev field in struct efi_gop_obj comment
>>>> - Remove video_sync_copy() also from video_fill(), video_fill_part()
>>>> - Fix memmove() calls by removing the extra dev argument
>>>> - Call video_sync() before checking copy_fb in video tests
>>>> - Imply VIDEO_DAMAGE for video drivers instead of selecting it
>>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
>>>>
>>>> v4: https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/
>>>>
>>>> Changes in v4:
>>>> - Move damage clear to patch "dm: video: Add damage tracking API"
>>>> - Simplify first damage logic
>>>> - Remove VIDEO_DAMAGE default for ARM
>>>> - Skip damage on EfiBltVideoToBltBuffer
>>>> - Add patch "video: Always compile cache flushing code"
>>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"
>>>>
>>>> v3: https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/
>>>>
>>>> Changes in v3:
>>>> - Adapt to always assume DM is used
>>>> - Make VIDEO_COPY always select VIDEO_DAMAGE
>>>>
>>>> v2: https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/
>>>>
>>>> Changes in v2:
>>>> - Remove ifdefs
>>>> - Fix ranges in truetype target
>>>> - Limit rotate to necessary damage
>>>> - Remove ifdefs from gop
>>>> - Fix dcache range; we were flushing too much before
>>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
>>>>
>>>> v1: https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/
>>>>
>>>> Alexander Graf (9):
>>>>     dm: video: Add damage tracking API
>>>>     dm: video: Add damage notification on display fills
>>>>     vidconsole: Add damage notifications to all vidconsole drivers
>>>>     video: Add damage notification on bmp display
>>>>     efi_loader: GOP: Add damage notification on BLT
>>>>     video: Only dcache flush damaged lines
>>>>     video: Use VIDEO_DAMAGE for VIDEO_COPY
>>>>     video: Always compile cache flushing code
>>>>     video: Enable VIDEO_DAMAGE for drivers that need it
>>>>
>>>> Alper Nebi Yasak (4):
>>>>     video: test: Split copy frame buffer check into a function
>>>>     video: test: Support checking copy frame buffer contents
>>>>     video: test: Test partial updates of hardware frame buffer
>>>>     video: test: Test video damage tracking via vidconsole
>>>>
>>>>    arch/arm/mach-omap2/omap3/Kconfig |   1 +
>>>>    arch/arm/mach-sunxi/Kconfig       |   1 +
>>>>    drivers/video/Kconfig             |  26 +++
>>>>    drivers/video/console_normal.c    |  27 ++--
>>>>    drivers/video/console_rotate.c    |  94 +++++++----
>>>>    drivers/video/console_truetype.c  |  37 +++--
>>>>    drivers/video/exynos/Kconfig      |   1 +
>>>>    drivers/video/imx/Kconfig         |   1 +
>>>>    drivers/video/meson/Kconfig       |   1 +
>>>>    drivers/video/rockchip/Kconfig    |   1 +
>>>>    drivers/video/stm32/Kconfig       |   1 +
>>>>    drivers/video/tegra20/Kconfig     |   1 +
>>>>    drivers/video/tidss/Kconfig       |   1 +
>>>>    drivers/video/vidconsole-uclass.c |  16 --
>>>>    drivers/video/video-uclass.c      | 190 ++++++++++++----------
>>>>    drivers/video/video_bmp.c         |   7 +-
>>>>    include/video.h                   |  59 +++----
>>>>    include/video_console.h           |  52 ------
>>>>    lib/efi_loader/efi_gop.c          |   7 +
>>>>    test/dm/video.c                   | 256 ++++++++++++++++++++++++------
>>>>    20 files changed, 483 insertions(+), 297 deletions(-)
>>> It is good to see this tidied up into something that can be applied!
>>>
>>> I am unsure what is going on with the EFI performance, though. It
>>> should not flush the cache after every character, only after a new
>>> line. Is there something wrong in here? If so, we should fix that bug
>>> first and it should be patch 1 of this series.
>>
>> Before I came up with this series, I was trying to identify the UEFI bug
>> in question as well, because intuition told me surely this is a bug in
>> UEFI :). Turns out it really isn't this time around.
> I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI
> implementation. Where did you look for the bug?


The "real" bug is in grub. But given that it's reasonably simple to work 
around in U-Boot and even with it "fixed" in grub we would still see 
performance benefits from flushing only parts of the screen, I think 
it's worth living with the grub deficiency.


Alex




* Re: [PATCH v5 00/13] Add video damage tracking
  2023-08-21 20:20       ` Alexander Graf
@ 2023-08-21 22:10         ` Simon Glass
  2023-08-21 22:40           ` Alexander Graf
  0 siblings, 1 reply; 56+ messages in thread
From: Simon Glass @ 2023-08-21 22:10 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Alper Nebi Yasak, u-boot, Kever Yang, Jagan Teki, Andre Przywara,
	Svyatoslav Ryhel, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

Hi Alex,

On Mon, 21 Aug 2023 at 14:20, Alexander Graf <agraf@csgraf.de> wrote:
>
>
> On 21.08.23 21:57, Simon Glass wrote:
> > Hi Alex,
> >
> > On Mon, 21 Aug 2023 at 13:33, Alexander Graf <agraf@csgraf.de> wrote:
> >>
> >> On 21.08.23 21:11, Simon Glass wrote:
> >>> Hi Alper,
> >>>
> >>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
> >>>> This is a rebase of Alexander Graf's video damage tracking series, with
> >>>> some tests and other changes. The original cover letter is as follows:
> >>>>
> >>>>> This patch set speeds up graphics output on ARM by a factor of 60x.
> >>>>>
> >>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
> >>>>> but need it accessible by the display controller which reads directly
> >>>>> from a later point of consistency. Hence, we flush the frame buffer to
> >>>>> DRAM on every change. The full frame buffer.
> >>> It should not, see below.
> >>>
> >>>>> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
> >>>>> that can take a while to flush out. This was reported by Da Xue with grub,
> >>>>> which happily print 1000s of spaces on the screen to draw a menu. Every
> >>>>> printed space triggers a cache flush.
> >>> That is a bug somewhere in EFI.
> >>
> >> Unfortunately not :). You may call it a bug in grub: It literally prints
> >> over space characters for every character in its menu that it wants
> >> cleared. On every text screen draw.
> >>
> >> This wouldn't be a big issue if we only flush the rectangle that gets
> >> modified. But without this patch set, we're flushing the full DRAM
> >> buffer on every u-boot text console character write, which means for
> >> every character (as that's the only API UEFI has).
> >>
> >> As a nice side effect, we speed up the normal U-Boot text console as
> >> well with this patch set, because even "normal" text prints that write
> >> for example a single line of text on the screen today flush the full
> >> frame buffer to DRAM.
> > No, I mean that it is a bug that U-Boot (apparently) flushes the cache
> > after every character. It doesn't do that for normal character output
> > and I don't think it makes sense to do it for EFI either.
>
>
> I see. Let's trace the calls:
>
> efi_cout_output_string()
> -> fputs()
> -> vidconsole_puts()
> -> video_sync()
> -> flush_dcache_range()
>
> Unfortunately grub abstracts character backends down to the "print
> character" level, so it calls UEFI's sophisticated "output_string"
> callback with a single character at a time, which means we do a full
> dcache flush for every character that we print:
>
> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
>
>
> >
> >>
> >>>>> This patch set implements the easiest mitigation against this problem:
> >>>>> Damage tracking. We remember the lowest common denominator region that was
> >>>>> touched since the last video_sync() call and only flush that. The most
> >>>>> typical writer to the frame buffer is the video console, which always
> >>>>> writes rectangles of characters on the screen and syncs afterwards.
> >>>>>
> >>>>> With this patch set applied, we reduce drawing a large grub menu (with
> >>>>> serial console attached for size information) on an RK3399-ROC system
> >>>>> at 1440p from 55 seconds to less than 1 second.
> >>>>>
> >>>>> Version 2 also implements VIDEO_COPY using this mechanism, reducing its
> >>>>> overhead compared to before as well. So even x86 systems should be faster
> >>>>> with this now :).
> >>>>>
> >>>>>
> >>>>> Alternatives considered:
> >>>>>
> >>>>>     1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
> >>>>>        so often. We are missing timers to do this generically.
> >>>>>
> >>>>>     2) Double buffering - We could try to identify whether anything changed
> >>>>>        at all and only draw to the FB if it did. That would require
> >>>>>        maintaining a second buffer that we need to scan.
> >>>>>
> >>>>>     3) Text buffer - Maintain a buffer of all text printed on the screen with
> >>>>>        respective location. Don't write if the old and new character are
> >>>>>        identical. This would limit applicability to text only and is an
> >>>>>        optimization on top of this patch set.
> >>>>>
> >>>>>     4) Hash screen lines - Create a hash (sha256?) over every line when it
> >>>>>        changes. Only flush when it does. I'm not sure if this would waste
> >>>>>        more time, memory and cache than the current approach. It would make
> >>>>>        full screen updates much more expensive.
> >>> 5) Fix the bug mentioned above?
> >>>
> >>>> Changes in v5:
> >>>> - Add patch "video: test: Split copy frame buffer check into a function"
> >>>> - Add patch "video: test: Support checking copy frame buffer contents"
> >>>> - Add patch "video: test: Test partial updates of hardware frame buffer"
> >>>> - Use xstart, ystart, xend, yend as names for damage region
> >>>> - Document damage struct and fields in struct video_priv comment
> >>>> - Return void from video_damage()
> >>>> - Fix undeclared priv error in video_sync()
> >>>> - Drop unused headers from video-uclass.c
> >>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
> >>>> - Call video_damage() also in video_fill_part()
> >>>> - Use met->baseline instead of priv->baseline
> >>>> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
> >>>> - Update console_rotate.c video_damage() calls to pass video tests
> >>>> - Remove mention about not having minimal damage for console_rotate.c
> >>>> - Add patch "video: test: Test video damage tracking via vidconsole"
> >>>> - Document new vdev field in struct efi_gop_obj comment
> >>>> - Remove video_sync_copy() also from video_fill(), video_fill_part()
> >>>> - Fix memmove() calls by removing the extra dev argument
> >>>> - Call video_sync() before checking copy_fb in video tests
> >>>> - Imply VIDEO_DAMAGE for video drivers instead of selecting it
> >>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
> >>>>
> >>>> v4: https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/
> >>>>
> >>>> Changes in v4:
> >>>> - Move damage clear to patch "dm: video: Add damage tracking API"
> >>>> - Simplify first damage logic
> >>>> - Remove VIDEO_DAMAGE default for ARM
> >>>> - Skip damage on EfiBltVideoToBltBuffer
> >>>> - Add patch "video: Always compile cache flushing code"
> >>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"
> >>>>
> >>>> v3: https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/
> >>>>
> >>>> Changes in v3:
> >>>> - Adapt to always assume DM is used
> >>>> - Adapt to always assume DM is used
> >>>> - Make VIDEO_COPY always select VIDEO_DAMAGE
> >>>>
> >>>> v2: https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/
> >>>>
> >>>> Changes in v2:
> >>>> - Remove ifdefs
> >>>> - Fix ranges in truetype target
> >>>> - Limit rotate to necessary damage
> >>>> - Remove ifdefs from gop
> >>>> - Fix dcache range; we were flushing too much before
> >>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
> >>>>
> >>>> v1: https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/
> >>>>
> >>>> Alexander Graf (9):
> >>>>     dm: video: Add damage tracking API
> >>>>     dm: video: Add damage notification on display fills
> >>>>     vidconsole: Add damage notifications to all vidconsole drivers
> >>>>     video: Add damage notification on bmp display
> >>>>     efi_loader: GOP: Add damage notification on BLT
> >>>>     video: Only dcache flush damaged lines
> >>>>     video: Use VIDEO_DAMAGE for VIDEO_COPY
> >>>>     video: Always compile cache flushing code
> >>>>     video: Enable VIDEO_DAMAGE for drivers that need it
> >>>>
> >>>> Alper Nebi Yasak (4):
> >>>>     video: test: Split copy frame buffer check into a function
> >>>>     video: test: Support checking copy frame buffer contents
> >>>>     video: test: Test partial updates of hardware frame buffer
> >>>>     video: test: Test video damage tracking via vidconsole
> >>>>
> >>>>    arch/arm/mach-omap2/omap3/Kconfig |   1 +
> >>>>    arch/arm/mach-sunxi/Kconfig       |   1 +
> >>>>    drivers/video/Kconfig             |  26 +++
> >>>>    drivers/video/console_normal.c    |  27 ++--
> >>>>    drivers/video/console_rotate.c    |  94 +++++++----
> >>>>    drivers/video/console_truetype.c  |  37 +++--
> >>>>    drivers/video/exynos/Kconfig      |   1 +
> >>>>    drivers/video/imx/Kconfig         |   1 +
> >>>>    drivers/video/meson/Kconfig       |   1 +
> >>>>    drivers/video/rockchip/Kconfig    |   1 +
> >>>>    drivers/video/stm32/Kconfig       |   1 +
> >>>>    drivers/video/tegra20/Kconfig     |   1 +
> >>>>    drivers/video/tidss/Kconfig       |   1 +
> >>>>    drivers/video/vidconsole-uclass.c |  16 --
> >>>>    drivers/video/video-uclass.c      | 190 ++++++++++++----------
> >>>>    drivers/video/video_bmp.c         |   7 +-
> >>>>    include/video.h                   |  59 +++----
> >>>>    include/video_console.h           |  52 ------
> >>>>    lib/efi_loader/efi_gop.c          |   7 +
> >>>>    test/dm/video.c                   | 256 ++++++++++++++++++++++++------
> >>>>    20 files changed, 483 insertions(+), 297 deletions(-)
> >>> It is good to see this tidied up into something that can be applied!
> >>>
> >>> I am unsure what is going on with the EFI performance, though. It
> >>> should not flush the cache after every character, only after a new
> >>> line. Is there something wrong in here? If so, we should fix that bug
> >>> first and it should be patch 1 of this series.
> >>
> >> Before I came up with this series, I was trying to identify the UEFI bug
> >> in question as well, because intuition told me surely this is a bug in
> >> UEFI :). Turns out it really isn't this time around.
> > I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI
> > implementation. Where did you look for the bug?
>
>
> The "real" bug is in grub. But given that it's reasonably simple to work
> around in U-Boot and even with it "fixed" in grub we would still see
> performance benefits from flushing only parts of the screen, I think
> it's worth living with the grub deficiency.

OK thanks for digging into it. I suggest we add a param to
vidconsole_puts() to tell it whether to sync or not, then the EFI code
can indicate this and try to be a bit smarter about it.
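
A toy model of that idea, purely as a sketch: nothing here is existing U-Boot API. struct fake_console, fake_sync() and fake_puts() are hypothetical stand-ins, with sync_count counting what would be video_sync() calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical console state; sync_count stands in for video_sync() calls */
struct fake_console {
	size_t chars_drawn;
	size_t sync_count;
};

/* Stand-in for video_sync(): this is where the flush would happen */
static void fake_sync(struct fake_console *con)
{
	con->sync_count++;
}

/* Hypothetical puts variant: draw the string, sync only when asked to */
static void fake_puts(struct fake_console *con, const char *str, bool do_sync)
{
	assert(con && str);

	while (*str++)
		con->chars_drawn++;

	if (do_sync)
		fake_sync(con);
}
```

With do_sync false, a caller that prints one character at a time would need to issue a single sync itself once it is done, so deciding when that happens becomes the caller's job.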

Regards,
Simon


* Re: [PATCH v5 10/13] video: Only dcache flush damaged lines
  2023-08-21 19:59     ` Alexander Graf
@ 2023-08-21 22:10       ` Simon Glass
  2023-08-21 22:44         ` Alexander Graf
  0 siblings, 1 reply; 56+ messages in thread
From: Simon Glass @ 2023-08-21 22:10 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Alper Nebi Yasak, u-boot, Kever Yang, Jagan Teki, Andre Przywara,
	Svyatoslav Ryhel, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

Hi Alex,

On Mon, 21 Aug 2023 at 13:59, Alexander Graf <agraf@csgraf.de> wrote:
>
>
> On 21.08.23 21:11, Simon Glass wrote:
> > Hi Alper,
> >
> > On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
> >> From: Alexander Graf <agraf@csgraf.de>
> >>
> >> Now that we have a damage area that tells us which parts of the frame buffer
> >> actually need updating, let's only dcache flush those on video_sync()
> >> calls. With this optimization in place, frame buffer updates - especially
> >> on large screens such as 4k displays - speed up significantly.
> >>
> >> Signed-off-by: Alexander Graf <agraf@csgraf.de>
> >> Reported-by: Da Xue <da@libre.computer>
> >> [Alper: Use damage.xstart/yend, IS_ENABLED()]
> >> Co-developed-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> >> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> >> ---
> >>
> >> Changes in v5:
> >> - Use xstart, ystart, xend, yend as names for damage region
> >> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
> >>
> >> Changes in v2:
> >> - Fix dcache range; we were flushing too much before
> >> - Remove ifdefs
> >>
> >>   drivers/video/video-uclass.c | 41 +++++++++++++++++++++++++++++++-----
> >>   1 file changed, 36 insertions(+), 5 deletions(-)
> > This is a little strange, since flushing the whole cache will only
> > actually write out data that was actually written (to the display). Is
> > there a benefit to this patch, in terms of performance?
>
>
> I'm happy to see you go through the same thought process I went through
> when writing these: "This surely can't be the problem, can it?". The
> answer is "simple" in hindsight:
>
> Have a look at the ARMv8 cache flush function. It does the only "safe"
> thing you can expect it to do: Clean+Invalidate to POC because we use it
> for multiple things, clearing modified code among others:
>
> ENTRY(__asm_flush_dcache_range)
>          mrs     x3, ctr_el0
>          ubfx    x3, x3, #16, #4
>          mov     x2, #4
>          lsl     x2, x2, x3              /* cache line size */
>
>          /* x2 <- minimal cache line size in cache system */
>          sub     x3, x2, #1
>          bic     x0, x0, x3
> 1:      dc      civac, x0       /* clean & invalidate data or unified cache */
>          add     x0, x0, x2
>          cmp     x0, x1
>          b.lo    1b
>          dsb     sy
>          ret
> ENDPROC(__asm_flush_dcache_range)
>
>
> Looking at the "dc civac" call, we find this documentation page from
> ARM:
> https://developer.arm.com/documentation/ddi0601/2022-03/AArch64-Instructions/DC-CIVAC--Data-or-unified-Cache-line-Clean-and-Invalidate-by-VA-to-PoC
>
> This says we're writing any dirtiness of this cache line up to the POC
> and then invalidate (remove the cache line) also up to POC. That means
> when you look at a typical SBC, this will either be L2 or system level
> cache. Every read afterwards needs to go and pull it all the way back to
> L1 to modify it (or not) on the next character write and then flush it
> again.
>
> Even worse: Because of the invalidate, we may even evict it from caches
> that the display controller uses to read the frame buffer. So depending
> on the SoC's cache topology and implementation, we may force the display
> controller to refetch the full FB content on its next screen refresh cycle.
>
> I faintly remember that I tried to experiment with CVAC instead to only
> flush without invalidating. I don't fully remember the results anymore
> though. I believe CVAC just behaved identical to CIVAC on the A53
> platform I was working on. And then I looked at Cortex-A53 errata like
> [1] and just accepted that doing anything but restricting the flushing
> range is a waste of time :)

Yuck I didn't know it was invalidating too. That is horrible. Is there
no way to fix it?
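
To put a number on the quoted loop, the sketch below counts how many cache lines a flush of a given address range touches, which is also how many "dc civac" operations the loop would issue. It is a host-side model only: count_flushed_lines() is a hypothetical helper, and the 64-byte line size used in the note after it is an assumption rather than something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count the cache lines touched by a flush of [start, end), modelled on
 * the quoted __asm_flush_dcache_range: align start down to a cache line,
 * then step one line at a time while the address is below end. For a
 * non-empty range this visits the same lines as the assembly loop.
 */
static uint64_t count_flushed_lines(uint64_t start, uint64_t end,
				    uint64_t line_size)
{
	uint64_t addr, count = 0;

	/* line_size must be a nonzero power of two */
	assert(line_size && !(line_size & (line_size - 1)));

	for (addr = start & ~(line_size - 1); addr < end; addr += line_size)
		count++;

	return count;
}
```

Assuming 64-byte lines, a 14745600-byte frame buffer starting on a line boundary comes to 230400 lines per flush, whereas one 32-byte glyph row touches one or two lines depending on its alignment.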

Regards,
Simon

>
>
> Alex
>
>
> [1]
> https://patchwork.kernel.org/project/xen-devel/patch/1462466065-30212-14-git-send-email-julien.grall@arm.com/
>
>


* Re: [PATCH v5 00/13] Add video damage tracking
  2023-08-21 22:10         ` Simon Glass
@ 2023-08-21 22:40           ` Alexander Graf
  2023-08-21 23:03             ` Simon Glass
  0 siblings, 1 reply; 56+ messages in thread
From: Alexander Graf @ 2023-08-21 22:40 UTC (permalink / raw)
  To: Simon Glass
  Cc: Alper Nebi Yasak, u-boot, Kever Yang, Jagan Teki, Andre Przywara,
	Svyatoslav Ryhel, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong


On 22.08.23 00:10, Simon Glass wrote:
> Hi Alex,
>
> On Mon, 21 Aug 2023 at 14:20, Alexander Graf <agraf@csgraf.de> wrote:
>>
>> On 21.08.23 21:57, Simon Glass wrote:
>>> Hi Alex,
>>>
>>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf <agraf@csgraf.de> wrote:
>>>> On 21.08.23 21:11, Simon Glass wrote:
>>>>> Hi Alper,
>>>>>
>>>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>>>>>> This is a rebase of Alexander Graf's video damage tracking series, with
>>>>>> some tests and other changes. The original cover letter is as follows:
>>>>>>
>>>>>>> This patch set speeds up graphics output on ARM by a factor of 60x.
>>>>>>>
>>>>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
>>>>>>> but need it accessible by the display controller which reads directly
>>>>>>> from a later point of consistency. Hence, we flush the frame buffer to
>>>>>>> DRAM on every change. The full frame buffer.
>>>>> It should not, see below.
>>>>>
>>>>>>> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
>>>>>>> that can take a while to flush out. This was reported by Da Xue with grub,
>>>>>>> which happily prints 1000s of spaces on the screen to draw a menu. Every
>>>>>>> printed space triggers a cache flush.
>>>>> That is a bug somewhere in EFI.
>>>> Unfortunately not :). You may call it a bug in grub: It literally prints
>>>> over space characters for every character in its menu that it wants
>>>> cleared. On every text screen draw.
>>>>
>>>> This wouldn't be a big issue if we only flush the rectangle that gets
>>>> modified. But without this patch set, we're flushing the full DRAM
>>>> buffer on every U-Boot text console character write, which means for
>>>> every character (as that's the only API UEFI has).
>>>>
>>>> As a nice side effect, we speed up the normal U-Boot text console as
>>>> well with this patch set, because even "normal" text prints that write
>>>> for example a single line of text on the screen today flush the full
>>>> frame buffer to DRAM.
>>> No, I mean that it is a bug that U-Boot (apparently) flushes the cache
>>> after every character. It doesn't do that for normal character output
>>> and I don't think it makes sense to do it for EFI either.
>>
>> I see. Let's trace the calls:
>>
>> efi_cout_output_string()
>> -> fputs()
>> -> vidconsole_puts()
>> -> video_sync()
>> -> flush_dcache_range()
>>
>> Unfortunately grub abstracts character backends down to the "print
>> character" level, so it calls UEFI's sophisticated "output_string"
>> callback with single characters at a time, which means we do a full
>> dcache flush for every character that we print:
>>
>> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
>>
>>
>>>>>>> This patch set implements the easiest mitigation against this problem:
>>>>>>> Damage tracking. We remember the lowest common denominator region that was
>>>>>>> touched since the last video_sync() call and only flush that. The most
>>>>>>> typical writer to the frame buffer is the video console, which always
>>>>>>> writes rectangles of characters on the screen and syncs afterwards.
>>>>>>>
>>>>>>> With this patch set applied, we reduce drawing a large grub menu (with
>>>>>>> serial console attached for size information) on an RK3399-ROC system
>>>>>>> at 1440p from 55 seconds to less than 1 second.
>>>>>>>
>>>>>>> Version 2 also implements VIDEO_COPY using this mechanism, reducing its
>>>>>>> overhead compared to before as well. So even x86 systems should be faster
>>>>>>> with this now :).
>>>>>>>
>>>>>>>
>>>>>>> Alternatives considered:
>>>>>>>
>>>>>>>      1) Lazy sync - Sandbox does this. It only calls video_sync(true) every
>>>>>>>         so often. We are missing timers to do this generically.
>>>>>>>
>>>>>>>      2) Double buffering - We could try to identify whether anything changed
>>>>>>>         at all and only draw to the FB if it did. That would require
>>>>>>>         maintaining a second buffer that we need to scan.
>>>>>>>
>>>>>>>      3) Text buffer - Maintain a buffer of all text printed on the screen with
>>>>>>>         respective location. Don't write if the old and new character are
>>>>>>>         identical. This would limit applicability to text only and is an
>>>>>>>         optimization on top of this patch set.
>>>>>>>
>>>>>>>      4) Hash screen lines - Create a hash (sha256?) over every line when it
>>>>>>>         changes. Only flush when it does. I'm not sure if this would waste
>>>>>>>         more time, memory and cache than the current approach. It would make
>>>>>>>         full screen updates much more expensive.
>>>>> 5) Fix the bug mentioned above?
>>>>>
>>>>>> Changes in v5:
>>>>>> - Add patch "video: test: Split copy frame buffer check into a function"
>>>>>> - Add patch "video: test: Support checking copy frame buffer contents"
>>>>>> - Add patch "video: test: Test partial updates of hardware frame buffer"
>>>>>> - Use xstart, ystart, xend, yend as names for damage region
>>>>>> - Document damage struct and fields in struct video_priv comment
>>>>>> - Return void from video_damage()
>>>>>> - Fix undeclared priv error in video_sync()
>>>>>> - Drop unused headers from video-uclass.c
>>>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>>>>>> - Call video_damage() also in video_fill_part()
>>>>>> - Use met->baseline instead of priv->baseline
>>>>>> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
>>>>>> - Update console_rotate.c video_damage() calls to pass video tests
>>>>>> - Remove mention about not having minimal damage for console_rotate.c
>>>>>> - Add patch "video: test: Test video damage tracking via vidconsole"
>>>>>> - Document new vdev field in struct efi_gop_obj comment
>>>>>> - Remove video_sync_copy() also from video_fill(), video_fill_part()
>>>>>> - Fix memmove() calls by removing the extra dev argument
>>>>>> - Call video_sync() before checking copy_fb in video tests
>>>>>> - Imply VIDEO_DAMAGE for video drivers instead of selecting it
>>>>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
>>>>>>
>>>>>> v4: https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/
>>>>>>
>>>>>> Changes in v4:
>>>>>> - Move damage clear to patch "dm: video: Add damage tracking API"
>>>>>> - Simplify first damage logic
>>>>>> - Remove VIDEO_DAMAGE default for ARM
>>>>>> - Skip damage on EfiBltVideoToBltBuffer
>>>>>> - Add patch "video: Always compile cache flushing code"
>>>>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"
>>>>>>
>>>>>> v3: https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/
>>>>>>
>>>>>> Changes in v3:
>>>>>> - Adapt to always assume DM is used
>>>>>> - Make VIDEO_COPY always select VIDEO_DAMAGE
>>>>>>
>>>>>> v2: https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/
>>>>>>
>>>>>> Changes in v2:
>>>>>> - Remove ifdefs
>>>>>> - Fix ranges in truetype target
>>>>>> - Limit rotate to necessary damage
>>>>>> - Remove ifdefs from gop
>>>>>> - Fix dcache range; we were flushing too much before
>>>>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
>>>>>>
>>>>>> v1: https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/
>>>>>>
>>>>>> Alexander Graf (9):
>>>>>>      dm: video: Add damage tracking API
>>>>>>      dm: video: Add damage notification on display fills
>>>>>>      vidconsole: Add damage notifications to all vidconsole drivers
>>>>>>      video: Add damage notification on bmp display
>>>>>>      efi_loader: GOP: Add damage notification on BLT
>>>>>>      video: Only dcache flush damaged lines
>>>>>>      video: Use VIDEO_DAMAGE for VIDEO_COPY
>>>>>>      video: Always compile cache flushing code
>>>>>>      video: Enable VIDEO_DAMAGE for drivers that need it
>>>>>>
>>>>>> Alper Nebi Yasak (4):
>>>>>>      video: test: Split copy frame buffer check into a function
>>>>>>      video: test: Support checking copy frame buffer contents
>>>>>>      video: test: Test partial updates of hardware frame buffer
>>>>>>      video: test: Test video damage tracking via vidconsole
>>>>>>
>>>>>>     arch/arm/mach-omap2/omap3/Kconfig |   1 +
>>>>>>     arch/arm/mach-sunxi/Kconfig       |   1 +
>>>>>>     drivers/video/Kconfig             |  26 +++
>>>>>>     drivers/video/console_normal.c    |  27 ++--
>>>>>>     drivers/video/console_rotate.c    |  94 +++++++----
>>>>>>     drivers/video/console_truetype.c  |  37 +++--
>>>>>>     drivers/video/exynos/Kconfig      |   1 +
>>>>>>     drivers/video/imx/Kconfig         |   1 +
>>>>>>     drivers/video/meson/Kconfig       |   1 +
>>>>>>     drivers/video/rockchip/Kconfig    |   1 +
>>>>>>     drivers/video/stm32/Kconfig       |   1 +
>>>>>>     drivers/video/tegra20/Kconfig     |   1 +
>>>>>>     drivers/video/tidss/Kconfig       |   1 +
>>>>>>     drivers/video/vidconsole-uclass.c |  16 --
>>>>>>     drivers/video/video-uclass.c      | 190 ++++++++++++----------
>>>>>>     drivers/video/video_bmp.c         |   7 +-
>>>>>>     include/video.h                   |  59 +++----
>>>>>>     include/video_console.h           |  52 ------
>>>>>>     lib/efi_loader/efi_gop.c          |   7 +
>>>>>>     test/dm/video.c                   | 256 ++++++++++++++++++++++++------
>>>>>>     20 files changed, 483 insertions(+), 297 deletions(-)
>>>>> It is good to see this tidied up into something that can be applied!
>>>>>
>>>>> I am unsure what is going on with the EFI performance, though. It
>>>>> should not flush the cache after every character, only after a new
>>>>> line. Is there something wrong in here? If so, we should fix that bug
>>>>> first and it should be patch 1 of this series.
>>>> Before I came up with this series, I was trying to identify the UEFI bug
>>>> in question as well, because intuition told me surely this is a bug in
>>>> UEFI :). Turns out it really isn't this time around.
>>> I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI
>>> implementation. Where did you look for the bug?
>>
>> The "real" bug is in grub. But given that it's reasonably simple to work
>> around in U-Boot and even with it "fixed" in grub we would still see
>> performance benefits from flushing only parts of the screen, I think
>> it's worth living with the grub deficiency.
> OK thanks for digging into it. I suggest we add a param to
> vidconsole_puts() to tell it whether to sync or not, then the EFI code
> can indicate this and try to be a bit smarter about it.


It doesn't know when to sync either. From its point of view, any 
"console output" could be the last one. There is no API in UEFI that 
says "please flush console output now".
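
To put a rough number on what that per-character sync costs, here is some
back-of-the-envelope arithmetic as a small C sketch. The 2560x1440
resolution corresponds to the 1440p figure from the cover letter; 4 bytes
per pixel, 64-byte cache lines and an 8x16 glyph are assumptions made for
illustration, and the helper names are hypothetical (they are not U-Boot
functions).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of per-cache-line operations needed to cover `bytes` bytes,
 * assuming the range starts on a cache line boundary.
 */
static size_t cache_ops(size_t bytes, size_t cache_line)
{
	assert(cache_line > 0);
	return (bytes + cache_line - 1) / cache_line;
}

/* Full flush: the whole frame buffer is walked for every character. */
static size_t full_flush_ops(size_t width, size_t height, size_t bytes_pp,
			     size_t cache_line)
{
	return cache_ops(width * height * bytes_pp, cache_line);
}

/* Damage-limited flush: only the glyph's pixels on each of its lines. */
static size_t glyph_flush_ops(size_t glyph_w, size_t glyph_h, size_t bytes_pp,
			      size_t cache_line)
{
	return glyph_h * cache_ops(glyph_w * bytes_pp, cache_line);
}
```

Under those assumptions, full_flush_ops(2560, 1440, 4, 64) comes to 230400
cache line operations per printed character, while
glyph_flush_ops(8, 16, 4, 64) comes to 16.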


Alex



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 10/13] video: Only dcache flush damaged lines
  2023-08-21 22:10       ` Simon Glass
@ 2023-08-21 22:44         ` Alexander Graf
  2023-08-21 23:03           ` Simon Glass
  0 siblings, 1 reply; 56+ messages in thread
From: Alexander Graf @ 2023-08-21 22:44 UTC (permalink / raw)
  To: Simon Glass
  Cc: Alper Nebi Yasak, u-boot, Kever Yang, Jagan Teki, Andre Przywara,
	Svyatoslav Ryhel, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong


On 22.08.23 00:10, Simon Glass wrote:
> Hi Alex,
>
> On Mon, 21 Aug 2023 at 13:59, Alexander Graf <agraf@csgraf.de> wrote:
>>
>> On 21.08.23 21:11, Simon Glass wrote:
>>> Hi Alper,
>>>
>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>>>> From: Alexander Graf <agraf@csgraf.de>
>>>>
>>>> Now that we have a damage area that tells us which parts of the frame buffer
>>>> actually need updating, let's only dcache flush those on video_sync()
>>>> calls. With this optimization in place, frame buffer updates - especially
>>>> on large screens such as 4k displays - speed up significantly.
>>>>
>>>> Signed-off-by: Alexander Graf <agraf@csgraf.de>
>>>> Reported-by: Da Xue <da@libre.computer>
>>>> [Alper: Use damage.xstart/yend, IS_ENABLED()]
>>>> Co-developed-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
>>>> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
>>>> ---
>>>>
>>>> Changes in v5:
>>>> - Use xstart, ystart, xend, yend as names for damage region
>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>>>>
>>>> Changes in v2:
>>>> - Fix dcache range; we were flushing too much before
>>>> - Remove ifdefs
>>>>
>>>>    drivers/video/video-uclass.c | 41 +++++++++++++++++++++++++++++++-----
>>>>    1 file changed, 36 insertions(+), 5 deletions(-)
>>> This is a little strange, since flushing the whole cache will only
>>> actually write out data that was actually written (to the display). Is
>>> there a benefit to this patch, in terms of performance?
>>
>> I'm happy to see you go through the same thought process I went through
>> when writing these: "This surely can't be the problem, can it?". The
>> answer is "simple" in hindsight:
>>
>> Have a look at the ARMv8 cache flush function. It does the only "safe"
>> thing you can expect it to do: Clean+Invalidate to POC because we use it
>> for multiple things, clearing modified code among others:
>>
>> ENTRY(__asm_flush_dcache_range)
>>           mrs     x3, ctr_el0
>>           ubfx    x3, x3, #16, #4
>>           mov     x2, #4
>>           lsl     x2, x2, x3              /* cache line size */
>>
>>           /* x2 <- minimal cache line size in cache system */
>>           sub     x3, x2, #1
>>           bic     x0, x0, x3
>> 1:      dc      civac, x0       /* clean & invalidate data or unified cache */
>>           add     x0, x0, x2
>>           cmp     x0, x1
>>           b.lo    1b
>>           dsb     sy
>>           ret
>> ENDPROC(__asm_flush_dcache_range)
>>
>>
>> Looking at the "dc civac" call, we find this documentation page from
>> ARM:
>> https://developer.arm.com/documentation/ddi0601/2022-03/AArch64-Instructions/DC-CIVAC--Data-or-unified-Cache-line-Clean-and-Invalidate-by-VA-to-PoC
>>
>> This says we're writing any dirtiness of this cache line up to the POC
>> and then invalidate (remove the cache line) also up to POC. That means
>> when you look at a typical SBC, this will either be L2 or system level
>> cache. Every read afterwards needs to go and pull it all the way back to
>> L1 to modify it (or not) on the next character write and then flush it
>> again.
>>
>> Even worse: Because of the invalidate, we may even evict it from caches
>> that the display controller uses to read the frame buffer. So depending
>> on the SoC's cache topology and implementation, we may force the display
>> controller to refetch the full FB content on its next screen refresh cycle.
>>
>> I faintly remember that I tried to experiment with CVAC instead to only
>> flush without invalidating. I don't fully remember the results anymore
>> though. I believe CVAC just behaved identically to CIVAC on the A53
>> platform I was working on. And then I looked at Cortex-A53 errata like
>> [1] and just accepted that doing anything but restricting the flushing
>> range is a waste of time :)
> Yuck I didn't know it was invalidating too. That is horrible. Is there
> no way to fix it?


Before building all of this damage logic, I tried, but failed. I'd 
welcome anyone else to try again :). I'm not even convinced yet that it 
is actually fixable: Depending on the SoC's internal cache logic, it may 
opt to always invalidate, I think.

That said, this patch set really also makes sense outside of the 
particular invalidate problem. It creates a generic abstraction between 
the copy and non-copy code path and allows us to reduce the amount of 
work spent for both, generically for any video sync operation.
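
As a rough illustration of the idea (a sketch only, not the code from the
patch): the field names xstart, ystart, xend and yend come from the v5
changelog, while struct damage_rect, damage_merge() and
damage_line_range() are hypothetical names. The sketch assumes that the
end coordinates are exclusive, that an all-zero rectangle means "nothing
damaged yet", and that the frame buffer base is cache line aligned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical damage rectangle; xend/yend are assumed to be exclusive. */
struct damage_rect {
	int xstart, ystart, xend, yend;
};

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Grow *d so that it also covers the rectangle (x, y, width, height). */
static void damage_merge(struct damage_rect *d, int x, int y,
			 int width, int height)
{
	assert(x >= 0 && y >= 0 && width > 0 && height > 0);

	if (d->xend == 0) {	/* empty: nothing damaged since last sync */
		d->xstart = x;
		d->ystart = y;
		d->xend = x + width;
		d->yend = y + height;
		return;
	}

	d->xstart = min_int(d->xstart, x);
	d->ystart = min_int(d->ystart, y);
	d->xend = max_int(d->xend, x + width);
	d->yend = max_int(d->yend, y + height);
}

/*
 * Byte offsets [*start, *end) into the frame buffer for damaged line y,
 * rounded outwards to whole cache lines. line_length is the pitch in
 * bytes, bytes_pp the pixel size; cache_line must be a power of two.
 */
static void damage_line_range(const struct damage_rect *d, int y,
			      size_t line_length, size_t bytes_pp,
			      size_t cache_line, size_t *start, size_t *end)
{
	size_t first = (size_t)y * line_length + (size_t)d->xstart * bytes_pp;
	size_t last = (size_t)y * line_length + (size_t)d->xend * bytes_pp;

	assert(cache_line != 0 && (cache_line & (cache_line - 1)) == 0);

	*start = first & ~(cache_line - 1);
	*end = (last + cache_line - 1) & ~(cache_line - 1);
}
```

A sync would then walk y from ystart to yend, flush each per-line range
and reset the rectangle to all zeroes afterwards.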


Alex



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 10/13] video: Only dcache flush damaged lines
  2023-08-21 22:44         ` Alexander Graf
@ 2023-08-21 23:03           ` Simon Glass
  2023-08-30 19:12             ` Alper Nebi Yasak
  0 siblings, 1 reply; 56+ messages in thread
From: Simon Glass @ 2023-08-21 23:03 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Alper Nebi Yasak, u-boot, Kever Yang, Jagan Teki, Andre Przywara,
	Svyatoslav Ryhel, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

Hi Alex,

On Mon, 21 Aug 2023 at 16:44, Alexander Graf <agraf@csgraf.de> wrote:
>
>
> On 22.08.23 00:10, Simon Glass wrote:
> > Hi Alex,
> >
> > On Mon, 21 Aug 2023 at 13:59, Alexander Graf <agraf@csgraf.de> wrote:
> >>
> >> On 21.08.23 21:11, Simon Glass wrote:
> >>> Hi Alper,
> >>>
> >>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
> >>>> From: Alexander Graf <agraf@csgraf.de>
> >>>>
> >>>> Now that we have a damage area that tells us which parts of the frame buffer
> >>>> actually need updating, let's only dcache flush those on video_sync()
> >>>> calls. With this optimization in place, frame buffer updates - especially
> >>>> on large screens such as 4k displays - speed up significantly.
> >>>>
> >>>> Signed-off-by: Alexander Graf <agraf@csgraf.de>
> >>>> Reported-by: Da Xue <da@libre.computer>
> >>>> [Alper: Use damage.xstart/yend, IS_ENABLED()]
> >>>> Co-developed-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> >>>> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> >>>> ---
> >>>>
> >>>> Changes in v5:
> >>>> - Use xstart, ystart, xend, yend as names for damage region
> >>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
> >>>>
> >>>> Changes in v2:
> >>>> - Fix dcache range; we were flushing too much before
> >>>> - Remove ifdefs
> >>>>
> >>>>    drivers/video/video-uclass.c | 41 +++++++++++++++++++++++++++++++-----
> >>>>    1 file changed, 36 insertions(+), 5 deletions(-)
> >>> This is a little strange, since flushing the whole cache will only
> >>> actually write out data that was actually written (to the display). Is
> >>> there a benefit to this patch, in terms of performance?
> >>
> >> I'm happy to see you go through the same thought process I went through
> >> when writing these: "This surely can't be the problem, can it?". The
> >> answer is "simple" in hindsight:
> >>
> >> Have a look at the ARMv8 cache flush function. It does the only "safe"
> >> thing you can expect it to do: Clean+Invalidate to POC because we use it
> >> for multiple things, clearing modified code among others:
> >>
> >> ENTRY(__asm_flush_dcache_range)
> >>           mrs     x3, ctr_el0
> >>           ubfx    x3, x3, #16, #4
> >>           mov     x2, #4
> >>           lsl     x2, x2, x3              /* cache line size */
> >>
> >>           /* x2 <- minimal cache line size in cache system */
> >>           sub     x3, x2, #1
> >>           bic     x0, x0, x3
> >> 1:      dc      civac, x0       /* clean & invalidate data or unified cache */
> >>           add     x0, x0, x2
> >>           cmp     x0, x1
> >>           b.lo    1b
> >>           dsb     sy
> >>           ret
> >> ENDPROC(__asm_flush_dcache_range)
> >>
> >>
> >> Looking at the "dc civac" call, we find this documentation page from
> >> ARM:
> >> https://developer.arm.com/documentation/ddi0601/2022-03/AArch64-Instructions/DC-CIVAC--Data-or-unified-Cache-line-Clean-and-Invalidate-by-VA-to-PoC
> >>
> >> This says we're writing any dirtiness of this cache line up to the POC
> >> and then invalidate (remove the cache line) also up to POC. That means
> >> when you look at a typical SBC, this will either be L2 or system level
> >> cache. Every read afterwards needs to go and pull it all the way back to
> >> L1 to modify it (or not) on the next character write and then flush it
> >> again.
> >>
> >> Even worse: Because of the invalidate, we may even evict it from caches
> >> that the display controller uses to read the frame buffer. So depending
> >> on the SoC's cache topology and implementation, we may force the display
> >> controller to refetch the full FB content on its next screen refresh cycle.
> >>
> >> I faintly remember that I tried to experiment with CVAC instead to only
> >> flush without invalidating. I don't fully remember the results anymore
> >> though. I believe CVAC just behaved identically to CIVAC on the A53
> >> platform I was working on. And then I looked at Cortex-A53 errata like
> >> [1] and just accepted that doing anything but restricting the flushing
> >> range is a waste of time :)
> > Yuck I didn't know it was invalidating too. That is horrible. Is there
> > no way to fix it?
>
>
> Before building all of this damage logic, I tried, but failed. I'd
> welcome anyone else to try again :). I'm not even convinced yet that it
> is actually fixable: Depending on the SoC's internal cache logic, it may
> opt to always invalidate, I think.

Wow, that is crazy! How is anyone supposed to make the system run well
with logic like that??!

>
> That said, this patch set really also makes sense outside of the
> particular invalidate problem. It creates a generic abstraction between
> the copy and non-copy code path and allows us to reduce the amount of
> work spent for both, generically for any video sync operation.

Sure...my question was really why it helps so much, given what I
understood the caches to be doing. If they are invalidating, then it
is amazing anything gets done...

Regards,
Simon

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 00/13] Add video damage tracking
  2023-08-21 22:40           ` Alexander Graf
@ 2023-08-21 23:03             ` Simon Glass
  2023-08-22  7:47               ` Alexander Graf
  0 siblings, 1 reply; 56+ messages in thread
From: Simon Glass @ 2023-08-21 23:03 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Alper Nebi Yasak, u-boot, Kever Yang, Jagan Teki, Andre Przywara,
	Svyatoslav Ryhel, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

Hi Alex,

On Mon, 21 Aug 2023 at 16:40, Alexander Graf <agraf@csgraf.de> wrote:
>
>
> On 22.08.23 00:10, Simon Glass wrote:
> > Hi Alex,
> >
> > On Mon, 21 Aug 2023 at 14:20, Alexander Graf <agraf@csgraf.de> wrote:
> >>
> >> On 21.08.23 21:57, Simon Glass wrote:
> >>> Hi Alex,
> >>>
> >>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf <agraf@csgraf.de> wrote:
> >>>> On 21.08.23 21:11, Simon Glass wrote:
> >>>>> Hi Alper,
> >>>>>
> >>>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
> >>>>>> This is a rebase of Alexander Graf's video damage tracking series, with
> >>>>>> some tests and other changes. The original cover letter is as follows:
> >>>>>>
> >>>>>>> This patch set speeds up graphics output on ARM by a factor of 60x.
> >>>>>>>
> >>>>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
> >>>>>>> but need it accessible by the display controller which reads directly
> >>>>>>> from a later point of consistency. Hence, we flush the frame buffer to
> >>>>>>> DRAM on every change. The full frame buffer.
> >>>>> It should not, see below.
> >>>>>
> >>>>>>> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
> >>>>>>> that can take a while to flush out. This was reported by Da Xue with grub,
> >>>>>>> which happily prints 1000s of spaces on the screen to draw a menu. Every
> >>>>>>> printed space triggers a cache flush.
> >>>>> That is a bug somewhere in EFI.
> >>>> Unfortunately not :). You may call it a bug in grub: It literally prints
> >>>> over space characters for every character in its menu that it wants
> >>>> cleared. On every text screen draw.
> >>>>
> >>>> This wouldn't be a big issue if we only flush the rectangle that gets
> >>>> modified. But without this patch set, we're flushing the full DRAM
> >>>> buffer on every U-Boot text console character write, which means for
> >>>> every character (as that's the only API UEFI has).
> >>>>
> >>>> As a nice side effect, we speed up the normal U-Boot text console as
> >>>> well with this patch set, because even "normal" text prints that write
> >>>> for example a single line of text on the screen today flush the full
> >>>> frame buffer to DRAM.
> >>> No, I mean that it is a bug that U-Boot (apparently) flushes the cache
> >>> after every character. It doesn't do that for normal character output
> >>> and I don't think it makes sense to do it for EFI either.
> >>
> >> I see. Let's trace the calls:
> >>
> >> efi_cout_output_string()
> >> -> fputs()
> >> -> vidconsole_puts()
> >> -> video_sync()
> >> -> flush_dcache_range()
> >>
> >> Unfortunately grub abstracts character backends down to the "print
> >> character" level, so it calls UEFI's sophisticated "output_string"
> >> callback with single characters at a time, which means we do a full
> >> dcache flush for every character that we print:
> >>
> >> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
> >>
> >>
> >>>>>>> This patch set implements the easiest mitigation against this problem:
> >>>>>>> Damage tracking. We remember the lowest common denominator region that was
> >>>>>>> touched since the last video_sync() call and only flush that. The most
> >>>>>>> typical writer to the frame buffer is the video console, which always
> >>>>>>> writes rectangles of characters on the screen and syncs afterwards.
> >>>>>>>
> >>>>>>> With this patch set applied, we reduce drawing a large grub menu (with
> >>>>>>> serial console attached for size information) on an RK3399-ROC system
> >>>>>>> at 1440p from 55 seconds to less than 1 second.
> >>>>>>>
> >>>>>>> Version 2 also implements VIDEO_COPY using this mechanism, reducing its
> >>>>>>> overhead compared to before as well. So even x86 systems should be faster
> >>>>>>> with this now :).
> >>>>>>>
> >>>>>>>
> >>>>>>> Alternatives considered:
> >>>>>>>
> >>>>>>>      1) Lazy sync - Sandbox does this. It only calls video_sync(true) every
> >>>>>>>         so often. We are missing timers to do this generically.
> >>>>>>>
> >>>>>>>      2) Double buffering - We could try to identify whether anything changed
> >>>>>>>         at all and only draw to the FB if it did. That would require
> >>>>>>>         maintaining a second buffer that we need to scan.
> >>>>>>>
> >>>>>>>      3) Text buffer - Maintain a buffer of all text printed on the screen with
> >>>>>>>         respective location. Don't write if the old and new character are
> >>>>>>>         identical. This would limit applicability to text only and is an
> >>>>>>>         optimization on top of this patch set.
> >>>>>>>
> >>>>>>>      4) Hash screen lines - Create a hash (sha256?) over every line when it
> >>>>>>>         changes. Only flush when it does. I'm not sure if this would waste
> >>>>>>>         more time, memory and cache than the current approach. It would make
> >>>>>>>         full screen updates much more expensive.
> >>>>> 5) Fix the bug mentioned above?
> >>>>>
> >>>>>> Changes in v5:
> >>>>>> - Add patch "video: test: Split copy frame buffer check into a function"
> >>>>>> - Add patch "video: test: Support checking copy frame buffer contents"
> >>>>>> - Add patch "video: test: Test partial updates of hardware frame buffer"
> >>>>>> - Use xstart, ystart, xend, yend as names for damage region
> >>>>>> - Document damage struct and fields in struct video_priv comment
> >>>>>> - Return void from video_damage()
> >>>>>> - Fix undeclared priv error in video_sync()
> >>>>>> - Drop unused headers from video-uclass.c
> >>>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
> >>>>>> - Call video_damage() also in video_fill_part()
> >>>>>> - Use met->baseline instead of priv->baseline
> >>>>>> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
> >>>>>> - Update console_rotate.c video_damage() calls to pass video tests
> >>>>>> - Remove mention about not having minimal damage for console_rotate.c
> >>>>>> - Add patch "video: test: Test video damage tracking via vidconsole"
> >>>>>> - Document new vdev field in struct efi_gop_obj comment
> >>>>>> - Remove video_sync_copy() also from video_fill(), video_fill_part()
> >>>>>> - Fix memmove() calls by removing the extra dev argument
> >>>>>> - Call video_sync() before checking copy_fb in video tests
> >>>>>> - Imply VIDEO_DAMAGE for video drivers instead of selecting it
> >>>>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
> >>>>>>
> >>>>>> v4: https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/
> >>>>>>
> >>>>>> Changes in v4:
> >>>>>> - Move damage clear to patch "dm: video: Add damage tracking API"
> >>>>>> - Simplify first damage logic
> >>>>>> - Remove VIDEO_DAMAGE default for ARM
> >>>>>> - Skip damage on EfiBltVideoToBltBuffer
> >>>>>> - Add patch "video: Always compile cache flushing code"
> >>>>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"
> >>>>>>
> >>>>>> v3: https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/
> >>>>>>
> >>>>>> Changes in v3:
> >>>>>> - Adapt to always assume DM is used
> >>>>>> - Make VIDEO_COPY always select VIDEO_DAMAGE
> >>>>>>
> >>>>>> v2: https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/
> >>>>>>
> >>>>>> Changes in v2:
> >>>>>> - Remove ifdefs
> >>>>>> - Fix ranges in truetype target
> >>>>>> - Limit rotate to necessary damage
> >>>>>> - Remove ifdefs from gop
> >>>>>> - Fix dcache range; we were flushing too much before
> >>>>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
> >>>>>>
> >>>>>> v1: https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/
> >>>>>>
> >>>>>> Alexander Graf (9):
> >>>>>>      dm: video: Add damage tracking API
> >>>>>>      dm: video: Add damage notification on display fills
> >>>>>>      vidconsole: Add damage notifications to all vidconsole drivers
> >>>>>>      video: Add damage notification on bmp display
> >>>>>>      efi_loader: GOP: Add damage notification on BLT
> >>>>>>      video: Only dcache flush damaged lines
> >>>>>>      video: Use VIDEO_DAMAGE for VIDEO_COPY
> >>>>>>      video: Always compile cache flushing code
> >>>>>>      video: Enable VIDEO_DAMAGE for drivers that need it
> >>>>>>
> >>>>>> Alper Nebi Yasak (4):
> >>>>>>      video: test: Split copy frame buffer check into a function
> >>>>>>      video: test: Support checking copy frame buffer contents
> >>>>>>      video: test: Test partial updates of hardware frame buffer
> >>>>>>      video: test: Test video damage tracking via vidconsole
> >>>>>>
> >>>>>>     arch/arm/mach-omap2/omap3/Kconfig |   1 +
> >>>>>>     arch/arm/mach-sunxi/Kconfig       |   1 +
> >>>>>>     drivers/video/Kconfig             |  26 +++
> >>>>>>     drivers/video/console_normal.c    |  27 ++--
> >>>>>>     drivers/video/console_rotate.c    |  94 +++++++----
> >>>>>>     drivers/video/console_truetype.c  |  37 +++--
> >>>>>>     drivers/video/exynos/Kconfig      |   1 +
> >>>>>>     drivers/video/imx/Kconfig         |   1 +
> >>>>>>     drivers/video/meson/Kconfig       |   1 +
> >>>>>>     drivers/video/rockchip/Kconfig    |   1 +
> >>>>>>     drivers/video/stm32/Kconfig       |   1 +
> >>>>>>     drivers/video/tegra20/Kconfig     |   1 +
> >>>>>>     drivers/video/tidss/Kconfig       |   1 +
> >>>>>>     drivers/video/vidconsole-uclass.c |  16 --
> >>>>>>     drivers/video/video-uclass.c      | 190 ++++++++++++----------
> >>>>>>     drivers/video/video_bmp.c         |   7 +-
> >>>>>>     include/video.h                   |  59 +++----
> >>>>>>     include/video_console.h           |  52 ------
> >>>>>>     lib/efi_loader/efi_gop.c          |   7 +
> >>>>>>     test/dm/video.c                   | 256 ++++++++++++++++++++++++------
> >>>>>>     20 files changed, 483 insertions(+), 297 deletions(-)
> >>>>> It is good to see this tidied up into something that can be applied!
> >>>>>
> >>>>> I am unsure what is going on with the EFI performance, though. It
> >>>>> should not flush the cache after every character, only after a new
> >>>>> line. Is there something wrong in here? If so, we should fix that bug
> >>>>> first and it should be patch 1 of this series.
> >>>> Before I came up with this series, I was trying to identify the UEFI bug
> >>>> in question as well, because intuition told me surely this is a bug in
> >>>> UEFI :). Turns out it really isn't this time around.
> >>> I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI
> >>> implementation. Where did you look for the bug?
> >>
> >> The "real" bug is in grub. But given that it's reasonably simple to work
> >> around in U-Boot and even with it "fixed" in grub we would still see
> >> performance benefits from flushing only parts of the screen, I think
> >> it's worth living with the grub deficiency.
> > OK thanks for digging into it. I suggest we add a param to
> > vidconsole_puts() to tell it whether to sync or not, then the EFI code
> > can indicate this and try to be a bit smarter about it.
>
>
> It doesn't know when to sync either. From its point of view, any
> "console output" could be the last one. There is no API in UEFI that
> says "please flush console output now".

Yes, I understand. I was not suggesting we were missing an API. But
some sort of heuristic would do, e.g. only flush on a newline, flush
every 50 chars, etc.
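
A minimal sketch of such a heuristic, with made-up names (none of this
is existing U-Boot code; the struct, the function and the threshold of
50 are assumptions for illustration only):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold: sync after this many unsynced characters. */
#define SYNC_MAX_PENDING 50

/* Hypothetical per-console state; not a U-Boot structure. */
struct sync_heuristic {
	size_t pending;	/* characters written since the last sync */
};

/*
 * Record one output character and report whether the caller should
 * sync the display now: on a newline, or once SYNC_MAX_PENDING
 * characters have accumulated without a sync.
 */
static bool sync_heuristic_putc(struct sync_heuristic *h, char ch)
{
	assert(h);

	h->pending++;
	if (ch == '\n' || h->pending >= SYNC_MAX_PENDING) {
		h->pending = 0;
		return true;
	}

	return false;
}
```

Short output without a trailing newline stays unsynced under this
scheme until more characters arrive, so it would likely need to be
paired with some later trigger.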

Regards,
Simon

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 00/13] Add video damage tracking
  2023-08-21 23:03             ` Simon Glass
@ 2023-08-22  7:47               ` Alexander Graf
  2023-08-22 18:56                 ` Simon Glass
  0 siblings, 1 reply; 56+ messages in thread
From: Alexander Graf @ 2023-08-22  7:47 UTC (permalink / raw)
  To: Simon Glass
  Cc: Alper Nebi Yasak, u-boot, Kever Yang, Jagan Teki, Andre Przywara,
	Svyatoslav Ryhel, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong


On 22.08.23 01:03, Simon Glass wrote:
> Hi Alex,
>
> On Mon, 21 Aug 2023 at 16:40, Alexander Graf <agraf@csgraf.de> wrote:
>>
>> On 22.08.23 00:10, Simon Glass wrote:
>>> Hi Alex,
>>>
>>> On Mon, 21 Aug 2023 at 14:20, Alexander Graf <agraf@csgraf.de> wrote:
>>>> On 21.08.23 21:57, Simon Glass wrote:
>>>>> Hi Alex,
>>>>>
>>>>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf <agraf@csgraf.de> wrote:
>>>>>> On 21.08.23 21:11, Simon Glass wrote:
>>>>>>> Hi Alper,
>>>>>>>
>>>>>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>>>>>>>> This is a rebase of Alexander Graf's video damage tracking series, with
>>>>>>>> some tests and other changes. The original cover letter is as follows:
>>>>>>>>
>>>>>>>>> This patch set speeds up graphics output on ARM by a factor of 60x.
>>>>>>>>>
>>>>>>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
>>>>>>>>> but need it accessible by the display controller which reads directly
>>>>>>>>> from a later point of consistency. Hence, we flush the frame buffer to
>>>>>>>>> DRAM on every change. The full frame buffer.
>>>>>>> It should not, see below.
>>>>>>>
>>>>>>>>> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
>>>>>>>>> that can take a while to flush out. This was reported by Da Xue with grub,
>>>>>>>>> which happily print 1000s of spaces on the screen to draw a menu. Every
>>>>>>>>> printed space triggers a cache flush.
>>>>>>> That is a bug somewhere in EFI.
>>>>>> Unfortunately not :). You may call it a bug in grub: It literally prints
>>>>>> over space characters for every character in its menu that it wants
>>>>>> cleared. On every text screen draw.
>>>>>>
>>>>>> This wouldn't be a big issue if we only flush the rectangle that gets
>>>>>> modified. But without this patch set, we're flushing the full DRAM
>>>>>> buffer on every u-boot text console character write, which means for
>>>>>> every character (as that's the only API UEFI has).
>>>>>>
>>>>>> As a nice side effect, we speed up the normal U-Boot text console as
>>>>>> well with this patch set, because even "normal" text prints that write
>>>>>> for example a single line of text on the screen today flush the full
>>>>>> frame buffer to DRAM.
>>>>> No, I mean that it is a bug that U-Boot (apparently) flushes the cache
>>>>> after every character. It doesn't do that for normal character output
>>>>> and I don't think it makes sense to do it for EFI either.
>>>> I see. Let's trace the calls:
>>>>
>>>> efi_cout_output_string()
>>>> -> fputs()
>>>> -> vidconsole_puts()
>>>> -> video_sync()
>>>> -> flush_dcache_range()
>>>>
>>>> Unfortunately grub abstracts character backends down to the "print
>>>> character" level, so it calls UEFI's sophisticated "output_string"
>>>> callback with single characters at a time, which means we do a full
>>>> dcache flush for every character that we print:
>>>>
>>>> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
>>>>
>>>>
>>>>>>>>> This patch set implements the easiest mitigation against this problem:
>>>>>>>>> Damage tracking. We remember the lowest common denominator region that was
>>>>>>>>> touched since the last video_sync() call and only flush that. The most
>>>>>>>>> typical writer to the frame buffer is the video console, which always
>>>>>>>>> writes rectangles of characters on the screen and syncs afterwards.
>>>>>>>>>
>>>>>>>>> With this patch set applied, we reduce drawing a large grub menu (with
>>>>>>>>> serial console attached for size information) on an RK3399-ROC system
>>>>>>>>> at 1440p from 55 seconds to less than 1 second.
>>>>>>>>>
>>>>>>>>> Version 2 also implements VIDEO_COPY using this mechanism, reducing its
>>>>>>>>> overhead compared to before as well. So even x86 systems should be faster
>>>>>>>>> with this now :).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Alternatives considered:
>>>>>>>>>
>>>>>>>>>       1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
>>>>>>>>>          so often. We are missing timers to do this generically.
>>>>>>>>>
>>>>>>>>>       2) Double buffering - We could try to identify whether anything changed
>>>>>>>>>          at all and only draw to the FB if it did. That would require
>>>>>>>>>          maintaining a second buffer that we need to scan.
>>>>>>>>>
>>>>>>>>>       3) Text buffer - Maintain a buffer of all text printed on the screen with
>>>>>>>>>          respective location. Don't write if the old and new character are
>>>>>>>>>          identical. This would limit applicability to text only and is an
>>>>>>>>>          optimization on top of this patch set.
>>>>>>>>>
>>>>>>>>>       4) Hash screen lines - Create a hash (sha256?) over every line when it
>>>>>>>>>          changes. Only flush when it does. I'm not sure if this would waste
>>>>>>>>>          more time, memory and cache than the current approach. It would make
>>>>>>>>>          full screen updates much more expensive.
>>>>>>> 5) Fix the bug mentioned above?
>>>>>>>
>>>>>>>> Changes in v5:
>>>>>>>> - Add patch "video: test: Split copy frame buffer check into a function"
>>>>>>>> - Add patch "video: test: Support checking copy frame buffer contents"
>>>>>>>> - Add patch "video: test: Test partial updates of hardware frame buffer"
>>>>>>>> - Use xstart, ystart, xend, yend as names for damage region
>>>>>>>> - Document damage struct and fields in struct video_priv comment
>>>>>>>> - Return void from video_damage()
>>>>>>>> - Fix undeclared priv error in video_sync()
>>>>>>>> - Drop unused headers from video-uclass.c
>>>>>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>>>>>>>> - Call video_damage() also in video_fill_part()
>>>>>>>> - Use met->baseline instead of priv->baseline
>>>>>>>> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
>>>>>>>> - Update console_rotate.c video_damage() calls to pass video tests
>>>>>>>> - Remove mention about not having minimal damage for console_rotate.c
>>>>>>>> - Add patch "video: test: Test video damage tracking via vidconsole"
>>>>>>>> - Document new vdev field in struct efi_gop_obj comment
>>>>>>>> - Remove video_sync_copy() also from video_fill(), video_fill_part()
>>>>>>>> - Fix memmove() calls by removing the extra dev argument
>>>>>>>> - Call video_sync() before checking copy_fb in video tests
>>>>>>>> - Imply VIDEO_DAMAGE for video drivers instead of selecting it
>>>>>>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
>>>>>>>>
>>>>>>>> v4: https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/
>>>>>>>>
>>>>>>>> Changes in v4:
>>>>>>>> - Move damage clear to patch "dm: video: Add damage tracking API"
>>>>>>>> - Simplify first damage logic
>>>>>>>> - Remove VIDEO_DAMAGE default for ARM
>>>>>>>> - Skip damage on EfiBltVideoToBltBuffer
>>>>>>>> - Add patch "video: Always compile cache flushing code"
>>>>>>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"
>>>>>>>>
>>>>>>>> v3: https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/
>>>>>>>>
>>>>>>>> Changes in v3:
>>>>>>>> - Adapt to always assume DM is used
>>>>>>>> - Adapt to always assume DM is used
>>>>>>>> - Make VIDEO_COPY always select VIDEO_DAMAGE
>>>>>>>>
>>>>>>>> v2: https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/
>>>>>>>>
>>>>>>>> Changes in v2:
>>>>>>>> - Remove ifdefs
>>>>>>>> - Fix ranges in truetype target
>>>>>>>> - Limit rotate to necessary damage
>>>>>>>> - Remove ifdefs from gop
>>>>>>>> - Fix dcache range; we were flushing too much before
>>>>>>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
>>>>>>>>
>>>>>>>> v1: https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/
>>>>>>>>
>>>>>>>> Alexander Graf (9):
>>>>>>>>       dm: video: Add damage tracking API
>>>>>>>>       dm: video: Add damage notification on display fills
>>>>>>>>       vidconsole: Add damage notifications to all vidconsole drivers
>>>>>>>>       video: Add damage notification on bmp display
>>>>>>>>       efi_loader: GOP: Add damage notification on BLT
>>>>>>>>       video: Only dcache flush damaged lines
>>>>>>>>       video: Use VIDEO_DAMAGE for VIDEO_COPY
>>>>>>>>       video: Always compile cache flushing code
>>>>>>>>       video: Enable VIDEO_DAMAGE for drivers that need it
>>>>>>>>
>>>>>>>> Alper Nebi Yasak (4):
>>>>>>>>       video: test: Split copy frame buffer check into a function
>>>>>>>>       video: test: Support checking copy frame buffer contents
>>>>>>>>       video: test: Test partial updates of hardware frame buffer
>>>>>>>>       video: test: Test video damage tracking via vidconsole
>>>>>>>>
>>>>>>>>      arch/arm/mach-omap2/omap3/Kconfig |   1 +
>>>>>>>>      arch/arm/mach-sunxi/Kconfig       |   1 +
>>>>>>>>      drivers/video/Kconfig             |  26 +++
>>>>>>>>      drivers/video/console_normal.c    |  27 ++--
>>>>>>>>      drivers/video/console_rotate.c    |  94 +++++++----
>>>>>>>>      drivers/video/console_truetype.c  |  37 +++--
>>>>>>>>      drivers/video/exynos/Kconfig      |   1 +
>>>>>>>>      drivers/video/imx/Kconfig         |   1 +
>>>>>>>>      drivers/video/meson/Kconfig       |   1 +
>>>>>>>>      drivers/video/rockchip/Kconfig    |   1 +
>>>>>>>>      drivers/video/stm32/Kconfig       |   1 +
>>>>>>>>      drivers/video/tegra20/Kconfig     |   1 +
>>>>>>>>      drivers/video/tidss/Kconfig       |   1 +
>>>>>>>>      drivers/video/vidconsole-uclass.c |  16 --
>>>>>>>>      drivers/video/video-uclass.c      | 190 ++++++++++++----------
>>>>>>>>      drivers/video/video_bmp.c         |   7 +-
>>>>>>>>      include/video.h                   |  59 +++----
>>>>>>>>      include/video_console.h           |  52 ------
>>>>>>>>      lib/efi_loader/efi_gop.c          |   7 +
>>>>>>>>      test/dm/video.c                   | 256 ++++++++++++++++++++++++------
>>>>>>>>      20 files changed, 483 insertions(+), 297 deletions(-)
>>>>>>> It is good to see this tidied up into something that can be applied!
>>>>>>>
>>>>>>> I am unsure what is going on with the EFI performance, though. It
>>>>>>> should not flush the cache after every character, only after a new
>>>>>>> line. Is there something wrong in here? If so, we should fix that bug
>>>>>>> first and it should be patch 1 of this series.
>>>>>> Before I came up with this series, I was trying to identify the UEFI bug
>>>>>> in question as well, because intuition told me surely this is a bug in
>>>>>> UEFI :). Turns out it really isn't this time around.
>>>>> I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI
>>>>> implementation. Where did you look for the bug?
>>>> The "real" bug is in grub. But given that it's reasonably simple to work
>>>> around in U-Boot and even with it "fixed" in grub we would still see
>>>> performance benefits from flushing only parts of the screen, I think
>>>> it's worth living with the grub deficiency.
>>> OK thanks for digging into it. I suggest we add a param to
>>> vidconsole_puts() to tell it whether to sync or not, then the EFI code
>>> can indicate this and try to be a bit smarter about it.
>>
>> It doesn't know when to sync either. From its point of view, any
>> "console output" could be the last one. There is no API in UEFI that
>> says "please flush console output now".
> Yes, I understand. I was not suggesting we were missing an API. But
> some sort of heuristic would do, e.g. only flush on a newline, flush
> every 50 chars, etc.

I can't think of any heuristic that would work reliably. Relevant for
this conversation, UEFI provides two calls:

   * Write string to screen (efi_cout_output_string)
   * Set text cursor position to X, Y (efi_cout_set_cursor_position)

It's perfectly legal for a UEFI application to do something like

efi_cout_set_cursor_position(10, 10);
efi_cout_output_string("f");
efi_cout_output_string("o");
efi_cout_output_string("o");

to update the contents of a virtual text box on the screen. Where in this
chain of events would we call video_sync(), if not on every call to
efi_cout_output_string()?
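
For comparison, the damage tracking idea from the cover letter can be
sketched roughly as below. This is only an illustration under stated
assumptions: the struct and function names are made up and this is not
the code from the series; only the xstart/ystart/xend/yend field names
follow the v5 changelog.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the damage fields kept per display. */
struct damage {
	int xstart, ystart;	/* inclusive top-left corner */
	int xend, yend;		/* exclusive bottom-right corner */
};

/* Reset to "nothing damaged" for a display of xsize x ysize pixels. */
static void damage_clear(struct damage *d, int xsize, int ysize)
{
	d->xstart = xsize;
	d->ystart = ysize;
	d->xend = 0;
	d->yend = 0;
}

/* Grow the damaged region so that it also covers the given rectangle. */
static void damage_add(struct damage *d, int x, int y, int w, int h)
{
	assert(w >= 0 && h >= 0);

	if (x < d->xstart)
		d->xstart = x;
	if (y < d->ystart)
		d->ystart = y;
	if (x + w > d->xend)
		d->xend = x + w;
	if (y + h > d->yend)
		d->yend = y + h;
}

/*
 * Compute which frame buffer bytes cover the damaged lines.
 * line_length is the number of bytes per display line. Stores the
 * start offset in *offset and returns the length in bytes, or returns
 * 0 (leaving *offset untouched) if nothing is damaged.
 */
static size_t damage_flush_range(const struct damage *d,
				 size_t line_length, size_t *offset)
{
	if (d->xend <= d->xstart || d->yend <= d->ystart)
		return 0;

	*offset = (size_t)d->ystart * line_length;
	return (size_t)(d->yend - d->ystart) * line_length;
}
```

With something like this, each output_string() call can still sync, but
the sync only touches the lines covered by the glyphs drawn since the
previous one rather than the full frame buffer.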


Alex



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 00/13] Add video damage tracking
  2023-08-22  7:47               ` Alexander Graf
@ 2023-08-22 18:56                 ` Simon Glass
  2023-08-23  8:56                   ` Alexander Graf
  0 siblings, 1 reply; 56+ messages in thread
From: Simon Glass @ 2023-08-22 18:56 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Alper Nebi Yasak, u-boot, Kever Yang, Jagan Teki, Andre Przywara,
	Svyatoslav Ryhel, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

Hi Alex,

On Tue, 22 Aug 2023 at 01:47, Alexander Graf <agraf@csgraf.de> wrote:
>
>
> On 22.08.23 01:03, Simon Glass wrote:
> > Hi Alex,
> >
> > On Mon, 21 Aug 2023 at 16:40, Alexander Graf <agraf@csgraf.de> wrote:
> >>
> >> On 22.08.23 00:10, Simon Glass wrote:
> >>> Hi Alex,
> >>>
> >>> On Mon, 21 Aug 2023 at 14:20, Alexander Graf <agraf@csgraf.de> wrote:
> >>>> On 21.08.23 21:57, Simon Glass wrote:
> >>>>> Hi Alex,
> >>>>>
> >>>>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf <agraf@csgraf.de> wrote:
> >>>>>> On 21.08.23 21:11, Simon Glass wrote:
> >>>>>>> Hi Alper,
> >>>>>>>
> >>>>>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
> >>>>>>>> This is a rebase of Alexander Graf's video damage tracking series, with
> >>>>>>>> some tests and other changes. The original cover letter is as follows:
> >>>>>>>>
> >>>>>>>>> This patch set speeds up graphics output on ARM by a factor of 60x.
> >>>>>>>>>
> >>>>>>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
> >>>>>>>>> but need it accessible by the display controller which reads directly
> >>>>>>>>> from a later point of consistency. Hence, we flush the frame buffer to
> >>>>>>>>> DRAM on every change. The full frame buffer.
> >>>>>>> It should not, see below.
> >>>>>>>
> >>>>>>>>> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
> >>>>>>>>> that can take a while to flush out. This was reported by Da Xue with grub,
> >>>>>>>>> which happily print 1000s of spaces on the screen to draw a menu. Every
> >>>>>>>>> printed space triggers a cache flush.
> >>>>>>> That is a bug somewhere in EFI.
> >>>>>> Unfortunately not :). You may call it a bug in grub: It literally prints
> >>>>>> over space characters for every character in its menu that it wants
> >>>>>> cleared. On every text screen draw.
> >>>>>>
> >>>>>> This wouldn't be a big issue if we only flush the rectangle that gets
> >>>>>> modified. But without this patch set, we're flushing the full DRAM
> >>>>>> buffer on every u-boot text console character write, which means for
> >>>>>> every character (as that's the only API UEFI has).
> >>>>>>
> >>>>>> As a nice side effect, we speed up the normal U-Boot text console as
> >>>>>> well with this patch set, because even "normal" text prints that write
> >>>>>> for example a single line of text on the screen today flush the full
> >>>>>> frame buffer to DRAM.
> >>>>> No, I mean that it is a bug that U-Boot (apparently) flushes the cache
> >>>>> after every character. It doesn't do that for normal character output
> >>>>> and I don't think it makes sense to do it for EFI either.
> >>>> I see. Let's trace the calls:
> >>>>
> >>>> efi_cout_output_string()
> >>>> -> fputs()
> >>>> -> vidconsole_puts()
> >>>> -> video_sync()
> >>>> -> flush_dcache_range()
> >>>>
> >>>> Unfortunately grub abstracts character backends down to the "print
> >>>> character" level, so it calls UEFI's sophisticated "output_string"
> >>>> callback with single characters at a time, which means we do a full
> >>>> dcache flush for every character that we print:
> >>>>
> >>>> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
> >>>>
> >>>>
> >>>>>>>>> This patch set implements the easiest mitigation against this problem:
> >>>>>>>>> Damage tracking. We remember the lowest common denominator region that was
> >>>>>>>>> touched since the last video_sync() call and only flush that. The most
> >>>>>>>>> typical writer to the frame buffer is the video console, which always
> >>>>>>>>> writes rectangles of characters on the screen and syncs afterwards.
> >>>>>>>>>
> >>>>>>>>> With this patch set applied, we reduce drawing a large grub menu (with
> >>>>>>>>> serial console attached for size information) on an RK3399-ROC system
> >>>>>>>>> at 1440p from 55 seconds to less than 1 second.
> >>>>>>>>>
> >>>>>>>>> Version 2 also implements VIDEO_COPY using this mechanism, reducing its
> >>>>>>>>> overhead compared to before as well. So even x86 systems should be faster
> >>>>>>>>> with this now :).
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Alternatives considered:
> >>>>>>>>>
> >>>>>>>>>       1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
> >>>>>>>>>          so often. We are missing timers to do this generically.
> >>>>>>>>>
> >>>>>>>>>       2) Double buffering - We could try to identify whether anything changed
> >>>>>>>>>          at all and only draw to the FB if it did. That would require
> >>>>>>>>>          maintaining a second buffer that we need to scan.
> >>>>>>>>>
> >>>>>>>>>       3) Text buffer - Maintain a buffer of all text printed on the screen with
> >>>>>>>>>          respective location. Don't write if the old and new character are
> >>>>>>>>>          identical. This would limit applicability to text only and is an
> >>>>>>>>>          optimization on top of this patch set.
> >>>>>>>>>
> >>>>>>>>>       4) Hash screen lines - Create a hash (sha256?) over every line when it
> >>>>>>>>>          changes. Only flush when it does. I'm not sure if this would waste
> >>>>>>>>>          more time, memory and cache than the current approach. It would make
> >>>>>>>>>          full screen updates much more expensive.
> >>>>>>> 5) Fix the bug mentioned above?
> >>>>>>>
> >>>>>>>> Changes in v5:
> >>>>>>>> - Add patch "video: test: Split copy frame buffer check into a function"
> >>>>>>>> - Add patch "video: test: Support checking copy frame buffer contents"
> >>>>>>>> - Add patch "video: test: Test partial updates of hardware frame buffer"
> >>>>>>>> - Use xstart, ystart, xend, yend as names for damage region
> >>>>>>>> - Document damage struct and fields in struct video_priv comment
> >>>>>>>> - Return void from video_damage()
> >>>>>>>> - Fix undeclared priv error in video_sync()
> >>>>>>>> - Drop unused headers from video-uclass.c
> >>>>>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
> >>>>>>>> - Call video_damage() also in video_fill_part()
> >>>>>>>> - Use met->baseline instead of priv->baseline
> >>>>>>>> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
> >>>>>>>> - Update console_rotate.c video_damage() calls to pass video tests
> >>>>>>>> - Remove mention about not having minimal damage for console_rotate.c
> >>>>>>>> - Add patch "video: test: Test video damage tracking via vidconsole"
> >>>>>>>> - Document new vdev field in struct efi_gop_obj comment
> >>>>>>>> - Remove video_sync_copy() also from video_fill(), video_fill_part()
> >>>>>>>> - Fix memmove() calls by removing the extra dev argument
> >>>>>>>> - Call video_sync() before checking copy_fb in video tests
> >>>>>>>> - Imply VIDEO_DAMAGE for video drivers instead of selecting it
> >>>>>>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
> >>>>>>>>
> >>>>>>>> v4: https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/
> >>>>>>>>
> >>>>>>>> Changes in v4:
> >>>>>>>> - Move damage clear to patch "dm: video: Add damage tracking API"
> >>>>>>>> - Simplify first damage logic
> >>>>>>>> - Remove VIDEO_DAMAGE default for ARM
> >>>>>>>> - Skip damage on EfiBltVideoToBltBuffer
> >>>>>>>> - Add patch "video: Always compile cache flushing code"
> >>>>>>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"
> >>>>>>>>
> >>>>>>>> v3: https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/
> >>>>>>>>
> >>>>>>>> Changes in v3:
> >>>>>>>> - Adapt to always assume DM is used
> >>>>>>>> - Adapt to always assume DM is used
> >>>>>>>> - Make VIDEO_COPY always select VIDEO_DAMAGE
> >>>>>>>>
> >>>>>>>> v2: https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/
> >>>>>>>>
> >>>>>>>> Changes in v2:
> >>>>>>>> - Remove ifdefs
> >>>>>>>> - Fix ranges in truetype target
> >>>>>>>> - Limit rotate to necessary damage
> >>>>>>>> - Remove ifdefs from gop
> >>>>>>>> - Fix dcache range; we were flushing too much before
> >>>>>>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
> >>>>>>>>
> >>>>>>>> v1: https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/
> >>>>>>>>
> >>>>>>>> Alexander Graf (9):
> >>>>>>>>       dm: video: Add damage tracking API
> >>>>>>>>       dm: video: Add damage notification on display fills
> >>>>>>>>       vidconsole: Add damage notifications to all vidconsole drivers
> >>>>>>>>       video: Add damage notification on bmp display
> >>>>>>>>       efi_loader: GOP: Add damage notification on BLT
> >>>>>>>>       video: Only dcache flush damaged lines
> >>>>>>>>       video: Use VIDEO_DAMAGE for VIDEO_COPY
> >>>>>>>>       video: Always compile cache flushing code
> >>>>>>>>       video: Enable VIDEO_DAMAGE for drivers that need it
> >>>>>>>>
> >>>>>>>> Alper Nebi Yasak (4):
> >>>>>>>>       video: test: Split copy frame buffer check into a function
> >>>>>>>>       video: test: Support checking copy frame buffer contents
> >>>>>>>>       video: test: Test partial updates of hardware frame buffer
> >>>>>>>>       video: test: Test video damage tracking via vidconsole
> >>>>>>>>
> >>>>>>>>      arch/arm/mach-omap2/omap3/Kconfig |   1 +
> >>>>>>>>      arch/arm/mach-sunxi/Kconfig       |   1 +
> >>>>>>>>      drivers/video/Kconfig             |  26 +++
> >>>>>>>>      drivers/video/console_normal.c    |  27 ++--
> >>>>>>>>      drivers/video/console_rotate.c    |  94 +++++++----
> >>>>>>>>      drivers/video/console_truetype.c  |  37 +++--
> >>>>>>>>      drivers/video/exynos/Kconfig      |   1 +
> >>>>>>>>      drivers/video/imx/Kconfig         |   1 +
> >>>>>>>>      drivers/video/meson/Kconfig       |   1 +
> >>>>>>>>      drivers/video/rockchip/Kconfig    |   1 +
> >>>>>>>>      drivers/video/stm32/Kconfig       |   1 +
> >>>>>>>>      drivers/video/tegra20/Kconfig     |   1 +
> >>>>>>>>      drivers/video/tidss/Kconfig       |   1 +
> >>>>>>>>      drivers/video/vidconsole-uclass.c |  16 --
> >>>>>>>>      drivers/video/video-uclass.c      | 190 ++++++++++++----------
> >>>>>>>>      drivers/video/video_bmp.c         |   7 +-
> >>>>>>>>      include/video.h                   |  59 +++----
> >>>>>>>>      include/video_console.h           |  52 ------
> >>>>>>>>      lib/efi_loader/efi_gop.c          |   7 +
> >>>>>>>>      test/dm/video.c                   | 256 ++++++++++++++++++++++++------
> >>>>>>>>      20 files changed, 483 insertions(+), 297 deletions(-)
> >>>>>>> It is good to see this tidied up into something that can be applied!
> >>>>>>>
> >>>>>>> I am unsure what is going on with the EFI performance, though. It
> >>>>>>> should not flush the cache after every character, only after a new
> >>>>>>> line. Is there something wrong in here? If so, we should fix that bug
> >>>>>>> first and it should be patch 1 of this series.
> >>>>>> Before I came up with this series, I was trying to identify the UEFI bug
> >>>>>> in question as well, because intuition told me surely this is a bug in
> >>>>>> UEFI :). Turns out it really isn't this time around.
> >>>>> I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI
> >>>>> implementation. Where did you look for the bug?
> >>>> The "real" bug is in grub. But given that it's reasonably simple to work
> >>>> around in U-Boot and even with it "fixed" in grub we would still see
> >>>> performance benefits from flushing only parts of the screen, I think
> >>>> it's worth living with the grub deficiency.
> >>> OK thanks for digging into it. I suggest we add a param to
> >>> vidconsole_puts() to tell it whether to sync or not, then the EFI code
> >>> can indicate this and try to be a bit smarter about it.
> >>
> >> It doesn't know when to sync either. From its point of view, any
> >> "console output" could be the last one. There is no API in UEFI that
> >> says "please flush console output now".
> > Yes, I understand. I was not suggesting we were missing an API. But
> > some sort of heuristic would do, e.g. only flush on a newline, flush
> > every 50 chars, etc.
>
> I can't think of any heuristic that would reliably work. Relevant for
> this conversation, UEFI provides 2 calls:
>
>    * Write string to screen (efi_cout_output_string)
>    * Set text cursor position to X, Y (efi_cout_set_cursor_position)
>
> It's perfectly legal for a UEFI application to do something like
>
> efi_cout_set_cursor_position(10, 10);
> efi_cout_output_string("f");
> efi_cout_output_string("o");
> efi_cout_output_string("o");
>
> to update the contents of a virtual text box on the screen. Where in this
> chain of events would we call video_sync(), if not on every call to
> efi_cout_output_string()?

Actually U-Boot has the same problem, but we have managed to work out something.

I do think it is still wrong to flush the cache on every char. I suspect you
will find that even a simple heuristic like I mentioned would be good
enough.

Also I notice that EFI calls notify? all the time, so U-Boot probably
does have the ability to sync the video every 10ms if we wanted to.
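
A rough sketch of that kind of rate-limited sync, assuming some
periodic hook exists to poll from. All names here are hypothetical,
and the timestamp is passed in by the caller rather than read from a
real U-Boot timer:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical minimum time between two syncs, in milliseconds. */
#define SYNC_INTERVAL_MS 10

/* Hypothetical state; not a U-Boot structure. */
struct lazy_sync {
	bool dirty;		/* frame buffer written since last sync */
	uint32_t last_sync_ms;	/* timestamp of the last sync */
};

/* Called by whoever writes to the frame buffer. */
static void lazy_sync_mark_dirty(struct lazy_sync *s)
{
	assert(s);
	s->dirty = true;
}

/*
 * Called from a periodic hook with the current time. Returns true if
 * the caller should sync the display now, i.e. there are unsynced
 * writes and at least SYNC_INTERVAL_MS has passed since the last sync.
 */
static bool lazy_sync_poll(struct lazy_sync *s, uint32_t now_ms)
{
	assert(s);

	if (!s->dirty)
		return false;

	/* Unsigned subtraction stays correct across counter wrap-around. */
	if ((uint32_t)(now_ms - s->last_sync_ms) < SYNC_INTERVAL_MS)
		return false;

	s->dirty = false;
	s->last_sync_ms = now_ms;
	return true;
}
```

The writer would only mark the buffer dirty, and the actual sync would
happen from the poll, at most once per interval.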

It seems from this discussion that we have made great the enemy of the good.

Regards,
Simon

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 00/13] Add video damage tracking
  2023-08-22 18:56                 ` Simon Glass
@ 2023-08-23  8:56                   ` Alexander Graf
  2023-08-28 17:54                     ` Simon Glass
  0 siblings, 1 reply; 56+ messages in thread
From: Alexander Graf @ 2023-08-23  8:56 UTC (permalink / raw)
  To: Simon Glass
  Cc: Alper Nebi Yasak, u-boot, Kever Yang, Jagan Teki, Andre Przywara,
	Svyatoslav Ryhel, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

Hey Simon,

On 22.08.23 20:56, Simon Glass wrote:
> Hi Alex,
>
> On Tue, 22 Aug 2023 at 01:47, Alexander Graf <agraf@csgraf.de> wrote:
>>
>> On 22.08.23 01:03, Simon Glass wrote:
>>> Hi Alex,
>>>
>>> On Mon, 21 Aug 2023 at 16:40, Alexander Graf <agraf@csgraf.de> wrote:
>>>> On 22.08.23 00:10, Simon Glass wrote:
>>>>> Hi Alex,
>>>>>
>>>>> On Mon, 21 Aug 2023 at 14:20, Alexander Graf <agraf@csgraf.de> wrote:
>>>>>> On 21.08.23 21:57, Simon Glass wrote:
>>>>>>> Hi Alex,
>>>>>>>
>>>>>>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf <agraf@csgraf.de> wrote:
>>>>>>>> On 21.08.23 21:11, Simon Glass wrote:
>>>>>>>>> Hi Alper,
>>>>>>>>>
>>>>>>>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>>>>>>>>>> This is a rebase of Alexander Graf's video damage tracking series, with
>>>>>>>>>> some tests and other changes. The original cover letter is as follows:
>>>>>>>>>>
>>>>>>>>>>> This patch set speeds up graphics output on ARM by a factor of 60x.
>>>>>>>>>>>
>>>>>>>>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
>>>>>>>>>>> but need it accessible by the display controller which reads directly
>>>>>>>>>>> from a later point of consistency. Hence, we flush the frame buffer to
>>>>>>>>>>> DRAM on every change. The full frame buffer.
>>>>>>>>> It should not, see below.
>>>>>>>>>
>>>>>>>>>>> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
>>>>>>>>>>> that can take a while to flush out. This was reported by Da Xue with grub,
>>>>>>>>>>> which happily print 1000s of spaces on the screen to draw a menu. Every
>>>>>>>>>>> printed space triggers a cache flush.
>>>>>>>>> That is a bug somewhere in EFI.
>>>>>>>> Unfortunately not :). You may call it a bug in grub: It literally prints
>>>>>>>> over space characters for every character in its menu that it wants
>>>>>>>> cleared. On every text screen draw.
>>>>>>>>
>>>>>>>> This wouldn't be a big issue if we only flush the rectangle that gets
>>>>>>>> modified. But without this patch set, we're flushing the full DRAM
>>>>>>>> buffer on every u-boot text console character write, which means for
>>>>>>>> every character (as that's the only API UEFI has).
>>>>>>>>
>>>>>>>> As a nice side effect, we speed up the normal U-Boot text console as
>>>>>>>> well with this patch set, because even "normal" text prints that write
>>>>>>>> for example a single line of text on the screen today flush the full
>>>>>>>> frame buffer to DRAM.
>>>>>>> No, I mean that it is a bug that U-Boot (apparently) flushes the cache
>>>>>>> after every character. It doesn't do that for normal character output
>>>>>>> and I don't think it makes sense to do it for EFI either.
>>>>>> I see. Let's trace the calls:
>>>>>>
>>>>>> efi_cout_output_string()
>>>>>> -> fputs()
>>>>>> -> vidconsole_puts()
>>>>>> -> video_sync()
>>>>>> -> flush_dcache_range()
>>>>>>
>>>>>> Unfortunately grub abstracts character backends down to the "print
>>>>>> character" level, so it calls UEFI's sophisticated "output_string"
>>>>>> callback with single characters at a time, which means we do a full
>>>>>> dcache flush for every character that we print:
>>>>>>
>>>>>> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
>>>>>>
>>>>>>
>>>>>>>>>>> This patch set implements the easiest mitigation against this problem:
>>>>>>>>>>> Damage tracking. We remember the lowest common denominator region that was
>>>>>>>>>>> touched since the last video_sync() call and only flush that. The most
>>>>>>>>>>> typical writer to the frame buffer is the video console, which always
>>>>>>>>>>> writes rectangles of characters on the screen and syncs afterwards.
>>>>>>>>>>>
>>>>>>>>>>> With this patch set applied, we reduce drawing a large grub menu (with
>>>>>>>>>>> serial console attached for size information) on an RK3399-ROC system
>>>>>>>>>>> at 1440p from 55 seconds to less than 1 second.
>>>>>>>>>>>
>>>>>>>>>>> Version 2 also implements VIDEO_COPY using this mechanism, reducing its
>>>>>>>>>>> overhead compared to before as well. So even x86 systems should be faster
>>>>>>>>>>> with this now :).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Alternatives considered:
>>>>>>>>>>>
>>>>>>>>>>>        1) Lazy sync - Sandbox does this. It only calls video_sync(true) every
>>>>>>>>>>>           so often. We are missing timers to do this generically.
>>>>>>>>>>>
>>>>>>>>>>>        2) Double buffering - We could try to identify whether anything changed
>>>>>>>>>>>           at all and only draw to the FB if it did. That would require
>>>>>>>>>>>           maintaining a second buffer that we need to scan.
>>>>>>>>>>>
>>>>>>>>>>>        3) Text buffer - Maintain a buffer of all text printed on the screen with
>>>>>>>>>>>           respective location. Don't write if the old and new character are
>>>>>>>>>>>           identical. This would limit applicability to text only and is an
>>>>>>>>>>>           optimization on top of this patch set.
>>>>>>>>>>>
>>>>>>>>>>>        4) Hash screen lines - Create a hash (sha256?) over every line when it
>>>>>>>>>>>           changes. Only flush when it does. I'm not sure if this would waste
>>>>>>>>>>>           more time, memory and cache than the current approach. It would make
>>>>>>>>>>>           full screen updates much more expensive.
>>>>>>>>> 5) Fix the bug mentioned above?
>>>>>>>>>
>>>>>>>>>> Changes in v5:
>>>>>>>>>> - Add patch "video: test: Split copy frame buffer check into a function"
>>>>>>>>>> - Add patch "video: test: Support checking copy frame buffer contents"
>>>>>>>>>> - Add patch "video: test: Test partial updates of hardware frame buffer"
>>>>>>>>>> - Use xstart, ystart, xend, yend as names for damage region
>>>>>>>>>> - Document damage struct and fields in struct video_priv comment
>>>>>>>>>> - Return void from video_damage()
>>>>>>>>>> - Fix undeclared priv error in video_sync()
>>>>>>>>>> - Drop unused headers from video-uclass.c
>>>>>>>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>>>>>>>>>> - Call video_damage() also in video_fill_part()
>>>>>>>>>> - Use met->baseline instead of priv->baseline
>>>>>>>>>> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
>>>>>>>>>> - Update console_rotate.c video_damage() calls to pass video tests
>>>>>>>>>> - Remove mention about not having minimal damage for console_rotate.c
>>>>>>>>>> - Add patch "video: test: Test video damage tracking via vidconsole"
>>>>>>>>>> - Document new vdev field in struct efi_gop_obj comment
>>>>>>>>>> - Remove video_sync_copy() also from video_fill(), video_fill_part()
>>>>>>>>>> - Fix memmove() calls by removing the extra dev argument
>>>>>>>>>> - Call video_sync() before checking copy_fb in video tests
>>>>>>>>>> - Imply VIDEO_DAMAGE for video drivers instead of selecting it
>>>>>>>>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
>>>>>>>>>>
>>>>>>>>>> v4: https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/
>>>>>>>>>>
>>>>>>>>>> Changes in v4:
>>>>>>>>>> - Move damage clear to patch "dm: video: Add damage tracking API"
>>>>>>>>>> - Simplify first damage logic
>>>>>>>>>> - Remove VIDEO_DAMAGE default for ARM
>>>>>>>>>> - Skip damage on EfiBltVideoToBltBuffer
>>>>>>>>>> - Add patch "video: Always compile cache flushing code"
>>>>>>>>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"
>>>>>>>>>>
>>>>>>>>>> v3: https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/
>>>>>>>>>>
>>>>>>>>>> Changes in v3:
>>>>>>>>>> - Adapt to always assume DM is used
>>>>>>>>>> - Adapt to always assume DM is used
>>>>>>>>>> - Make VIDEO_COPY always select VIDEO_DAMAGE
>>>>>>>>>>
>>>>>>>>>> v2: https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/
>>>>>>>>>>
>>>>>>>>>> Changes in v2:
>>>>>>>>>> - Remove ifdefs
>>>>>>>>>> - Fix ranges in truetype target
>>>>>>>>>> - Limit rotate to necessary damage
>>>>>>>>>> - Remove ifdefs from gop
>>>>>>>>>> - Fix dcache range; we were flushing too much before
>>>>>>>>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
>>>>>>>>>>
>>>>>>>>>> v1: https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/
>>>>>>>>>>
>>>>>>>>>> Alexander Graf (9):
>>>>>>>>>>        dm: video: Add damage tracking API
>>>>>>>>>>        dm: video: Add damage notification on display fills
>>>>>>>>>>        vidconsole: Add damage notifications to all vidconsole drivers
>>>>>>>>>>        video: Add damage notification on bmp display
>>>>>>>>>>        efi_loader: GOP: Add damage notification on BLT
>>>>>>>>>>        video: Only dcache flush damaged lines
>>>>>>>>>>        video: Use VIDEO_DAMAGE for VIDEO_COPY
>>>>>>>>>>        video: Always compile cache flushing code
>>>>>>>>>>        video: Enable VIDEO_DAMAGE for drivers that need it
>>>>>>>>>>
>>>>>>>>>> Alper Nebi Yasak (4):
>>>>>>>>>>        video: test: Split copy frame buffer check into a function
>>>>>>>>>>        video: test: Support checking copy frame buffer contents
>>>>>>>>>>        video: test: Test partial updates of hardware frame buffer
>>>>>>>>>>        video: test: Test video damage tracking via vidconsole
>>>>>>>>>>
>>>>>>>>>>       arch/arm/mach-omap2/omap3/Kconfig |   1 +
>>>>>>>>>>       arch/arm/mach-sunxi/Kconfig       |   1 +
>>>>>>>>>>       drivers/video/Kconfig             |  26 +++
>>>>>>>>>>       drivers/video/console_normal.c    |  27 ++--
>>>>>>>>>>       drivers/video/console_rotate.c    |  94 +++++++----
>>>>>>>>>>       drivers/video/console_truetype.c  |  37 +++--
>>>>>>>>>>       drivers/video/exynos/Kconfig      |   1 +
>>>>>>>>>>       drivers/video/imx/Kconfig         |   1 +
>>>>>>>>>>       drivers/video/meson/Kconfig       |   1 +
>>>>>>>>>>       drivers/video/rockchip/Kconfig    |   1 +
>>>>>>>>>>       drivers/video/stm32/Kconfig       |   1 +
>>>>>>>>>>       drivers/video/tegra20/Kconfig     |   1 +
>>>>>>>>>>       drivers/video/tidss/Kconfig       |   1 +
>>>>>>>>>>       drivers/video/vidconsole-uclass.c |  16 --
>>>>>>>>>>       drivers/video/video-uclass.c      | 190 ++++++++++++----------
>>>>>>>>>>       drivers/video/video_bmp.c         |   7 +-
>>>>>>>>>>       include/video.h                   |  59 +++----
>>>>>>>>>>       include/video_console.h           |  52 ------
>>>>>>>>>>       lib/efi_loader/efi_gop.c          |   7 +
>>>>>>>>>>       test/dm/video.c                   | 256 ++++++++++++++++++++++++------
>>>>>>>>>>       20 files changed, 483 insertions(+), 297 deletions(-)
>>>>>>>>> It is good to see this tidied up into something that can be applied!
>>>>>>>>>
>>>>>>>>> I am unsure what is going on with the EFI performance, though. It
>>>>>>>>> should not flush the cache after every character, only after a new
>>>>>>>>> line. Is there something wrong in here? If so, we should fix that bug
>>>>>>>>> first and it should be patch 1 of this series.
>>>>>>>> Before I came up with this series, I was trying to identify the UEFI bug
>>>>>>>> in question as well, because intuition told me surely this is a bug in
>>>>>>>> UEFI :). Turns out it really isn't this time around.
>>>>>>> I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI
>>>>>>> implementation. Where did you look for the bug?
>>>>>> The "real" bug is in grub. But given that it's reasonably simple to work
>>>>>> around in U-Boot and even with it "fixed" in grub we would still see
>>>>>> performance benefits from flushing only parts of the screen, I think
>>>>>> it's worth living with the grub deficiency.
>>>>> OK thanks for digging into it. I suggest we add a param to
>>>>> vidconsole_puts() to tell it whether to sync or not, then the EFI code
>>>>> can indicate this and try to be a bit smarter about it.
>>>> It doesn't know when to sync either. From its point of view, any
>>>> "console output" could be the last one. There is no API in UEFI that
>>>> says "please flush console output now".
>>> Yes, I understand. I was not suggesting we were missing an API. But
>>> some sort of heuristic would do, e.g. only flush on a newline, flush
>>> every 50 chars, etc.
>> I can't think of any heuristic that would reliably work. Relevant for
>> this conversation, UEFI provides 2 calls:
>>
>>     * Write string to screen (efi_cout_output_string)
>>     * Set text cursor position to X, Y (efi_cout_set_cursor_position)
>>
>> It's perfectly legal for a UEFI application to do something like
>>
>> efi_cout_set_cursor_position(10, 10);
>> efi_cout_output_string("f");
>> efi_cout_output_string("o");
>> efi_cout_output_string("o");
>>
>> to update contents of a virtual text box on the screen. Where in this
>> chain of events would we call video_sync(), but on every call to
>> efi_cout_output_string()?
> Actually U-Boot has the same problem, but we have managed to work out something.


U-Boot as a code base has a much easier stance: It can add APIs when it 
needs them in places that require them. With UEFI (as well as the U-Boot 
native API), we're stuck with what's there.

I also don't understand what you mean by "we have managed to work out 
something". This patch set is not a UEFI fix - it fixes generic U-Boot 
behavior and speeds up non-UEFI boots as well. The improvement there is 
just not as impressive as with grub :).


> I do think it is still wrong to flush the cache on every char. I suspect you
> will find that even a simple heuristic like I mentioned would be good
> enough.
>
> Also I notice that EFI calls notify? all the time, so U-Boot probably
> does have the ability to sync the video every 10ms if we wanted to.


I fail to see how invalidating the frame buffer for the screen every 
10ms is any better than doing flush+invalidate operations only on screen 
areas that changed? It's more fragile, more difficult to understand and 
also significantly more expensive given that most of the time very 
little on the screen actually changes.
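
To illustrate the idea (this is only a sketch, not the code from this
series): the struct and helper names below are hypothetical, and only the
xstart/ystart/xend/yend field names are taken from the v5 changelog. Each
draw operation grows one pending rectangle, and the sync step then flushes
just the lines that rectangle spans.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical damage rectangle; field names follow the v5 changelog. */
struct damage_rect {
	int xstart, ystart;	/* inclusive top-left corner, in pixels */
	int xend, yend;		/* exclusive bottom-right corner, in pixels */
};

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Grow the pending rectangle so that it also covers the new area. */
static void damage_add(struct damage_rect *d, int x, int y, int w, int h)
{
	assert(w > 0 && h > 0);

	/* A zero-initialised (empty) rectangle simply takes the new area. */
	if (d->xend <= d->xstart || d->yend <= d->ystart) {
		d->xstart = x;
		d->ystart = y;
		d->xend = x + w;
		d->yend = y + h;
		return;
	}

	d->xstart = min_int(d->xstart, x);
	d->ystart = min_int(d->ystart, y);
	d->xend = max_int(d->xend, x + w);
	d->yend = max_int(d->yend, y + h);
}

/* Byte range that covers every damaged line in full. */
static void damage_flush_range(const struct damage_rect *d, size_t line_length,
			       size_t *offset, size_t *len)
{
	*offset = (size_t)d->ystart * line_length;
	*len = (size_t)(d->yend - d->ystart) * line_length;
}
```

Assuming a 2560x1440 frame buffer at 4 bytes per pixel and an 8x16 glyph
(both are assumptions), a full flush covers 14,745,600 bytes, while the 16
damaged lines of one character row cover 163,840 bytes.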


>
> It seems from this discussion that we have made great the enemy of the good.


I agree. Damage tracking in this patch set is elegant, simple, 
predictable, low overhead and basically just abstracts the video copy 
code path to a generic solution. All while it pretty much solves the 
issue for good. I don't understand what's not to like about it :)
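
A rough sketch of how a copy buffer could be updated from the same damage
rectangle follows. The function name and parameters are made up for
illustration and are not this series' API; the point is only that the
rows and columns outside the rectangle are never touched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy only the damaged rectangle from the working buffer to the copy
 * buffer. Coordinates are in pixels; xend and yend are exclusive.
 */
static void copy_damaged(unsigned char *copy_fb, const unsigned char *fb,
			 size_t line_length, size_t bytes_per_pixel,
			 int xstart, int ystart, int xend, int yend)
{
	size_t row_off, row_len;
	int y;

	assert(xstart >= 0 && ystart >= 0);
	assert(xstart <= xend && ystart <= yend);

	row_off = (size_t)xstart * bytes_per_pixel;
	row_len = (size_t)(xend - xstart) * bytes_per_pixel;
	assert(row_off + row_len <= line_length);

	/* One memcpy per damaged line, limited to the damaged columns. */
	for (y = ystart; y < yend; y++) {
		size_t off = (size_t)y * line_length + row_off;

		memcpy(copy_fb + off, fb + off, row_len);
	}
}
```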


Alex




* Re: [PATCH v5 00/13] Add video damage tracking
  2023-08-23  8:56                   ` Alexander Graf
@ 2023-08-28 17:54                     ` Simon Glass
  2023-08-28 20:24                       ` Alexander Graf
  0 siblings, 1 reply; 56+ messages in thread
From: Simon Glass @ 2023-08-28 17:54 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Alper Nebi Yasak, u-boot, Kever Yang, Jagan Teki, Andre Przywara,
	Svyatoslav Ryhel, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

Hi Alex,

On Wed, 23 Aug 2023 at 02:56, Alexander Graf <agraf@csgraf.de> wrote:
>
> Hey Simon,
>
> On 22.08.23 20:56, Simon Glass wrote:
> > Hi Alex,
> >
> > On Tue, 22 Aug 2023 at 01:47, Alexander Graf <agraf@csgraf.de> wrote:
> >>
> >> On 22.08.23 01:03, Simon Glass wrote:
> >>> Hi Alex,
> >>>
> >>> On Mon, 21 Aug 2023 at 16:40, Alexander Graf <agraf@csgraf.de> wrote:
> >>>> On 22.08.23 00:10, Simon Glass wrote:
> >>>>> Hi Alex,
> >>>>>
> >>>>> On Mon, 21 Aug 2023 at 14:20, Alexander Graf <agraf@csgraf.de> wrote:
> >>>>>> On 21.08.23 21:57, Simon Glass wrote:
> >>>>>>> Hi Alex,
> >>>>>>>
> >>>>>>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf <agraf@csgraf.de> wrote:
> >>>>>>>> On 21.08.23 21:11, Simon Glass wrote:
> >>>>>>>>> Hi Alper,
> >>>>>>>>>
> >>>>>>>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
> >>>>>>>>>> This is a rebase of Alexander Graf's video damage tracking series, with
> >>>>>>>>>> some tests and other changes. The original cover letter is as follows:
> >>>>>>>>>>
> >>>>>>>>>>> This patch set speeds up graphics output on ARM by a factor of 60x.
> >>>>>>>>>>>
> >>>>>>>>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
> >>>>>>>>>>> but need it accessible by the display controller which reads directly
> >>>>>>>>>>> from a later point of consistency. Hence, we flush the frame buffer to
> >>>>>>>>>>> DRAM on every change. The full frame buffer.
> >>>>>>>>> It should not, see below.
> >>>>>>>>>
> >>>>>>>>>>> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
> >>>>>>>>>>> that can take a while to flush out. This was reported by Da Xue with grub,
> >>>>>>>>>>> which happily prints 1000s of spaces on the screen to draw a menu. Every
> >>>>>>>>>>> printed space triggers a cache flush.
> >>>>>>>>> That is a bug somewhere in EFI.
> >>>>>>>> Unfortunately not :). You may call it a bug in grub: It literally prints
> >>>>>>>> over space characters for every character in its menu that it wants
> >>>>>>>> cleared. On every text screen draw.
> >>>>>>>>
> >>>>>>>> This wouldn't be a big issue if we only flush the rectangle that gets
> >>>>>>>> modified. But without this patch set, we're flushing the full DRAM
> >>>>>>>> buffer on every u-boot text console character write, which means for
> >>>>>>>> every character (as that's the only API UEFI has).
> >>>>>>>>
> >>>>>>>> As a nice side effect, we speed up the normal U-Boot text console as
> >>>>>>>> well with this patch set, because even "normal" text prints that write
> >>>>>>>> for example a single line of text on the screen today flush the full
> >>>>>>>> frame buffer to DRAM.
> >>>>>>> No, I mean that it is a bug that U-Boot (apparently) flushes the cache
> >>>>>>> after every character. It doesn't do that for normal character output
> >>>>>>> and I don't think it makes sense to do it for EFI either.
> >>>>>> I see. Let's trace the calls:
> >>>>>>
> >>>>>> efi_cout_output_string()
> >>>>>> -> fputs()
> >>>>>> -> vidconsole_puts()
> >>>>>> -> video_sync()
> >>>>>> -> flush_dcache_range()
> >>>>>>
> >>>>>> Unfortunately grub abstracts character backends down to the "print
> >>>>>> character" level, so it calls UEFI's sophisticated "output_string"
> >>>>>> callback with single characters at a time, which means we do a full
> >>>>>> dcache flush for every character that we print:
> >>>>>>
> >>>>>> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
> >>>>>>
> >>>>>>
> >>>>>>>>>>> This patch set implements the easiest mitigation against this problem:
> >>>>>>>>>>> Damage tracking. We remember the lowest common denominator region that was
> >>>>>>>>>>> touched since the last video_sync() call and only flush that. The most
> >>>>>>>>>>> typical writer to the frame buffer is the video console, which always
> >>>>>>>>>>> writes rectangles of characters on the screen and syncs afterwards.
> >>>>>>>>>>>
> >>>>>>>>>>> With this patch set applied, we reduce drawing a large grub menu (with
> >>>>>>>>>>> serial console attached for size information) on an RK3399-ROC system
> >>>>>>>>>>> at 1440p from 55 seconds to less than 1 second.
> >>>>>>>>>>>
> >>>>>>>>>>> Version 2 also implements VIDEO_COPY using this mechanism, reducing its
> >>>>>>>>>>> overhead compared to before as well. So even x86 systems should be faster
> >>>>>>>>>>> with this now :).
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Alternatives considered:
> >>>>>>>>>>>
> >>>>>>>>>>>        1) Lazy sync - Sandbox does this. It only calls video_sync(true) every
> >>>>>>>>>>>           so often. We are missing timers to do this generically.
> >>>>>>>>>>>
> >>>>>>>>>>>        2) Double buffering - We could try to identify whether anything changed
> >>>>>>>>>>>           at all and only draw to the FB if it did. That would require
> >>>>>>>>>>>           maintaining a second buffer that we need to scan.
> >>>>>>>>>>>
> >>>>>>>>>>>        3) Text buffer - Maintain a buffer of all text printed on the screen with
> >>>>>>>>>>>           respective location. Don't write if the old and new character are
> >>>>>>>>>>>           identical. This would limit applicability to text only and is an
> >>>>>>>>>>>           optimization on top of this patch set.
> >>>>>>>>>>>
> >>>>>>>>>>>        4) Hash screen lines - Create a hash (sha256?) over every line when it
> >>>>>>>>>>>           changes. Only flush when it does. I'm not sure if this would waste
> >>>>>>>>>>>           more time, memory and cache than the current approach. It would make
> >>>>>>>>>>>           full screen updates much more expensive.
> >>>>>>>>> 5) Fix the bug mentioned above?
> >>>>>>>>>
> >>>>>>>>>> Changes in v5:
> >>>>>>>>>> - Add patch "video: test: Split copy frame buffer check into a function"
> >>>>>>>>>> - Add patch "video: test: Support checking copy frame buffer contents"
> >>>>>>>>>> - Add patch "video: test: Test partial updates of hardware frame buffer"
> >>>>>>>>>> - Use xstart, ystart, xend, yend as names for damage region
> >>>>>>>>>> - Document damage struct and fields in struct video_priv comment
> >>>>>>>>>> - Return void from video_damage()
> >>>>>>>>>> - Fix undeclared priv error in video_sync()
> >>>>>>>>>> - Drop unused headers from video-uclass.c
> >>>>>>>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
> >>>>>>>>>> - Call video_damage() also in video_fill_part()
> >>>>>>>>>> - Use met->baseline instead of priv->baseline
> >>>>>>>>>> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
> >>>>>>>>>> - Update console_rotate.c video_damage() calls to pass video tests
> >>>>>>>>>> - Remove mention about not having minimal damage for console_rotate.c
> >>>>>>>>>> - Add patch "video: test: Test video damage tracking via vidconsole"
> >>>>>>>>>> - Document new vdev field in struct efi_gop_obj comment
> >>>>>>>>>> - Remove video_sync_copy() also from video_fill(), video_fill_part()
> >>>>>>>>>> - Fix memmove() calls by removing the extra dev argument
> >>>>>>>>>> - Call video_sync() before checking copy_fb in video tests
> >>>>>>>>>> - Imply VIDEO_DAMAGE for video drivers instead of selecting it
> >>>>>>>>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
> >>>>>>>>>>
> >>>>>>>>>> v4: https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/
> >>>>>>>>>>
> >>>>>>>>>> Changes in v4:
> >>>>>>>>>> - Move damage clear to patch "dm: video: Add damage tracking API"
> >>>>>>>>>> - Simplify first damage logic
> >>>>>>>>>> - Remove VIDEO_DAMAGE default for ARM
> >>>>>>>>>> - Skip damage on EfiBltVideoToBltBuffer
> >>>>>>>>>> - Add patch "video: Always compile cache flushing code"
> >>>>>>>>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"
> >>>>>>>>>>
> >>>>>>>>>> v3: https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/
> >>>>>>>>>>
> >>>>>>>>>> Changes in v3:
> >>>>>>>>>> - Adapt to always assume DM is used
> >>>>>>>>>> - Adapt to always assume DM is used
> >>>>>>>>>> - Make VIDEO_COPY always select VIDEO_DAMAGE
> >>>>>>>>>>
> >>>>>>>>>> v2: https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/
> >>>>>>>>>>
> >>>>>>>>>> Changes in v2:
> >>>>>>>>>> - Remove ifdefs
> >>>>>>>>>> - Fix ranges in truetype target
> >>>>>>>>>> - Limit rotate to necessary damage
> >>>>>>>>>> - Remove ifdefs from gop
> >>>>>>>>>> - Fix dcache range; we were flushing too much before
> >>>>>>>>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
> >>>>>>>>>>
> >>>>>>>>>> v1: https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/
> >>>>>>>>>>
> >>>>>>>>>> Alexander Graf (9):
> >>>>>>>>>>        dm: video: Add damage tracking API
> >>>>>>>>>>        dm: video: Add damage notification on display fills
> >>>>>>>>>>        vidconsole: Add damage notifications to all vidconsole drivers
> >>>>>>>>>>        video: Add damage notification on bmp display
> >>>>>>>>>>        efi_loader: GOP: Add damage notification on BLT
> >>>>>>>>>>        video: Only dcache flush damaged lines
> >>>>>>>>>>        video: Use VIDEO_DAMAGE for VIDEO_COPY
> >>>>>>>>>>        video: Always compile cache flushing code
> >>>>>>>>>>        video: Enable VIDEO_DAMAGE for drivers that need it
> >>>>>>>>>>
> >>>>>>>>>> Alper Nebi Yasak (4):
> >>>>>>>>>>        video: test: Split copy frame buffer check into a function
> >>>>>>>>>>        video: test: Support checking copy frame buffer contents
> >>>>>>>>>>        video: test: Test partial updates of hardware frame buffer
> >>>>>>>>>>        video: test: Test video damage tracking via vidconsole
> >>>>>>>>>>
> >>>>>>>>>>       arch/arm/mach-omap2/omap3/Kconfig |   1 +
> >>>>>>>>>>       arch/arm/mach-sunxi/Kconfig       |   1 +
> >>>>>>>>>>       drivers/video/Kconfig             |  26 +++
> >>>>>>>>>>       drivers/video/console_normal.c    |  27 ++--
> >>>>>>>>>>       drivers/video/console_rotate.c    |  94 +++++++----
> >>>>>>>>>>       drivers/video/console_truetype.c  |  37 +++--
> >>>>>>>>>>       drivers/video/exynos/Kconfig      |   1 +
> >>>>>>>>>>       drivers/video/imx/Kconfig         |   1 +
> >>>>>>>>>>       drivers/video/meson/Kconfig       |   1 +
> >>>>>>>>>>       drivers/video/rockchip/Kconfig    |   1 +
> >>>>>>>>>>       drivers/video/stm32/Kconfig       |   1 +
> >>>>>>>>>>       drivers/video/tegra20/Kconfig     |   1 +
> >>>>>>>>>>       drivers/video/tidss/Kconfig       |   1 +
> >>>>>>>>>>       drivers/video/vidconsole-uclass.c |  16 --
> >>>>>>>>>>       drivers/video/video-uclass.c      | 190 ++++++++++++----------
> >>>>>>>>>>       drivers/video/video_bmp.c         |   7 +-
> >>>>>>>>>>       include/video.h                   |  59 +++----
> >>>>>>>>>>       include/video_console.h           |  52 ------
> >>>>>>>>>>       lib/efi_loader/efi_gop.c          |   7 +
> >>>>>>>>>>       test/dm/video.c                   | 256 ++++++++++++++++++++++++------
> >>>>>>>>>>       20 files changed, 483 insertions(+), 297 deletions(-)
> >>>>>>>>> It is good to see this tidied up into something that can be applied!
> >>>>>>>>>
> >>>>>>>>> I am unsure what is going on with the EFI performance, though. It
> >>>>>>>>> should not flush the cache after every character, only after a new
> >>>>>>>>> line. Is there something wrong in here? If so, we should fix that bug
> >>>>>>>>> first and it should be patch 1 of this series.
> >>>>>>>> Before I came up with this series, I was trying to identify the UEFI bug
> >>>>>>>> in question as well, because intuition told me surely this is a bug in
> >>>>>>>> UEFI :). Turns out it really isn't this time around.
> >>>>>>> I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI
> >>>>>>> implementation. Where did you look for the bug?
> >>>>>> The "real" bug is in grub. But given that it's reasonably simple to work
> >>>>>> around in U-Boot and even with it "fixed" in grub we would still see
> >>>>>> performance benefits from flushing only parts of the screen, I think
> >>>>>> it's worth living with the grub deficiency.
> >>>>> OK thanks for digging into it. I suggest we add a param to
> >>>>> vidconsole_puts() to tell it whether to sync or not, then the EFI code
> >>>>> can indicate this and try to be a bit smarter about it.
> >>>> It doesn't know when to sync either. From its point of view, any
> >>>> "console output" could be the last one. There is no API in UEFI that
> >>>> says "please flush console output now".
> >>> Yes, I understand. I was not suggesting we were missing an API. But
> >>> some sort of heuristic would do, e.g. only flush on a newline, flush
> >>> every 50 chars, etc.
> >> I can't think of any heuristic that would reliably work. Relevant for
> >> this conversation, UEFI provides 2 calls:
> >>
> >>     * Write string to screen (efi_cout_output_string)
> >>     * Set text cursor position to X, Y (efi_cout_set_cursor_position)
> >>
> >> It's perfectly legal for a UEFI application to do something like
> >>
> >> efi_cout_set_cursor_position(10, 10);
> >> efi_cout_output_string("f");
> >> efi_cout_output_string("o");
> >> efi_cout_output_string("o");
> >>
> >> to update contents of a virtual text box on the screen. Where in this
> >> chain of events would we call video_sync(), but on every call to
> >> efi_cout_output_string()?
> > Actually U-Boot has the same problem, but we have managed to work out something.
>
>
> U-Boot as a code base has a much easier stance: It can add APIs when it
> needs them in places that require them. With UEFI (as well as the U-Boot
> native API), we're stuck with what's there.
>
> I also don't understand what you mean by "we have managed to work out
> something". This patch set is not a UEFI fix - it fixes generic U-Boot
> behavior and speeds up non-UEFI boots as well. The improvement there is
> just not as impressive as with grub :).

We are still not quite on the same page...

U-Boot does have video_sync() but it doesn't know when to call it. If
it does not call it, then any amount of single-threaded code can run
after that, which may update the framebuffer. In other words, U-Boot
is in exactly the same boat as UEFI. It has to decide whether to call
video_sync() based on some sort of heuristic.

That is the only point I am trying to make here. Does that make sense?
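
For illustration, a heuristic of the kind discussed above (sync on a
newline, or after 50 characters) might look like the following. The names
are hypothetical and this is not existing U-Boot code.

```c
#include <stdbool.h>

#define SYNC_EVERY_CHARS 50	/* hypothetical threshold from the discussion */

struct sync_state {
	unsigned int pending;	/* characters drawn since the last sync */
};

/* Returns true when the caller should run its sync routine now. */
static bool sync_wanted(struct sync_state *s, char ch)
{
	s->pending++;
	if (ch == '\n' || s->pending >= SYNC_EVERY_CHARS) {
		s->pending = 0;
		return true;
	}
	return false;
}
```

With such a rule, three single-character writes with no newline would
leave the text unsynced until more output arrives.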

>
>
> > I do think it is still wrong to flush the cache on every char. I suspect you
> > will find that even a simple heuristic like I mentioned would be good
> > enough.
> >
> > Also I notice that EFI calls notify? all the time, so U-Boot probably
> > does have the ability to sync the video every 10ms if we wanted to.
>
>
> I fail to see how invalidating the frame buffer for the screen every
> 10ms is any better than doing flush+invalidate operations only on screen
> areas that changed? It's more fragile, more difficult to understand and
> also significantly more expensive given that most of the time very
> little on the screen actually changes.

I am not suggesting it is better, at all. I am just trying to explain
that the U-Boot EFI implementation should not be calling video_sync()
after every character, before or after this series.

>
>
> >
> > It seems from this discussion that we have made great the enemy of the good.
>
>
> I agree. Damage tracking in this patch set is elegant, simple,
> predictable, low overhead and basically just abstracts the video copy
> code path to a generic solution. All while it pretty much solves the
> issue for good. I don't understand what's not to like about it :)

I think it is a really nice feature and I hope it can be applied soon.

Regards,
Simon


* Re: [PATCH v5 00/13] Add video damage tracking
  2023-08-28 17:54                     ` Simon Glass
@ 2023-08-28 20:24                       ` Alexander Graf
  2023-08-28 21:54                         ` Heinrich Schuchardt
  2023-08-28 22:08                         ` Simon Glass
  0 siblings, 2 replies; 56+ messages in thread
From: Alexander Graf @ 2023-08-28 20:24 UTC (permalink / raw)
  To: Simon Glass
  Cc: Alper Nebi Yasak, u-boot, Kever Yang, Jagan Teki, Andre Przywara,
	Svyatoslav Ryhel, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong


On 28.08.23 19:54, Simon Glass wrote:
> Hi Alex,
>
> On Wed, 23 Aug 2023 at 02:56, Alexander Graf <agraf@csgraf.de> wrote:
>> Hey Simon,
>>
>> On 22.08.23 20:56, Simon Glass wrote:
>>> Hi Alex,
>>>
>>> On Tue, 22 Aug 2023 at 01:47, Alexander Graf <agraf@csgraf.de> wrote:
>>>> On 22.08.23 01:03, Simon Glass wrote:
>>>>> Hi Alex,
>>>>>
>>>>> On Mon, 21 Aug 2023 at 16:40, Alexander Graf <agraf@csgraf.de> wrote:
>>>>>> On 22.08.23 00:10, Simon Glass wrote:
>>>>>>> Hi Alex,
>>>>>>>
>>>>>>> On Mon, 21 Aug 2023 at 14:20, Alexander Graf <agraf@csgraf.de> wrote:
>>>>>>>> On 21.08.23 21:57, Simon Glass wrote:
>>>>>>>>> Hi Alex,
>>>>>>>>>
>>>>>>>>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf <agraf@csgraf.de> wrote:
>>>>>>>>>> On 21.08.23 21:11, Simon Glass wrote:
>>>>>>>>>>> Hi Alper,
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>>>>>>>>>>>> This is a rebase of Alexander Graf's video damage tracking series, with
>>>>>>>>>>>> some tests and other changes. The original cover letter is as follows:
>>>>>>>>>>>>
>>>>>>>>>>>>> This patch set speeds up graphics output on ARM by a factor of 60x.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
>>>>>>>>>>>>> but need it accessible by the display controller which reads directly
>>>>>>>>>>>>> from a later point of consistency. Hence, we flush the frame buffer to
>>>>>>>>>>>>> DRAM on every change. The full frame buffer.
>>>>>>>>>>> It should not, see below.
>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
>>>>>>>>>>>>> that can take a while to flush out. This was reported by Da Xue with grub,
>>>>>>>>>>>>> which happily prints 1000s of spaces on the screen to draw a menu. Every
>>>>>>>>>>>>> printed space triggers a cache flush.
>>>>>>>>>>> That is a bug somewhere in EFI.
>>>>>>>>>> Unfortunately not :). You may call it a bug in grub: It literally prints
>>>>>>>>>> over space characters for every character in its menu that it wants
>>>>>>>>>> cleared. On every text screen draw.
>>>>>>>>>>
>>>>>>>>>> This wouldn't be a big issue if we only flush the rectangle that gets
>>>>>>>>>> modified. But without this patch set, we're flushing the full DRAM
>>>>>>>>>> buffer on every u-boot text console character write, which means for
>>>>>>>>>> every character (as that's the only API UEFI has).
>>>>>>>>>>
>>>>>>>>>> As a nice side effect, we speed up the normal U-Boot text console as
>>>>>>>>>> well with this patch set, because even "normal" text prints that write
>>>>>>>>>> for example a single line of text on the screen today flush the full
>>>>>>>>>> frame buffer to DRAM.
>>>>>>>>> No, I mean that it is a bug that U-Boot (apparently) flushes the cache
>>>>>>>>> after every character. It doesn't do that for normal character output
>>>>>>>>> and I don't think it makes sense to do it for EFI either.
>>>>>>>> I see. Let's trace the calls:
>>>>>>>>
>>>>>>>> efi_cout_output_string()
>>>>>>>> -> fputs()
>>>>>>>> -> vidconsole_puts()
>>>>>>>> -> video_sync()
>>>>>>>> -> flush_dcache_range()
>>>>>>>>
>>>>>>>> Unfortunately grub abstracts character backends down to the "print
>>>>>>>> character" level, so it calls UEFI's sophisticated "output_string"
>>>>>>>> callback with single characters at a time, which means we do a full
>>>>>>>> dcache flush for every character that we print:
>>>>>>>>
>>>>>>>> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>> This patch set implements the easiest mitigation against this problem:
>>>>>>>>>>>>> Damage tracking. We remember the lowest common denominator region that was
>>>>>>>>>>>>> touched since the last video_sync() call and only flush that. The most
>>>>>>>>>>>>> typical writer to the frame buffer is the video console, which always
>>>>>>>>>>>>> writes rectangles of characters on the screen and syncs afterwards.
>>>>>>>>>>>>>
>>>>>>>>>>>>> With this patch set applied, we reduce drawing a large grub menu (with
>>>>>>>>>>>>> serial console attached for size information) on an RK3399-ROC system
>>>>>>>>>>>>> at 1440p from 55 seconds to less than 1 second.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Version 2 also implements VIDEO_COPY using this mechanism, reducing its
>>>>>>>>>>>>> overhead compared to before as well. So even x86 systems should be faster
>>>>>>>>>>>>> with this now :).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Alternatives considered:
>>>>>>>>>>>>>
>>>>>>>>>>>>>         1) Lazy sync - Sandbox does this. It only calls video_sync(true) every
>>>>>>>>>>>>>            so often. We are missing timers to do this generically.
>>>>>>>>>>>>>
>>>>>>>>>>>>>         2) Double buffering - We could try to identify whether anything changed
>>>>>>>>>>>>>            at all and only draw to the FB if it did. That would require
>>>>>>>>>>>>>            maintaining a second buffer that we need to scan.
>>>>>>>>>>>>>
>>>>>>>>>>>>>         3) Text buffer - Maintain a buffer of all text printed on the screen with
>>>>>>>>>>>>>            respective location. Don't write if the old and new character are
>>>>>>>>>>>>>            identical. This would limit applicability to text only and is an
>>>>>>>>>>>>>            optimization on top of this patch set.
>>>>>>>>>>>>>
>>>>>>>>>>>>>         4) Hash screen lines - Create a hash (sha256?) over every line when it
>>>>>>>>>>>>>            changes. Only flush when it does. I'm not sure if this would waste
>>>>>>>>>>>>>            more time, memory and cache than the current approach. It would make
>>>>>>>>>>>>>            full screen updates much more expensive.
>>>>>>>>>>> 5) Fix the bug mentioned above?
>>>>>>>>>>>
>>>>>>>>>>>> Changes in v5:
>>>>>>>>>>>> - Add patch "video: test: Split copy frame buffer check into a function"
>>>>>>>>>>>> - Add patch "video: test: Support checking copy frame buffer contents"
>>>>>>>>>>>> - Add patch "video: test: Test partial updates of hardware frame buffer"
>>>>>>>>>>>> - Use xstart, ystart, xend, yend as names for damage region
>>>>>>>>>>>> - Document damage struct and fields in struct video_priv comment
>>>>>>>>>>>> - Return void from video_damage()
>>>>>>>>>>>> - Fix undeclared priv error in video_sync()
>>>>>>>>>>>> - Drop unused headers from video-uclass.c
>>>>>>>>>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>>>>>>>>>>>> - Call video_damage() also in video_fill_part()
>>>>>>>>>>>> - Use met->baseline instead of priv->baseline
>>>>>>>>>>>> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
>>>>>>>>>>>> - Update console_rotate.c video_damage() calls to pass video tests
>>>>>>>>>>>> - Remove mention about not having minimal damage for console_rotate.c
>>>>>>>>>>>> - Add patch "video: test: Test video damage tracking via vidconsole"
>>>>>>>>>>>> - Document new vdev field in struct efi_gop_obj comment
>>>>>>>>>>>> - Remove video_sync_copy() also from video_fill(), video_fill_part()
>>>>>>>>>>>> - Fix memmove() calls by removing the extra dev argument
>>>>>>>>>>>> - Call video_sync() before checking copy_fb in video tests
>>>>>>>>>>>> - Imply VIDEO_DAMAGE for video drivers instead of selecting it
>>>>>>>>>>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
>>>>>>>>>>>>
>>>>>>>>>>>> v4: https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/
>>>>>>>>>>>>
>>>>>>>>>>>> Changes in v4:
>>>>>>>>>>>> - Move damage clear to patch "dm: video: Add damage tracking API"
>>>>>>>>>>>> - Simplify first damage logic
>>>>>>>>>>>> - Remove VIDEO_DAMAGE default for ARM
>>>>>>>>>>>> - Skip damage on EfiBltVideoToBltBuffer
>>>>>>>>>>>> - Add patch "video: Always compile cache flushing code"
>>>>>>>>>>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"
>>>>>>>>>>>>
>>>>>>>>>>>> v3: https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/
>>>>>>>>>>>>
>>>>>>>>>>>> Changes in v3:
>>>>>>>>>>>> - Adapt to always assume DM is used
>>>>>>>>>>>> - Adapt to always assume DM is used
>>>>>>>>>>>> - Make VIDEO_COPY always select VIDEO_DAMAGE
>>>>>>>>>>>>
>>>>>>>>>>>> v2: https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/
>>>>>>>>>>>>
>>>>>>>>>>>> Changes in v2:
>>>>>>>>>>>> - Remove ifdefs
>>>>>>>>>>>> - Fix ranges in truetype target
>>>>>>>>>>>> - Limit rotate to necessary damage
>>>>>>>>>>>> - Remove ifdefs from gop
>>>>>>>>>>>> - Fix dcache range; we were flushing too much before
>>>>>>>>>>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
>>>>>>>>>>>>
>>>>>>>>>>>> v1: https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/
>>>>>>>>>>>>
>>>>>>>>>>>> Alexander Graf (9):
>>>>>>>>>>>>         dm: video: Add damage tracking API
>>>>>>>>>>>>         dm: video: Add damage notification on display fills
>>>>>>>>>>>>         vidconsole: Add damage notifications to all vidconsole drivers
>>>>>>>>>>>>         video: Add damage notification on bmp display
>>>>>>>>>>>>         efi_loader: GOP: Add damage notification on BLT
>>>>>>>>>>>>         video: Only dcache flush damaged lines
>>>>>>>>>>>>         video: Use VIDEO_DAMAGE for VIDEO_COPY
>>>>>>>>>>>>         video: Always compile cache flushing code
>>>>>>>>>>>>         video: Enable VIDEO_DAMAGE for drivers that need it
>>>>>>>>>>>>
>>>>>>>>>>>> Alper Nebi Yasak (4):
>>>>>>>>>>>>         video: test: Split copy frame buffer check into a function
>>>>>>>>>>>>         video: test: Support checking copy frame buffer contents
>>>>>>>>>>>>         video: test: Test partial updates of hardware frame buffer
>>>>>>>>>>>>         video: test: Test video damage tracking via vidconsole
>>>>>>>>>>>>
>>>>>>>>>>>>        arch/arm/mach-omap2/omap3/Kconfig |   1 +
>>>>>>>>>>>>        arch/arm/mach-sunxi/Kconfig       |   1 +
>>>>>>>>>>>>        drivers/video/Kconfig             |  26 +++
>>>>>>>>>>>>        drivers/video/console_normal.c    |  27 ++--
>>>>>>>>>>>>        drivers/video/console_rotate.c    |  94 +++++++----
>>>>>>>>>>>>        drivers/video/console_truetype.c  |  37 +++--
>>>>>>>>>>>>        drivers/video/exynos/Kconfig      |   1 +
>>>>>>>>>>>>        drivers/video/imx/Kconfig         |   1 +
>>>>>>>>>>>>        drivers/video/meson/Kconfig       |   1 +
>>>>>>>>>>>>        drivers/video/rockchip/Kconfig    |   1 +
>>>>>>>>>>>>        drivers/video/stm32/Kconfig       |   1 +
>>>>>>>>>>>>        drivers/video/tegra20/Kconfig     |   1 +
>>>>>>>>>>>>        drivers/video/tidss/Kconfig       |   1 +
>>>>>>>>>>>>        drivers/video/vidconsole-uclass.c |  16 --
>>>>>>>>>>>>        drivers/video/video-uclass.c      | 190 ++++++++++++----------
>>>>>>>>>>>>        drivers/video/video_bmp.c         |   7 +-
>>>>>>>>>>>>        include/video.h                   |  59 +++----
>>>>>>>>>>>>        include/video_console.h           |  52 ------
>>>>>>>>>>>>        lib/efi_loader/efi_gop.c          |   7 +
>>>>>>>>>>>>        test/dm/video.c                   | 256 ++++++++++++++++++++++++------
>>>>>>>>>>>>        20 files changed, 483 insertions(+), 297 deletions(-)
>>>>>>>>>>> It is good to see this tidied up into something that can be applied!
>>>>>>>>>>>
>>>>>>>>>>> I am unsure what is going on with the EFI performance, though. It
>>>>>>>>>>> should not flush the cache after every character, only after a new
>>>>>>>>>>> line. Is there something wrong in here? If so, we should fix that bug
>>>>>>>>>>> first and it should be patch 1 of this series.
>>>>>>>>>> Before I came up with this series, I was trying to identify the UEFI bug
>>>>>>>>>> in question as well, because intuition told me surely this is a bug in
>>>>>>>>>> UEFI :). Turns out it really isn't this time around.
>>>>>>>>> I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI
>>>>>>>>> implementation. Where did you look for the bug?
>>>>>>>> The "real" bug is in grub. But given that it's reasonably simple to work
>>>>>>>> around in U-Boot and even with it "fixed" in grub we would still see
>>>>>>>> performance benefits from flushing only parts of the screen, I think
>>>>>>>> it's worth living with the grub deficiency.
>>>>>>> OK thanks for digging into it. I suggest we add a param to
>>>>>>> vidconsole_puts() to tell it whether to sync or not, then the EFI code
>>>>>>> can indicate this and try to be a bit smarter about it.
>>>>>> It doesn't know when to sync either. From its point of view, any
>>>>>> "console output" could be the last one. There is no API in UEFI that
>>>>>> says "please flush console output now".
>>>>> Yes, I understand. I was not suggesting we were missing an API. But
>>>>> some sort of heuristic would do, e.g. only flush on a newline, flush
>>>>> every 50 chars, etc.
>>>> I can't think of any heuristic that would reliably work. Relevant for
>>>> this conversation, UEFI provides 2 calls:
>>>>
>>>>      * Write string to screen (efi_cout_output_string)
>>>>      * Set text cursor position to X, Y (efi_cout_set_cursor_position)
>>>>
>>>> It's perfectly legal for a UEFI application to do something like
>>>>
>>>> efi_cout_set_cursor_position(10, 10);
>>>> efi_cout_output_string("f");
>>>> efi_cout_output_string("o");
>>>> efi_cout_output_string("o");
>>>>
>>>> to update contents of a virtual text box on the screen. Where in this
>>>> chain of events would we call video_sync(), other than on every call to
>>>> efi_cout_output_string()?
>>> Actually U-Boot has the same problem, but we have managed to work out something.
>>
>> U-Boot as a code base has a much easier stance: It can add APIs when it
>> needs them in places that require them. With UEFI (as well as the U-Boot
>> native API), we're stuck with what's there.
>>
>> I also don't understand what you mean by "we have managed to work out
>> something". This patch set is not a UEFI fix - it fixes generic U-Boot
>> behavior and speeds up non-UEFI boots as well. The improvement there is
>> just not as impressive as with grub :).
> We are still not quite on the same page...
>
> U-Boot does have video_sync() but it doesn't know when to call it. If
> it does not call it, then any amount of single-threaded code can run
> after that, which may update the framebuffer. In other words, U-Boot
> is in exactly the same boat as UEFI. It has to decide whether to call
> video_sync() based on some sort of heuristic.
>
> That is the only point I am trying to make here. Does that make sense?


Oh, I thought you mentioned above that U-Boot is in a better spot or 
"has it solved already". I agree - it's in the same boat and the only 
safe thing it can really do today that is fully cross-platform 
compatible is to call video_sync() after every character.

I don't understand what you mean by "any amount of single-threaded code 
can run after that, which may update the framebuffer". Any framebuffer 
modification is U-Boot internal code which then again can apply 
video_sync() to tell the system "I want what I wrote to the screen to actually
be on screen now". I don't think that's necessarily bad design. A bit
clunky, but we're in a pre-boot environment after all.
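
As an illustration of the idea discussed here (not the code from this
series), damage tracking can be sketched as a bounding rectangle that
grows on every write and is consumed by the sync. Only the field names
xstart, ystart, xend and yend come from the v5 changelog; the structs,
the helper names and the "empty region" encoding below are assumptions
made for this sketch, and the cache flush is replaced by a byte count.

```c
#include <assert.h>

/* Hypothetical stand-in for the damage state; only the four field
 * names are taken from the v5 changelog. */
struct damage_sketch {
	int xstart, ystart;	/* inclusive top-left corner */
	int xend, yend;		/* exclusive bottom-right corner */
};

/* Hypothetical frame buffer description used only by this sketch. */
struct fb_sketch {
	int width, height;	/* in pixels */
	int bytes_per_pixel;
	struct damage_sketch dmg;
};

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Assumed encoding of "no damage": start lies beyond end on both axes. */
static void fb_damage_clear(struct fb_sketch *fb)
{
	fb->dmg.xstart = fb->width;
	fb->dmg.ystart = fb->height;
	fb->dmg.xend = 0;
	fb->dmg.yend = 0;
}

static void fb_init(struct fb_sketch *fb, int width, int height, int bpp)
{
	fb->width = width;
	fb->height = height;
	fb->bytes_per_pixel = bpp;
	fb_damage_clear(fb);
}

/* Grow the damage region so that it also covers the given rectangle. */
static void fb_damage(struct fb_sketch *fb, int x, int y, int w, int h)
{
	int x0 = max_int(x, 0), y0 = max_int(y, 0);
	int x1 = min_int(x + w, fb->width);
	int y1 = min_int(y + h, fb->height);

	assert(w >= 0 && h >= 0);
	if (x1 <= x0 || y1 <= y0)
		return;		/* nothing visible was touched */

	fb->dmg.xstart = min_int(fb->dmg.xstart, x0);
	fb->dmg.ystart = min_int(fb->dmg.ystart, y0);
	fb->dmg.xend = max_int(fb->dmg.xend, x1);
	fb->dmg.yend = max_int(fb->dmg.yend, y1);
}

/* Return how many bytes a sync would have to flush, then reset the
 * damage. Whole lines are counted, so the x range does not matter. */
static long fb_sync(struct fb_sketch *fb)
{
	long bytes = 0;

	if (fb->dmg.xend > fb->dmg.xstart && fb->dmg.yend > fb->dmg.ystart) {
		long lines = fb->dmg.yend - fb->dmg.ystart;

		/* A real implementation would flush the cache here. */
		bytes = lines * fb->width * fb->bytes_per_pixel;
	}
	fb_damage_clear(fb);
	return bytes;
}
```

With a 2560x1440 frame buffer at 4 bytes per pixel, a full flush covers
14,745,600 bytes, while two 8x16 glyphs drawn on the same text row only
damage 16 lines, i.e. 163,840 bytes in this sketch.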

Since we're aligned now: What exactly did you refer to with "but we have 
managed to work out something"?


>
>>
>>> I do think it is still wrong to flush the cache on every char. I suspect you
>>> will find that even a simple heuristic like I mentioned would be good
>>> enough.
>>>
>>> Also I notice that EFI calls notify? all the time, so U-Boot probably
>>> does have the ability to sync the video every 10ms if we wanted to.
>>
>> I fail to see how invalidating the frame buffer for the screen every
>> 10ms is any better than doing flush+invalidate operations only on screen
>> areas that changed? It's more fragile, more difficult to understand and
>> also significantly more expensive given that most of the time very
>> little on the screen actually changes.
> I am not suggesting it is better, at all. I am just trying to explain
> that the U-Boot EFI implementation should not be calling video_sync()
> after every character, before or after this series.


Ah, it doesn't :). It just calls the normal U-Boot "write character on 
screen" function. What that does is up to U-Boot internals - and those 
basically today dictate that something needs to call video_sync() to 
make FB modifications actually pop up on screen.
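
To illustrate why the per-character flush shows up with grub, here is a
small simulation of the call chain traced earlier in the thread. It
assumes that each puts-style call ends with exactly one sync; all
function names are hypothetical stand-ins, and the sync only increments
a counter instead of flushing anything.

```c
#include <assert.h>
#include <stddef.h>

static unsigned long sync_calls;	/* counts simulated syncs */

/* Stand-in for video_sync(): only records that it was called. */
static void sketch_video_sync(void)
{
	sync_calls++;
}

/* Stand-in for a console puts: handle every character of the string,
 * then sync once at the end of the call. */
static size_t sketch_vidconsole_puts(const char *s)
{
	size_t n = 0;

	assert(s != NULL);
	while (s[n] != '\0')
		n++;	/* a real console would draw glyph s[n] here */
	sketch_video_sync();
	return n;
}

/* Stand-in for the EFI output_string entry point: it just forwards
 * to the console, so it inherits one sync per call. */
static size_t sketch_output_string(const char *s)
{
	return sketch_vidconsole_puts(s);
}
```

Printing "foo" in one call costs one sync in this model, while printing
"f", "o" and "o" in three calls costs three, which is the pattern a
caller that emits single characters produces.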


>
>>
>>> It seems from this discussion that we have made great the enemy of the good.
>>
>> I agree. Damage tracking in this patch set is elegant, simple,
>> predictable, low overhead and basically just abstracts the video copy
>> code path to a generic solution. All while it pretty much solves the
>> issue for good. I don't understand what's not to like about it :)
> I think it is a really nice feature and I hope it can be applied soon.


Thanks a lot especially to Alper for picking it up. I had almost 
forgotten about the patch set :)


Alex



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 00/13] Add video damage tracking
  2023-08-28 20:24                       ` Alexander Graf
@ 2023-08-28 21:54                         ` Heinrich Schuchardt
  2023-08-29  6:20                           ` Alexander Graf
  2023-08-28 22:08                         ` Simon Glass
  1 sibling, 1 reply; 56+ messages in thread
From: Heinrich Schuchardt @ 2023-08-28 21:54 UTC (permalink / raw)
  To: Alexander Graf, Simon Glass
  Cc: Alper Nebi Yasak, u-boot, Kever Yang, Jagan Teki, Andre Przywara,
	Svyatoslav Ryhel, Philipp Tomsich, Andrew Davis, Da Xue,
	Patrice Chotard, Patrick Delaunay, Derald Woods,
	Anatolij Gustschin, uboot-stm32, Matthias Brugger, u-boot-amlogic,
	Ilias Apalodimas, Neil Armstrong, Ilias Apalodimas

On 8/28/23 22:24, Alexander Graf wrote:
>
> On 28.08.23 19:54, Simon Glass wrote:
>> Hi Alex,
>>
>> On Wed, 23 Aug 2023 at 02:56, Alexander Graf <agraf@csgraf.de> wrote:
>>> Hey Simon,
>>>
>>> On 22.08.23 20:56, Simon Glass wrote:
>>>> Hi Alex,
>>>>
>>>> On Tue, 22 Aug 2023 at 01:47, Alexander Graf <agraf@csgraf.de> wrote:
>>>>> On 22.08.23 01:03, Simon Glass wrote:
>>>>>> Hi Alex,
>>>>>>
>>>>>> On Mon, 21 Aug 2023 at 16:40, Alexander Graf <agraf@csgraf.de> wrote:
>>>>>>> On 22.08.23 00:10, Simon Glass wrote:
>>>>>>>> Hi Alex,
>>>>>>>>
>>>>>>>> On Mon, 21 Aug 2023 at 14:20, Alexander Graf <agraf@csgraf.de>
>>>>>>>> wrote:
>>>>>>>>> On 21.08.23 21:57, Simon Glass wrote:
>>>>>>>>>> Hi Alex,
>>>>>>>>>>
>>>>>>>>>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf <agraf@csgraf.de>
>>>>>>>>>> wrote:
>>>>>>>>>>> On 21.08.23 21:11, Simon Glass wrote:
>>>>>>>>>>>> Hi Alper,
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak
>>>>>>>>>>>> <alpernebiyasak@gmail.com> wrote:
>>>>>>>>>>>>> This is a rebase of Alexander Graf's video damage tracking
>>>>>>>>>>>>> series, with
>>>>>>>>>>>>> some tests and other changes. The original cover letter is
>>>>>>>>>>>>> as follows:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> This patch set speeds up graphics output on ARM by a
>>>>>>>>>>>>>> factor of 60x.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map
>>>>>>>>>>>>>> it as cached,
>>>>>>>>>>>>>> but need it accessible by the display controller which
>>>>>>>>>>>>>> reads directly
>>>>>>>>>>>>>> from a later point of consistency. Hence, we flush the
>>>>>>>>>>>>>> frame buffer to
>>>>>>>>>>>>>> DRAM on every change. The full frame buffer.
>>>>>>>>>>>> It should not, see below.
>>>>>>>>>>>>
>>>>>>>>>>>>>> Unfortunately, with the advent of 4k displays, we are
>>>>>>>>>>>>>> seeing frame buffers
>>>>>>>>>>>>>> that can take a while to flush out. This was reported by
>>>>>>>>>>>>>> Da Xue with grub,
>>>>>>>>>>>>>> which happily print 1000s of spaces on the screen to draw
>>>>>>>>>>>>>> a menu. Every
>>>>>>>>>>>>>> printed space triggers a cache flush.
>>>>>>>>>>>> That is a bug somewhere in EFI.
>>>>>>>>>>> Unfortunately not :). You may call it a bug in grub: It
>>>>>>>>>>> literally prints
>>>>>>>>>>> over space characters for every character in its menu that it
>>>>>>>>>>> wants
>>>>>>>>>>> cleared. On every text screen draw.
>>>>>>>>>>>
>>>>>>>>>>> This wouldn't be a big issue if we only flush the rectangle
>>>>>>>>>>> that gets
>>>>>>>>>>> modified. But without this patch set, we're flushing the full
>>>>>>>>>>> DRAM
>>>>>>>>>>> buffer on every u-boot text console character write, which
>>>>>>>>>>> means for
>>>>>>>>>>> every character (as that's the only API UEFI has).
>>>>>>>>>>>
>>>>>>>>>>> As a nice side effect, we speed up the normal U-Boot text
>>>>>>>>>>> console as
>>>>>>>>>>> well with this patch set, because even "normal" text prints
>>>>>>>>>>> that write
>>>>>>>>>>> for example a single line of text on the screen today flush
>>>>>>>>>>> the full
>>>>>>>>>>> frame buffer to DRAM.
>>>>>>>>>> No, I mean that it is a bug that U-Boot (apparently) flushes
>>>>>>>>>> the cache
>>>>>>>>>> after every character. It doesn't do that for normal character
>>>>>>>>>> output
>>>>>>>>>> and I don't think it makes sense to do it for EFI either.
>>>>>>>>> I see. Let's trace the calls:
>>>>>>>>>
>>>>>>>>> efi_cout_output_string()
>>>>>>>>> -> fputs()
>>>>>>>>> -> vidconsole_puts()
>>>>>>>>> -> video_sync()
>>>>>>>>> -> flush_dcache_range()
>>>>>>>>>
>>>>>>>>> Unfortunately grub abstracts character backends down to the "print
>>>>>>>>> character" level, so it calls UEFI's sophisticated "output_string"
>>>>>>>>> callback with a single character at a time, which means we do a
>>>>>>>>> full
>>>>>>>>> dcache flush for every character that we print:
>>>>>>>>>
>>>>>>>>> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>> This patch set implements the easiest mitigation against
>>>>>>>>>>>>>> this problem:
>>>>>>>>>>>>>> Damage tracking. We remember the lowest common denominator
>>>>>>>>>>>>>> region that was
>>>>>>>>>>>>>> touched since the last video_sync() call and only flush
>>>>>>>>>>>>>> that. The most
>>>>>>>>>>>>>> typical writer to the frame buffer is the video console,
>>>>>>>>>>>>>> which always
>>>>>>>>>>>>>> writes rectangles of characters on the screen and syncs
>>>>>>>>>>>>>> afterwards.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> With this patch set applied, we reduce drawing a large
>>>>>>>>>>>>>> grub menu (with
>>>>>>>>>>>>>> serial console attached for size information) on an
>>>>>>>>>>>>>> RK3399-ROC system
>>>>>>>>>>>>>> at 1440p from 55 seconds to less than 1 second.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Version 2 also implements VIDEO_COPY using this mechanism,
>>>>>>>>>>>>>> reducing its
>>>>>>>>>>>>>> overhead compared to before as well. So even x86 systems
>>>>>>>>>>>>>> should be faster
>>>>>>>>>>>>>> with this now :).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alternatives considered:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         1) Lazy sync - Sandbox does this. It only calls
>>>>>>>>>>>>>> video_sync(true) ever
>>>>>>>>>>>>>>            so often. We are missing timers to do this
>>>>>>>>>>>>>> generically.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         2) Double buffering - We could try to identify
>>>>>>>>>>>>>> whether anything changed
>>>>>>>>>>>>>>            at all and only draw to the FB if it did. That
>>>>>>>>>>>>>> would require
>>>>>>>>>>>>>>            maintaining a second buffer that we need to scan.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         3) Text buffer - Maintain a buffer of all text
>>>>>>>>>>>>>> printed on the screen with
>>>>>>>>>>>>>>            respective location. Don't write if the old and
>>>>>>>>>>>>>> new character are
>>>>>>>>>>>>>>            identical. This would limit applicability to
>>>>>>>>>>>>>> text only and is an
>>>>>>>>>>>>>>            optimization on top of this patch set.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         4) Hash screen lines - Create a hash (sha256?)
>>>>>>>>>>>>>> over every line when it
>>>>>>>>>>>>>>            changes. Only flush when it does. I'm not sure
>>>>>>>>>>>>>> if this would waste
>>>>>>>>>>>>>>            more time, memory and cache than the current
>>>>>>>>>>>>>> approach. It would make
>>>>>>>>>>>>>>            full screen updates much more expensive.
>>>>>>>>>>>> 5) Fix the bug mentioned above?
>>>>>>>>>>>>
>>>>>>>>>>>>> Changes in v5:
>>>>>>>>>>>>> - Add patch "video: test: Split copy frame buffer check
>>>>>>>>>>>>> into a function"
>>>>>>>>>>>>> - Add patch "video: test: Support checking copy frame
>>>>>>>>>>>>> buffer contents"
>>>>>>>>>>>>> - Add patch "video: test: Test partial updates of hardware
>>>>>>>>>>>>> frame buffer"
>>>>>>>>>>>>> - Use xstart, ystart, xend, yend as names for damage region
>>>>>>>>>>>>> - Document damage struct and fields in struct video_priv
>>>>>>>>>>>>> comment
>>>>>>>>>>>>> - Return void from video_damage()
>>>>>>>>>>>>> - Fix undeclared priv error in video_sync()
>>>>>>>>>>>>> - Drop unused headers from video-uclass.c
>>>>>>>>>>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>>>>>>>>>>>>> - Call video_damage() also in video_fill_part()
>>>>>>>>>>>>> - Use met->baseline instead of priv->baseline
>>>>>>>>>>>>> - Use fontdata->height/width instead of
>>>>>>>>>>>>> VIDEO_FONT_HEIGHT/WIDTH
>>>>>>>>>>>>> - Update console_rotate.c video_damage() calls to pass
>>>>>>>>>>>>> video tests
>>>>>>>>>>>>> - Remove mention about not having minimal damage for
>>>>>>>>>>>>> console_rotate.c
>>>>>>>>>>>>> - Add patch "video: test: Test video damage tracking via
>>>>>>>>>>>>> vidconsole"
>>>>>>>>>>>>> - Document new vdev field in struct efi_gop_obj comment
>>>>>>>>>>>>> - Remove video_sync_copy() also from video_fill(),
>>>>>>>>>>>>> video_fill_part()
>>>>>>>>>>>>> - Fix memmove() calls by removing the extra dev argument
>>>>>>>>>>>>> - Call video_sync() before checking copy_fb in video tests
>>>>>>>>>>>>> - Imply VIDEO_DAMAGE for video drivers instead of selecting it
>>>>>>>>>>>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
>>>>>>>>>>>>>
>>>>>>>>>>>>> v4:
>>>>>>>>>>>>> https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Changes in v4:
>>>>>>>>>>>>> - Move damage clear to patch "dm: video: Add damage
>>>>>>>>>>>>> tracking API"
>>>>>>>>>>>>> - Simplify first damage logic
>>>>>>>>>>>>> - Remove VIDEO_DAMAGE default for ARM
>>>>>>>>>>>>> - Skip damage on EfiBltVideoToBltBuffer
>>>>>>>>>>>>> - Add patch "video: Always compile cache flushing code"
>>>>>>>>>>>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that
>>>>>>>>>>>>> need it"
>>>>>>>>>>>>>
>>>>>>>>>>>>> v3:
>>>>>>>>>>>>> https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Changes in v3:
>>>>>>>>>>>>> - Adapt to always assume DM is used
>>>>>>>>>>>>> - Adapt to always assume DM is used
>>>>>>>>>>>>> - Make VIDEO_COPY always select VIDEO_DAMAGE
>>>>>>>>>>>>>
>>>>>>>>>>>>> v2:
>>>>>>>>>>>>> https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Changes in v2:
>>>>>>>>>>>>> - Remove ifdefs
>>>>>>>>>>>>> - Fix ranges in truetype target
>>>>>>>>>>>>> - Limit rotate to necessary damage
>>>>>>>>>>>>> - Remove ifdefs from gop
>>>>>>>>>>>>> - Fix dcache range; we were flushing too much before
>>>>>>>>>>>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
>>>>>>>>>>>>>
>>>>>>>>>>>>> v1:
>>>>>>>>>>>>> https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Alexander Graf (9):
>>>>>>>>>>>>>         dm: video: Add damage tracking API
>>>>>>>>>>>>>         dm: video: Add damage notification on display fills
>>>>>>>>>>>>>         vidconsole: Add damage notifications to all
>>>>>>>>>>>>> vidconsole drivers
>>>>>>>>>>>>>         video: Add damage notification on bmp display
>>>>>>>>>>>>>         efi_loader: GOP: Add damage notification on BLT
>>>>>>>>>>>>>         video: Only dcache flush damaged lines
>>>>>>>>>>>>>         video: Use VIDEO_DAMAGE for VIDEO_COPY
>>>>>>>>>>>>>         video: Always compile cache flushing code
>>>>>>>>>>>>>         video: Enable VIDEO_DAMAGE for drivers that need it
>>>>>>>>>>>>>
>>>>>>>>>>>>> Alper Nebi Yasak (4):
>>>>>>>>>>>>>         video: test: Split copy frame buffer check into a
>>>>>>>>>>>>> function
>>>>>>>>>>>>>         video: test: Support checking copy frame buffer
>>>>>>>>>>>>> contents
>>>>>>>>>>>>>         video: test: Test partial updates of hardware frame
>>>>>>>>>>>>> buffer
>>>>>>>>>>>>>         video: test: Test video damage tracking via vidconsole
>>>>>>>>>>>>>
>>>>>>>>>>>>>        arch/arm/mach-omap2/omap3/Kconfig |   1 +
>>>>>>>>>>>>>        arch/arm/mach-sunxi/Kconfig       |   1 +
>>>>>>>>>>>>>        drivers/video/Kconfig             |  26 +++
>>>>>>>>>>>>>        drivers/video/console_normal.c    |  27 ++--
>>>>>>>>>>>>>        drivers/video/console_rotate.c    |  94 +++++++----
>>>>>>>>>>>>>        drivers/video/console_truetype.c  |  37 +++--
>>>>>>>>>>>>>        drivers/video/exynos/Kconfig      |   1 +
>>>>>>>>>>>>>        drivers/video/imx/Kconfig         |   1 +
>>>>>>>>>>>>>        drivers/video/meson/Kconfig       |   1 +
>>>>>>>>>>>>>        drivers/video/rockchip/Kconfig    |   1 +
>>>>>>>>>>>>>        drivers/video/stm32/Kconfig       |   1 +
>>>>>>>>>>>>>        drivers/video/tegra20/Kconfig     |   1 +
>>>>>>>>>>>>>        drivers/video/tidss/Kconfig       |   1 +
>>>>>>>>>>>>>        drivers/video/vidconsole-uclass.c |  16 --
>>>>>>>>>>>>>        drivers/video/video-uclass.c      | 190
>>>>>>>>>>>>> ++++++++++++----------
>>>>>>>>>>>>>        drivers/video/video_bmp.c         |   7 +-
>>>>>>>>>>>>>        include/video.h                   |  59 +++----
>>>>>>>>>>>>>        include/video_console.h           |  52 ------
>>>>>>>>>>>>>        lib/efi_loader/efi_gop.c          |   7 +
>>>>>>>>>>>>>        test/dm/video.c                   | 256
>>>>>>>>>>>>> ++++++++++++++++++++++++------
>>>>>>>>>>>>>        20 files changed, 483 insertions(+), 297 deletions(-)
>>>>>>>>>>>> It is good to see this tidied up into something that can be
>>>>>>>>>>>> applied!
>>>>>>>>>>>>
>>>>>>>>>>>> I am unsure what is going on with the EFI performance,
>>>>>>>>>>>> though. It
>>>>>>>>>>>> should not flush the cache after every character, only after
>>>>>>>>>>>> a new
>>>>>>>>>>>> line. Is there something wrong in here? If so, we should fix
>>>>>>>>>>>> that bug
>>>>>>>>>>>> first and it should be patch 1 of this series.
>>>>>>>>>>> Before I came up with this series, I was trying to identify
>>>>>>>>>>> the UEFI bug
>>>>>>>>>>> in question as well, because intuition told me surely this is
>>>>>>>>>>> a bug in
>>>>>>>>>>> UEFI :). Turns out it really isn't this time around.
>>>>>>>>>> I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI
>>>>>>>>>> implementation. Where did you look for the bug?
>>>>>>>>> The "real" bug is in grub. But given that it's reasonably
>>>>>>>>> simple to work
>>>>>>>>> around in U-Boot and even with it "fixed" in grub we would
>>>>>>>>> still see
>>>>>>>>> performance benefits from flushing only parts of the screen, I
>>>>>>>>> think
>>>>>>>>> it's worth living with the grub deficiency.
>>>>>>>> OK thanks for digging into it. I suggest we add a param to
>>>>>>>> vidconsole_puts() to tell it whether to sync or not, then the
>>>>>>>> EFI code
>>>>>>>> can indicate this and try to be a bit smarter about it.
>>>>>>> It doesn't know when to sync either. From its point of view, any
>>>>>>> "console output" could be the last one. There is no API in UEFI that
>>>>>>> says "please flush console output now".
>>>>>> Yes, I understand. I was not suggesting we were missing an API. But
>>>>>> some sort of heuristic would do, e.g. only flush on a newline, flush
>>>>>> every 50 chars, etc.
>>>>> I can't think of any heuristic that would reliably work. Relevant for
>>>>> this conversation, UEFI provides 2 calls:
>>>>>
>>>>>      * Write string to screen (efi_cout_output_string)
>>>>>      * Set text cursor position to X, Y (efi_cout_set_cursor_position)
>>>>>
>>>>> It's perfectly legal for a UEFI application to do something like
>>>>>
>>>>> efi_cout_set_cursor_position(10, 10);
>>>>> efi_cout_output_string("f");
>>>>> efi_cout_output_string("o");
>>>>> efi_cout_output_string("o");
>>>>>
>>>>> to update contents of a virtual text box on the screen. Where in this
>>>>> chain of events would we call video_sync(), other than on every call to
>>>>> efi_cout_output_string()?
>>>> Actually U-Boot has the same problem, but we have managed to work
>>>> out something.
>>>
>>> U-Boot as a code base has a much easier stance: It can add APIs when it
>>> needs them in places that require them. With UEFI (as well as the U-Boot
>>> native API), we're stuck with what's there.
>>>
>>> I also don't understand what you mean by "we have managed to work out
>>> something". This patch set is not a UEFI fix - it fixes generic U-Boot
>>> behavior and speeds up non-UEFI boots as well. The improvement there is
>>> just not as impressive as with grub :).
>> We are still not quite on the same page...
>>
>> U-Boot does have video_sync() but it doesn't know when to call it. If
>> it does not call it, then any amount of single-threaded code can run
>> after that, which may update the framebuffer. In other words, U-Boot
>> is in exactly the same boat as UEFI. It has to decide whether to call
>> video_sync() based on some sort of heuristic.
>>
>> That is the only point I am trying to make here. Does that make sense?
>
>
> Oh, I thought you mentioned above that U-Boot is in a better spot or
> "has it solved already". I agree - it's in the same boat and the only
> safe thing it can really do today that is fully cross-platform
> compatible is to call video_sync() after every character.
>
> I don't understand what you mean by "any amount of single-threaded code
> can run after that, which may update the framebuffer". Any framebuffer
> modification is U-Boot internal code which then again can apply
> video_sync() to tell the system "I want what I wrote to the screen to actually
> be on screen now". I don't think that's necessarily bad design. A bit
> clunky, but we're in a pre-boot environment after all.
>
> Since we're aligned now: What exactly did you refer to with "but we have
> managed to work out something"?

Should we set PixelBltOnly to indicate to UEFI applications that they
are not allowed to directly write to the framebuffer but always have to
use BitBlt? GRUB seems to be using a shadow buffer by default which it
copies via BitBlt.
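
For context, a minimal sketch of what PixelBltOnly means for an
application: the pixel format reported by the Graphics Output Protocol
mode tells it whether a linear frame buffer may be written directly.
The enumerator names mirror EFI_GRAPHICS_PIXEL_FORMAT from the UEFI
specification; the enum tag and the helper function are hypothetical
and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors EFI_GRAPHICS_PIXEL_FORMAT from the UEFI specification. */
enum efi_graphics_pixel_format_sketch {
	PixelRedGreenBlueReserved8BitPerColor,
	PixelBlueGreenRedReserved8BitPerColor,
	PixelBitMask,
	PixelBltOnly,
	PixelFormatMax
};

/* Hypothetical helper: may an application write to the frame buffer
 * directly, or does it have to go through Blt()? */
static bool gop_app_may_write_fb(enum efi_graphics_pixel_format_sketch format)
{
	/* PixelFormatMax is a sentinel, not a format a mode can report. */
	assert(format != PixelFormatMax);
	return format != PixelBltOnly;
}
```

If the firmware reported PixelBltOnly, every update would pass through
Blt(), which is a point where damage could be recorded.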

Best regards

Heinrich

>
>
>>
>>>
>>>> I do think it is still wrong to flush the cache on every char. I suspect you
>>>> will find that even a simple heuristic like I mentioned would be good
>>>> enough.
>>>>
>>>> Also I notice that EFI calls notify? all the time, so U-Boot probably
>>>> does have the ability to sync the video every 10ms if we wanted to.
>>>
>>> I fail to see how invalidating the frame buffer for the screen every
>>> 10ms is any better than doing flush+invalidate operations only on screen
>>> areas that changed? It's more fragile, more difficult to understand and
>>> also significantly more expensive given that most of the time very
>>> little on the screen actually changes.
>> I am not suggesting it is better, at all. I am just trying to explain
>> that the U-Boot EFI implementation should not be calling video_sync()
>> after every character, before or after this series.
>
>
> Ah, it doesn't :). It just calls the normal U-Boot "write character on
> screen" function. What that does is up to U-Boot internals - and those
> basically today dictate that something needs to call video_sync() to
> make FB modifications actually pop up on screen.
>
>
>>
>>>
>>>> It seems from this discussion that we have made great the enemy of
>>>> the good.
>>>
>>> I agree. Damage tracking in this patch set is elegant, simple,
>>> predictable, low overhead and basically just abstracts the video copy
>>> code path to a generic solution. All while it pretty much solves the
>>> issue for good. I don't understand what's not to like about it :)
>> I think it is a really nice feature and I hope it can be applied soon.
>
>
> Thanks a lot especially to Alper for picking it up. I had almost
> forgotten about the patch set :)
>
>
> Alex
>
>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 00/13] Add video damage tracking
  2023-08-28 20:24                       ` Alexander Graf
  2023-08-28 21:54                         ` Heinrich Schuchardt
@ 2023-08-28 22:08                         ` Simon Glass
  2023-08-29  6:27                           ` Alexander Graf
  1 sibling, 1 reply; 56+ messages in thread
From: Simon Glass @ 2023-08-28 22:08 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Alper Nebi Yasak, u-boot, Kever Yang, Jagan Teki, Andre Przywara,
	Svyatoslav Ryhel, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

Hi Alex,

On Mon, 28 Aug 2023 at 14:24, Alexander Graf <agraf@csgraf.de> wrote:
>
>
> On 28.08.23 19:54, Simon Glass wrote:
> > Hi Alex,
> >
> > On Wed, 23 Aug 2023 at 02:56, Alexander Graf <agraf@csgraf.de> wrote:
> >> Hey Simon,
> >>
> >> On 22.08.23 20:56, Simon Glass wrote:
> >>> Hi Alex,
> >>>
> >>> On Tue, 22 Aug 2023 at 01:47, Alexander Graf <agraf@csgraf.de> wrote:
> >>>> On 22.08.23 01:03, Simon Glass wrote:
> >>>>> Hi Alex,
> >>>>>
> >>>>> On Mon, 21 Aug 2023 at 16:40, Alexander Graf <agraf@csgraf.de> wrote:
> >>>>>> On 22.08.23 00:10, Simon Glass wrote:
> >>>>>>> Hi Alex,
> >>>>>>>
> >>>>>>> On Mon, 21 Aug 2023 at 14:20, Alexander Graf <agraf@csgraf.de> wrote:
> >>>>>>>> On 21.08.23 21:57, Simon Glass wrote:
> >>>>>>>>> Hi Alex,
> >>>>>>>>>
> >>>>>>>>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf <agraf@csgraf.de> wrote:
> >>>>>>>>>> On 21.08.23 21:11, Simon Glass wrote:
> >>>>>>>>>>> Hi Alper,
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
> >>>>>>>>>>>> This is a rebase of Alexander Graf's video damage tracking series, with
> >>>>>>>>>>>> some tests and other changes. The original cover letter is as follows:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> This patch set speeds up graphics output on ARM by a factor of 60x.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
> >>>>>>>>>>>>> but need it accessible by the display controller which reads directly
> >>>>>>>>>>>>> from a later point of consistency. Hence, we flush the frame buffer to
> >>>>>>>>>>>>> DRAM on every change. The full frame buffer.
> >>>>>>>>>>> It should not, see below.
> >>>>>>>>>>>
> >>>>>>>>>>>>> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
> >>>>>>>>>>>>> that can take a while to flush out. This was reported by Da Xue with grub,
> >>>>>>>>>>>>> which happily prints 1000s of spaces on the screen to draw a menu. Every
> >>>>>>>>>>>>> printed space triggers a cache flush.
> >>>>>>>>>>> That is a bug somewhere in EFI.
> >>>>>>>>>> Unfortunately not :). You may call it a bug in grub: It literally prints
> >>>>>>>>>> over space characters for every character in its menu that it wants
> >>>>>>>>>> cleared. On every text screen draw.
> >>>>>>>>>>
> >>>>>>>>>> This wouldn't be a big issue if we only flush the rectangle that gets
> >>>>>>>>>> modified. But without this patch set, we're flushing the full DRAM
> >>>>>>>>>> buffer on every u-boot text console character write, which means for
> >>>>>>>>>> every character (as that's the only API UEFI has).
> >>>>>>>>>>
> >>>>>>>>>> As a nice side effect, we speed up the normal U-Boot text console as
> >>>>>>>>>> well with this patch set, because even "normal" text prints that write
> >>>>>>>>>> for example a single line of text on the screen today flush the full
> >>>>>>>>>> frame buffer to DRAM.
> >>>>>>>>> No, I mean that it is a bug that U-Boot (apparently) flushes the cache
> >>>>>>>>> after every character. It doesn't do that for normal character output
> >>>>>>>>> and I don't think it makes sense to do it for EFI either.
> >>>>>>>> I see. Let's trace the calls:
> >>>>>>>>
> >>>>>>>> efi_cout_output_string()
> >>>>>>>> -> fputs()
> >>>>>>>> -> vidconsole_puts()
> >>>>>>>> -> video_sync()
> >>>>>>>> -> flush_dcache_range()
> >>>>>>>>
> >>>>>>>> Unfortunately grub abstracts character backends down to the "print
> >>>>>>>> character" level, so it calls UEFI's sophisticated "output_string"
> >>>>>>>> callback with single characters at a time, which means we do a full
> >>>>>>>> dcache flush for every character that we print:
> >>>>>>>>
> >>>>>>>> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>>> This patch set implements the easiest mitigation against this problem:
> >>>>>>>>>>>>> Damage tracking. We remember the lowest common denominator region that was
> >>>>>>>>>>>>> touched since the last video_sync() call and only flush that. The most
> >>>>>>>>>>>>> typical writer to the frame buffer is the video console, which always
> >>>>>>>>>>>>> writes rectangles of characters on the screen and syncs afterwards.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> With this patch set applied, we reduce drawing a large grub menu (with
> >>>>>>>>>>>>> serial console attached for size information) on an RK3399-ROC system
> >>>>>>>>>>>>> at 1440p from 55 seconds to less than 1 second.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Version 2 also implements VIDEO_COPY using this mechanism, reducing its
> >>>>>>>>>>>>> overhead compared to before as well. So even x86 systems should be faster
> >>>>>>>>>>>>> with this now :).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Alternatives considered:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>         1) Lazy sync - Sandbox does this. It only calls video_sync(true) every
> >>>>>>>>>>>>>            so often. We are missing timers to do this generically.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>         2) Double buffering - We could try to identify whether anything changed
> >>>>>>>>>>>>>            at all and only draw to the FB if it did. That would require
> >>>>>>>>>>>>>            maintaining a second buffer that we need to scan.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>         3) Text buffer - Maintain a buffer of all text printed on the screen with
> >>>>>>>>>>>>>            respective location. Don't write if the old and new character are
> >>>>>>>>>>>>>            identical. This would limit applicability to text only and is an
> >>>>>>>>>>>>>            optimization on top of this patch set.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>         4) Hash screen lines - Create a hash (sha256?) over every line when it
> >>>>>>>>>>>>>            changes. Only flush when it does. I'm not sure if this would waste
> >>>>>>>>>>>>>            more time, memory and cache than the current approach. It would make
> >>>>>>>>>>>>>            full screen updates much more expensive.
> >>>>>>>>>>> 5) Fix the bug mentioned above?
> >>>>>>>>>>>
> >>>>>>>>>>>> Changes in v5:
> >>>>>>>>>>>> - Add patch "video: test: Split copy frame buffer check into a function"
> >>>>>>>>>>>> - Add patch "video: test: Support checking copy frame buffer contents"
> >>>>>>>>>>>> - Add patch "video: test: Test partial updates of hardware frame buffer"
> >>>>>>>>>>>> - Use xstart, ystart, xend, yend as names for damage region
> >>>>>>>>>>>> - Document damage struct and fields in struct video_priv comment
> >>>>>>>>>>>> - Return void from video_damage()
> >>>>>>>>>>>> - Fix undeclared priv error in video_sync()
> >>>>>>>>>>>> - Drop unused headers from video-uclass.c
> >>>>>>>>>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
> >>>>>>>>>>>> - Call video_damage() also in video_fill_part()
> >>>>>>>>>>>> - Use met->baseline instead of priv->baseline
> >>>>>>>>>>>> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
> >>>>>>>>>>>> - Update console_rotate.c video_damage() calls to pass video tests
> >>>>>>>>>>>> - Remove mention about not having minimal damage for console_rotate.c
> >>>>>>>>>>>> - Add patch "video: test: Test video damage tracking via vidconsole"
> >>>>>>>>>>>> - Document new vdev field in struct efi_gop_obj comment
> >>>>>>>>>>>> - Remove video_sync_copy() also from video_fill(), video_fill_part()
> >>>>>>>>>>>> - Fix memmove() calls by removing the extra dev argument
> >>>>>>>>>>>> - Call video_sync() before checking copy_fb in video tests
> >>>>>>>>>>>> - Imply VIDEO_DAMAGE for video drivers instead of selecting it
> >>>>>>>>>>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
> >>>>>>>>>>>>
> >>>>>>>>>>>> v4: https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/
> >>>>>>>>>>>>
> >>>>>>>>>>>> Changes in v4:
> >>>>>>>>>>>> - Move damage clear to patch "dm: video: Add damage tracking API"
> >>>>>>>>>>>> - Simplify first damage logic
> >>>>>>>>>>>> - Remove VIDEO_DAMAGE default for ARM
> >>>>>>>>>>>> - Skip damage on EfiBltVideoToBltBuffer
> >>>>>>>>>>>> - Add patch "video: Always compile cache flushing code"
> >>>>>>>>>>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"
> >>>>>>>>>>>>
> >>>>>>>>>>>> v3: https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/
> >>>>>>>>>>>>
> >>>>>>>>>>>> Changes in v3:
> >>>>>>>>>>>> - Adapt to always assume DM is used
> >>>>>>>>>>>> - Adapt to always assume DM is used
> >>>>>>>>>>>> - Make VIDEO_COPY always select VIDEO_DAMAGE
> >>>>>>>>>>>>
> >>>>>>>>>>>> v2: https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/
> >>>>>>>>>>>>
> >>>>>>>>>>>> Changes in v2:
> >>>>>>>>>>>> - Remove ifdefs
> >>>>>>>>>>>> - Fix ranges in truetype target
> >>>>>>>>>>>> - Limit rotate to necessary damage
> >>>>>>>>>>>> - Remove ifdefs from gop
> >>>>>>>>>>>> - Fix dcache range; we were flushing too much before
> >>>>>>>>>>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
> >>>>>>>>>>>>
> >>>>>>>>>>>> v1: https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/
> >>>>>>>>>>>>
> >>>>>>>>>>>> Alexander Graf (9):
> >>>>>>>>>>>>         dm: video: Add damage tracking API
> >>>>>>>>>>>>         dm: video: Add damage notification on display fills
> >>>>>>>>>>>>         vidconsole: Add damage notifications to all vidconsole drivers
> >>>>>>>>>>>>         video: Add damage notification on bmp display
> >>>>>>>>>>>>         efi_loader: GOP: Add damage notification on BLT
> >>>>>>>>>>>>         video: Only dcache flush damaged lines
> >>>>>>>>>>>>         video: Use VIDEO_DAMAGE for VIDEO_COPY
> >>>>>>>>>>>>         video: Always compile cache flushing code
> >>>>>>>>>>>>         video: Enable VIDEO_DAMAGE for drivers that need it
> >>>>>>>>>>>>
> >>>>>>>>>>>> Alper Nebi Yasak (4):
> >>>>>>>>>>>>         video: test: Split copy frame buffer check into a function
> >>>>>>>>>>>>         video: test: Support checking copy frame buffer contents
> >>>>>>>>>>>>         video: test: Test partial updates of hardware frame buffer
> >>>>>>>>>>>>         video: test: Test video damage tracking via vidconsole
> >>>>>>>>>>>>
> >>>>>>>>>>>>        arch/arm/mach-omap2/omap3/Kconfig |   1 +
> >>>>>>>>>>>>        arch/arm/mach-sunxi/Kconfig       |   1 +
> >>>>>>>>>>>>        drivers/video/Kconfig             |  26 +++
> >>>>>>>>>>>>        drivers/video/console_normal.c    |  27 ++--
> >>>>>>>>>>>>        drivers/video/console_rotate.c    |  94 +++++++----
> >>>>>>>>>>>>        drivers/video/console_truetype.c  |  37 +++--
> >>>>>>>>>>>>        drivers/video/exynos/Kconfig      |   1 +
> >>>>>>>>>>>>        drivers/video/imx/Kconfig         |   1 +
> >>>>>>>>>>>>        drivers/video/meson/Kconfig       |   1 +
> >>>>>>>>>>>>        drivers/video/rockchip/Kconfig    |   1 +
> >>>>>>>>>>>>        drivers/video/stm32/Kconfig       |   1 +
> >>>>>>>>>>>>        drivers/video/tegra20/Kconfig     |   1 +
> >>>>>>>>>>>>        drivers/video/tidss/Kconfig       |   1 +
> >>>>>>>>>>>>        drivers/video/vidconsole-uclass.c |  16 --
> >>>>>>>>>>>>        drivers/video/video-uclass.c      | 190 ++++++++++++----------
> >>>>>>>>>>>>        drivers/video/video_bmp.c         |   7 +-
> >>>>>>>>>>>>        include/video.h                   |  59 +++----
> >>>>>>>>>>>>        include/video_console.h           |  52 ------
> >>>>>>>>>>>>        lib/efi_loader/efi_gop.c          |   7 +
> >>>>>>>>>>>>        test/dm/video.c                   | 256 ++++++++++++++++++++++++------
> >>>>>>>>>>>>        20 files changed, 483 insertions(+), 297 deletions(-)
> >>>>>>>>>>> It is good to see this tidied up into something that can be applied!
> >>>>>>>>>>>
> >>>>>>>>>>> I am unsure what is going on with the EFI performance, though. It
> >>>>>>>>>>> should not flush the cache after every character, only after a new
> >>>>>>>>>>> line. Is there something wrong in here? If so, we should fix that bug
> >>>>>>>>>>> first and it should be patch 1 of this series.
> >>>>>>>>>> Before I came up with this series, I was trying to identify the UEFI bug
> >>>>>>>>>> in question as well, because intuition told me surely this is a bug in
> >>>>>>>>>> UEFI :). Turns out it really isn't this time around.
> >>>>>>>>> I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI
> >>>>>>>>> implementation. Where did you look for the bug?
> >>>>>>>> The "real" bug is in grub. But given that it's reasonably simple to work
> >>>>>>>> around in U-Boot and even with it "fixed" in grub we would still see
> >>>>>>>> performance benefits from flushing only parts of the screen, I think
> >>>>>>>> it's worth living with the grub deficiency.
> >>>>>>> OK thanks for digging into it. I suggest we add a param to
> >>>>>>> vidconsole_puts() to tell it whether to sync or not, then the EFI code
> >>>>>>> can indicate this and try to be a bit smarter about it.
> >>>>>> It doesn't know when to sync either. From its point of view, any
> >>>>>> "console output" could be the last one. There is no API in UEFI that
> >>>>>> says "please flush console output now".
> >>>>> Yes, I understand. I was not suggesting we were missing an API. But
> >>>>> some sort of heuristic would do, e.g. only flush on a newline, flush
> >>>>> every 50 chars, etc.
> >>>> I can't think of any heuristic that would reliably work. Relevant for
> >>>> this conversation, UEFI provides 2 calls:
> >>>>
> >>>>      * Write string to screen (efi_cout_output_string)
> >>>>      * Set text cursor position to X, Y (efi_cout_set_cursor_position)
> >>>>
> >>>> It's perfectly legal for a UEFI application to do something like
> >>>>
> >>>> efi_cout_set_cursor_position(10, 10);
> >>>> efi_cout_output_string("f");
> >>>> efi_cout_output_string("o");
> >>>> efi_cout_output_string("o");
> >>>>
> >>>> to update contents of a virtual text box on the screen. Where in this
> >>>> chain of events would we call video_sync(), if not on every call to
> >>>> efi_cout_output_string()?
> >>> Actually U-Boot has the same problem, but we have managed to work out something.
> >>
> >> U-Boot as a code base has a much easier stance: It can add APIs when it
> >> needs them in places that require them. With UEFI (as well as the U-Boot
> >> native API), we're stuck with what's there.
> >>
> >> I also don't understand what you mean by "we have managed to work out
> >> something". This patch set is not a UEFI fix - it fixes generic U-Boot
> >> behavior and speeds up non-UEFI boots as well. The improvement there is
> >> just not as impressive as with grub :).
> > We are still not quite on the same page...
> >
> > U-Boot does have video_sync() but it doesn't know when to call it. If
> > it does not call it, then any amount of single-threaded code can run
> > after that, which may update the framebuffer. In other words, U-Boot
> > is in exactly the same boat as UEFI. It has to decide whether to call
> > video_sync() based on some sort of heuristic.
> >
> > That is the only point I am trying to make here. Does that make sense?
>
>
> Oh, I thought you mentioned above that U-Boot is in a better spot or
> "has it solved already". I agree - it's in the same boat and the only
> safe thing it can really do today that is fully cross-platform
> compatible is to call video_sync() after every character.
>
> I don't understand what you mean by "any amount of single-threaded code
> can run after that, which may update the framebuffer". Any framebuffer
> modification is U-Boot internal code which then again can apply
> video_sync() to tell the system "I want what I wrote to screen actually
> be on screen now". I don't think that's necessarily bad design. A bit
> clunky, but we're in a pre-boot environment after all.
>
> Since we're aligned now: What exactly did you refer to with "but we have
> managed to work out something"?

So now we are on the same page about that point. The next step is my
heuristic point:

vidconsole_putc_xy() - does not call video_sync()
vidconsole_newline() - does

I am simply suggesting that UEFI should do the same thing.

>
>
> >
> >>
> >>> I do think it is still to flush the cache on every char. I suspect you
> >>> will find that even a simple heuristic like I mentioned would be good
> >>> enough.
> >>>
> >>> Also I notice that EFI calls notify? all the time, so U-Boot probably
> >>> does have the ability to sync the video every 10ms if we wanted to.
> >>
> >> I fail to see how invalidating the frame buffer for the screen every
> >> 10ms is any better than doing flush+invalidate operations only on screen
> >> areas that changed? It's more fragile, more difficult to understand and
> >> also significantly more expensive given that most of the time very
> >> little on the screen actually changes.
> > I am not suggesting it is better, at all. I am just trying to explain
> > that the U-Boot EFI implementation should not be calling video_sync()
> > after every character, before or after this series.
>
>
> Ah, it doesn't :). It just calls the normal U-Boot "write character on
> screen" function. What that does is up to U-Boot internals - and those
> basically today dictate that something needs to call video_sync() to
> make FB modifications actually pop up on screen.

Hmmm, so what function is it calling, then?  I think we are getting
closer to the 'fix' I am trying to tease out.

>
>
> >
> >>
> >>> It seems from this discussion that we have made great the enemy of the good.
> >>
> >> I agree. Damage tracking in this patch set is elegant, simple,
> >> predictable, low overhead and basically just abstracts the video copy
> >> code path to a generic solution. All while it pretty much solves the
> >> issue for good. I don't understand what's not to like about it :)
> > I think it is a really nice feature and I hope it can be applied soon.
>
>
> Thanks a lot especially to Alper for picking it up. I had almost
> forgotten about the patch set :)

Yep!

Regards,
Simon

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 00/13] Add video damage tracking
  2023-08-28 21:54                         ` Heinrich Schuchardt
@ 2023-08-29  6:20                           ` Alexander Graf
  2023-08-29  9:19                             ` Mark Kettenis
  0 siblings, 1 reply; 56+ messages in thread
From: Alexander Graf @ 2023-08-29  6:20 UTC (permalink / raw)
  To: Heinrich Schuchardt, Simon Glass
  Cc: Alper Nebi Yasak, u-boot, Kever Yang, Jagan Teki, Andre Przywara,
	Svyatoslav Ryhel, Philipp Tomsich, Andrew Davis, Da Xue,
	Patrice Chotard, Patrick Delaunay, Derald Woods,
	Anatolij Gustschin, uboot-stm32, Matthias Brugger, u-boot-amlogic,
	Ilias Apalodimas, Neil Armstrong


On 28.08.23 23:54, Heinrich Schuchardt wrote:
> On 8/28/23 22:24, Alexander Graf wrote:
>>
>> On 28.08.23 19:54, Simon Glass wrote:
>>> Hi Alex,
>>>
>>> On Wed, 23 Aug 2023 at 02:56, Alexander Graf <agraf@csgraf.de> wrote:
>>>> Hey Simon,
>>>>
>>>> On 22.08.23 20:56, Simon Glass wrote:
>>>>> Hi Alex,
>>>>>
>>>>> On Tue, 22 Aug 2023 at 01:47, Alexander Graf <agraf@csgraf.de> wrote:
>>>>>> On 22.08.23 01:03, Simon Glass wrote:
>>>>>>> Hi Alex,
>>>>>>>
>>>>>>> On Mon, 21 Aug 2023 at 16:40, Alexander Graf <agraf@csgraf.de> 
>>>>>>> wrote:
>>>>>>>> On 22.08.23 00:10, Simon Glass wrote:
>>>>>>>>> Hi Alex,
>>>>>>>>>
>>>>>>>>> On Mon, 21 Aug 2023 at 14:20, Alexander Graf <agraf@csgraf.de>
>>>>>>>>> wrote:
>>>>>>>>>> On 21.08.23 21:57, Simon Glass wrote:
>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf <agraf@csgraf.de>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> On 21.08.23 21:11, Simon Glass wrote:
>>>>>>>>>>>>> Hi Alper,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak
>>>>>>>>>>>>> <alpernebiyasak@gmail.com> wrote:
>>>>>>>>>>>>>> This is a rebase of Alexander Graf's video damage tracking
>>>>>>>>>>>>>> series, with
>>>>>>>>>>>>>> some tests and other changes. The original cover letter is
>>>>>>>>>>>>>> as follows:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This patch set speeds up graphics output on ARM by a
>>>>>>>>>>>>>>> factor of 60x.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map
>>>>>>>>>>>>>>> it as cached,
>>>>>>>>>>>>>>> but need it accessible by the display controller which
>>>>>>>>>>>>>>> reads directly
>>>>>>>>>>>>>>> from a later point of consistency. Hence, we flush the
>>>>>>>>>>>>>>> frame buffer to
>>>>>>>>>>>>>>> DRAM on every change. The full frame buffer.
>>>>>>>>>>>>> It should not, see below.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Unfortunately, with the advent of 4k displays, we are
>>>>>>>>>>>>>>> seeing frame buffers
>>>>>>>>>>>>>>> that can take a while to flush out. This was reported by
>>>>>>>>>>>>>>> Da Xue with grub,
>>>>>>>>>>>>>>> which happily prints 1000s of spaces on the screen to draw
>>>>>>>>>>>>>>> a menu. Every
>>>>>>>>>>>>>>> printed space triggers a cache flush.
>>>>>>>>>>>>> That is a bug somewhere in EFI.
>>>>>>>>>>>> Unfortunately not :). You may call it a bug in grub: It
>>>>>>>>>>>> literally prints
>>>>>>>>>>>> over space characters for every character in its menu that it
>>>>>>>>>>>> wants
>>>>>>>>>>>> cleared. On every text screen draw.
>>>>>>>>>>>>
>>>>>>>>>>>> This wouldn't be a big issue if we only flush the rectangle
>>>>>>>>>>>> that gets
>>>>>>>>>>>> modified. But without this patch set, we're flushing the full
>>>>>>>>>>>> DRAM
>>>>>>>>>>>> buffer on every u-boot text console character write, which
>>>>>>>>>>>> means for
>>>>>>>>>>>> every character (as that's the only API UEFI has).
>>>>>>>>>>>>
>>>>>>>>>>>> As a nice side effect, we speed up the normal U-Boot text
>>>>>>>>>>>> console as
>>>>>>>>>>>> well with this patch set, because even "normal" text prints
>>>>>>>>>>>> that write
>>>>>>>>>>>> for example a single line of text on the screen today flush
>>>>>>>>>>>> the full
>>>>>>>>>>>> frame buffer to DRAM.
>>>>>>>>>>> No, I mean that it is a bug that U-Boot (apparently) flushes
>>>>>>>>>>> the cache
>>>>>>>>>>> after every character. It doesn't do that for normal character
>>>>>>>>>>> output
>>>>>>>>>>> and I don't think it makes sense to do it for EFI either.
>>>>>>>>>> I see. Let's trace the calls:
>>>>>>>>>>
>>>>>>>>>> efi_cout_output_string()
>>>>>>>>>> -> fputs()
>>>>>>>>>> -> vidconsole_puts()
>>>>>>>>>> -> video_sync()
>>>>>>>>>> -> flush_dcache_range()
>>>>>>>>>>
>>>>>>>>>> Unfortunately grub abstracts character backends down to the 
>>>>>>>>>> "print
>>>>>>>>>> character" level, so it calls UEFI's sophisticated
>>>>>>>>>> "output_string"
>>>>>>>>>> callback with single characters at a time, which means we do a
>>>>>>>>>> full
>>>>>>>>>> dcache flush for every character that we print:
>>>>>>>>>>
>>>>>>>>>> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>> This patch set implements the easiest mitigation against
>>>>>>>>>>>>>>> this problem:
>>>>>>>>>>>>>>> Damage tracking. We remember the lowest common denominator
>>>>>>>>>>>>>>> region that was
>>>>>>>>>>>>>>> touched since the last video_sync() call and only flush
>>>>>>>>>>>>>>> that. The most
>>>>>>>>>>>>>>> typical writer to the frame buffer is the video console,
>>>>>>>>>>>>>>> which always
>>>>>>>>>>>>>>> writes rectangles of characters on the screen and syncs
>>>>>>>>>>>>>>> afterwards.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> With this patch set applied, we reduce drawing a large
>>>>>>>>>>>>>>> grub menu (with
>>>>>>>>>>>>>>> serial console attached for size information) on an
>>>>>>>>>>>>>>> RK3399-ROC system
>>>>>>>>>>>>>>> at 1440p from 55 seconds to less than 1 second.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Version 2 also implements VIDEO_COPY using this mechanism,
>>>>>>>>>>>>>>> reducing its
>>>>>>>>>>>>>>> overhead compared to before as well. So even x86 systems
>>>>>>>>>>>>>>> should be faster
>>>>>>>>>>>>>>> with this now :).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Alternatives considered:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         1) Lazy sync - Sandbox does this. It only calls
>>>>>>>>>>>>>>> video_sync(true) every
>>>>>>>>>>>>>>>            so often. We are missing timers to do this
>>>>>>>>>>>>>>> generically.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         2) Double buffering - We could try to identify
>>>>>>>>>>>>>>> whether anything changed
>>>>>>>>>>>>>>>            at all and only draw to the FB if it did. That
>>>>>>>>>>>>>>> would require
>>>>>>>>>>>>>>>            maintaining a second buffer that we need to 
>>>>>>>>>>>>>>> scan.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         3) Text buffer - Maintain a buffer of all text
>>>>>>>>>>>>>>> printed on the screen with
>>>>>>>>>>>>>>>            respective location. Don't write if the old and
>>>>>>>>>>>>>>> new character are
>>>>>>>>>>>>>>>            identical. This would limit applicability to
>>>>>>>>>>>>>>> text only and is an
>>>>>>>>>>>>>>>            optimization on top of this patch set.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         4) Hash screen lines - Create a hash (sha256?)
>>>>>>>>>>>>>>> over every line when it
>>>>>>>>>>>>>>>            changes. Only flush when it does. I'm not sure
>>>>>>>>>>>>>>> if this would waste
>>>>>>>>>>>>>>>            more time, memory and cache than the current
>>>>>>>>>>>>>>> approach. It would make
>>>>>>>>>>>>>>>            full screen updates much more expensive.
>>>>>>>>>>>>> 5) Fix the bug mentioned above?
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Changes in v5:
>>>>>>>>>>>>>> - Add patch "video: test: Split copy frame buffer check
>>>>>>>>>>>>>> into a function"
>>>>>>>>>>>>>> - Add patch "video: test: Support checking copy frame
>>>>>>>>>>>>>> buffer contents"
>>>>>>>>>>>>>> - Add patch "video: test: Test partial updates of hardware
>>>>>>>>>>>>>> frame buffer"
>>>>>>>>>>>>>> - Use xstart, ystart, xend, yend as names for damage region
>>>>>>>>>>>>>> - Document damage struct and fields in struct video_priv
>>>>>>>>>>>>>> comment
>>>>>>>>>>>>>> - Return void from video_damage()
>>>>>>>>>>>>>> - Fix undeclared priv error in video_sync()
>>>>>>>>>>>>>> - Drop unused headers from video-uclass.c
>>>>>>>>>>>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>>>>>>>>>>>>>> - Call video_damage() also in video_fill_part()
>>>>>>>>>>>>>> - Use met->baseline instead of priv->baseline
>>>>>>>>>>>>>> - Use fontdata->height/width instead of
>>>>>>>>>>>>>> VIDEO_FONT_HEIGHT/WIDTH
>>>>>>>>>>>>>> - Update console_rotate.c video_damage() calls to pass
>>>>>>>>>>>>>> video tests
>>>>>>>>>>>>>> - Remove mention about not having minimal damage for
>>>>>>>>>>>>>> console_rotate.c
>>>>>>>>>>>>>> - Add patch "video: test: Test video damage tracking via
>>>>>>>>>>>>>> vidconsole"
>>>>>>>>>>>>>> - Document new vdev field in struct efi_gop_obj comment
>>>>>>>>>>>>>> - Remove video_sync_copy() also from video_fill(),
>>>>>>>>>>>>>> video_fill_part()
>>>>>>>>>>>>>> - Fix memmove() calls by removing the extra dev argument
>>>>>>>>>>>>>> - Call video_sync() before checking copy_fb in video tests
>>>>>>>>>>>>>> - Imply VIDEO_DAMAGE for video drivers instead of 
>>>>>>>>>>>>>> selecting it
>>>>>>>>>>>>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> v4:
>>>>>>>>>>>>>> https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/ 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Changes in v4:
>>>>>>>>>>>>>> - Move damage clear to patch "dm: video: Add damage
>>>>>>>>>>>>>> tracking API"
>>>>>>>>>>>>>> - Simplify first damage logic
>>>>>>>>>>>>>> - Remove VIDEO_DAMAGE default for ARM
>>>>>>>>>>>>>> - Skip damage on EfiBltVideoToBltBuffer
>>>>>>>>>>>>>> - Add patch "video: Always compile cache flushing code"
>>>>>>>>>>>>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that
>>>>>>>>>>>>>> need it"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> v3:
>>>>>>>>>>>>>> https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/ 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Changes in v3:
>>>>>>>>>>>>>> - Adapt to always assume DM is used
>>>>>>>>>>>>>> - Adapt to always assume DM is used
>>>>>>>>>>>>>> - Make VIDEO_COPY always select VIDEO_DAMAGE
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> v2:
>>>>>>>>>>>>>> https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/ 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Changes in v2:
>>>>>>>>>>>>>> - Remove ifdefs
>>>>>>>>>>>>>> - Fix ranges in truetype target
>>>>>>>>>>>>>> - Limit rotate to necessary damage
>>>>>>>>>>>>>> - Remove ifdefs from gop
>>>>>>>>>>>>>> - Fix dcache range; we were flushing too much before
>>>>>>>>>>>>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> v1:
>>>>>>>>>>>>>> https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/ 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alexander Graf (9):
>>>>>>>>>>>>>>         dm: video: Add damage tracking API
>>>>>>>>>>>>>>         dm: video: Add damage notification on display fills
>>>>>>>>>>>>>>         vidconsole: Add damage notifications to all
>>>>>>>>>>>>>> vidconsole drivers
>>>>>>>>>>>>>>         video: Add damage notification on bmp display
>>>>>>>>>>>>>>         efi_loader: GOP: Add damage notification on BLT
>>>>>>>>>>>>>>         video: Only dcache flush damaged lines
>>>>>>>>>>>>>>         video: Use VIDEO_DAMAGE for VIDEO_COPY
>>>>>>>>>>>>>>         video: Always compile cache flushing code
>>>>>>>>>>>>>>         video: Enable VIDEO_DAMAGE for drivers that need it
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alper Nebi Yasak (4):
>>>>>>>>>>>>>>         video: test: Split copy frame buffer check into a
>>>>>>>>>>>>>> function
>>>>>>>>>>>>>>         video: test: Support checking copy frame buffer
>>>>>>>>>>>>>> contents
>>>>>>>>>>>>>>         video: test: Test partial updates of hardware frame
>>>>>>>>>>>>>> buffer
>>>>>>>>>>>>>>         video: test: Test video damage tracking via 
>>>>>>>>>>>>>> vidconsole
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>        arch/arm/mach-omap2/omap3/Kconfig |   1 +
>>>>>>>>>>>>>>        arch/arm/mach-sunxi/Kconfig |   1 +
>>>>>>>>>>>>>>        drivers/video/Kconfig |  26 +++
>>>>>>>>>>>>>>        drivers/video/console_normal.c |  27 ++--
>>>>>>>>>>>>>>        drivers/video/console_rotate.c |  94 +++++++----
>>>>>>>>>>>>>>        drivers/video/console_truetype.c |  37 +++--
>>>>>>>>>>>>>>        drivers/video/exynos/Kconfig |   1 +
>>>>>>>>>>>>>>        drivers/video/imx/Kconfig |   1 +
>>>>>>>>>>>>>>        drivers/video/meson/Kconfig |   1 +
>>>>>>>>>>>>>>        drivers/video/rockchip/Kconfig |   1 +
>>>>>>>>>>>>>>        drivers/video/stm32/Kconfig |   1 +
>>>>>>>>>>>>>>        drivers/video/tegra20/Kconfig |   1 +
>>>>>>>>>>>>>>        drivers/video/tidss/Kconfig |   1 +
>>>>>>>>>>>>>>        drivers/video/vidconsole-uclass.c |  16 --
>>>>>>>>>>>>>>        drivers/video/video-uclass.c | 190
>>>>>>>>>>>>>> ++++++++++++----------
>>>>>>>>>>>>>>        drivers/video/video_bmp.c |   7 +-
>>>>>>>>>>>>>>        include/video.h |  59 +++----
>>>>>>>>>>>>>>        include/video_console.h |  52 ------
>>>>>>>>>>>>>>        lib/efi_loader/efi_gop.c |   7 +
>>>>>>>>>>>>>>        test/dm/video.c | 256
>>>>>>>>>>>>>> ++++++++++++++++++++++++------
>>>>>>>>>>>>>>        20 files changed, 483 insertions(+), 297 deletions(-)
>>>>>>>>>>>>> It is good to see this tidied up into something that can be
>>>>>>>>>>>>> applied!
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am unsure what is going on with the EFI performance,
>>>>>>>>>>>>> though. It
>>>>>>>>>>>>> should not flush the cache after every character, only after
>>>>>>>>>>>>> a new
>>>>>>>>>>>>> line. Is there something wrong in here? If so, we should fix
>>>>>>>>>>>>> that bug
>>>>>>>>>>>>> first and it should be patch 1 of this series.
>>>>>>>>>>>> Before I came up with this series, I was trying to identify
>>>>>>>>>>>> the UEFI bug
>>>>>>>>>>>> in question as well, because intuition told me surely this is
>>>>>>>>>>>> a bug in
>>>>>>>>>>>> UEFI :). Turns out it really isn't this time around.
>>>>>>>>>>> I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI
>>>>>>>>>>> implementation. Where did you look for the bug?
>>>>>>>>>> The "real" bug is in grub. But given that it's reasonably
>>>>>>>>>> simple to work
>>>>>>>>>> around in U-Boot and even with it "fixed" in grub we would
>>>>>>>>>> still see
>>>>>>>>>> performance benefits from flushing only parts of the screen, I
>>>>>>>>>> think
>>>>>>>>>> it's worth living with the grub deficiency.
>>>>>>>>> OK thanks for digging into it. I suggest we add a param to
>>>>>>>>> vidconsole_puts() to tell it whether to sync or not, then the
>>>>>>>>> EFI code
>>>>>>>>> can indicate this and try to be a bit smarter about it.
>>>>>>>> It doesn't know when to sync either. From its point of view, any
>>>>>>>> "console output" could be the last one. There is no API in UEFI 
>>>>>>>> that
>>>>>>>> says "please flush console output now".
>>>>>>> Yes, I understand. I was not suggesting we were missing an API. But
>>>>>>> some sort of heuristic would do, e.g. only flush on a newline, 
>>>>>>> flush
>>>>>>> every 50 chars, etc.
>>>>>> I can't think of any heuristic that would reliably work. Relevant 
>>>>>> for
>>>>>> this conversation, UEFI provides 2 calls:
>>>>>>
>>>>>>      * Write string to screen (efi_cout_output_string)
>>>>>>      * Set text cursor position to X, Y (efi_cout_set_cursor_position)
>>>>>>
>>>>>> It's perfectly legal for a UEFI application to do something like
>>>>>>
>>>>>> efi_cout_set_cursor_position(10, 10);
>>>>>> efi_cout_output_string("f");
>>>>>> efi_cout_output_string("o");
>>>>>> efi_cout_output_string("o");
>>>>>>
>>>>>> to update contents of a virtual text box on the screen. Where in 
>>>>>> this
>>>>>> chain of events would we call video_sync(), other than on every call to
>>>>>> efi_cout_output_string()?
>>>>> Actually U-Boot has the same problem, but we have managed to work
>>>>> out something.
>>>>
>>>> U-Boot as a code base has a much easier stance: It can add APIs 
>>>> when it
>>>> needs them in places that require them. With UEFI (as well as the 
>>>> U-Boot
>>>> native API), we're stuck with what's there.
>>>>
>>>> I also don't understand what you mean by "we have managed to work out
>>>> something". This patch set is not a UEFI fix - it fixes generic U-Boot
>>>> behavior and speeds up non-UEFI boots as well. The improvement 
>>>> there is
>>>> just not as impressive as with grub :).
>>> We are still not quite on the same page...
>>>
>>> U-Boot does have video_sync() but it doesn't know when to call it. If
>>> it does not call it, then any amount of single-threaded code can run
>>> after that, which may update the framebuffer. In other words, U-Boot
>>> is in exactly the same boat as UEFI. It has to decide whether to call
>>> video_sync() based on some sort of heuristic.
>>>
>>> That is the only point I am trying to make here. Does that make sense?
>>
>>
>> Oh, I thought you mentioned above that U-Boot is in a better spot or
>> "has it solved already". I agree - it's in the same boat and the only
>> safe thing it can really do today that is fully cross-platform
>> compatible is to call video_sync() after every character.
>>
>> I don't understand what you mean by "any amount of single-threaded code
>> can run after that, which may update the framebuffer". Any framebuffer
>> modification is U-Boot internal code which then again can apply
>> video_sync() to tell the system "I want what I wrote to screen actually
>> be on screen now". I don't think that's necessarily bad design. A bit
>> clunky, but we're in a pre-boot environment after all.
>>
>> Since we're aligned now: What exactly did you refer to with "but we have
>> managed to work out something"?
>
> Should we set PixelBltOnly to indicate to UEFI applications that they
> are not allowed to directly write to the framebuffer but always have to
> use BitBlt? GRUB seems to be using a shadow buffer by default which it
> copies via BitBlt.


If we do that, OSs will no longer be able to carry the frame buffer 
address over and continue to use it to draw on the screen natively 
(like Linux's efifb).

So no, I don't think we should indicate PixelBltOnly. The frame buffer 
is usually available to applications, you just need to adhere to the 
architecture's caching constraints.
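
As a rough illustration of what "adhering to the caching constraints" can
mean for an application that writes to the frame buffer directly: after
drawing, it has to clean the cache lines covering the bytes it touched, and
cache maintenance works on whole lines. The sketch below only computes the
aligned range. The 64-byte line size and the names flush_range and
flush_range_align() are assumptions made for this example, not anything
from U-Boot or the UEFI spec.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache line size for illustration; real code must query the CPU. */
#define CACHE_LINE 64u

/* Hypothetical helper type: a [start, end) byte range to clean. */
struct flush_range {
	uintptr_t start;
	uintptr_t end;
};

/*
 * Expand [start, start + len) outwards to cache line boundaries, since
 * cache maintenance operates on whole lines.
 */
static struct flush_range flush_range_align(uintptr_t start, uintptr_t len)
{
	struct flush_range r;

	/* The mask arithmetic below only works for power-of-two line sizes. */
	assert((CACHE_LINE & (CACHE_LINE - 1)) == 0);

	r.start = start & ~(uintptr_t)(CACHE_LINE - 1);
	r.end = (start + len + CACHE_LINE - 1) & ~(uintptr_t)(CACHE_LINE - 1);
	return r;
}
```

The resulting range is what would be handed to the platform's cache clean
routine (flush_dcache_range() in the call chain quoted in this thread).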


Alex



* Re: [PATCH v5 00/13] Add video damage tracking
  2023-08-28 22:08                         ` Simon Glass
@ 2023-08-29  6:27                           ` Alexander Graf
  2023-08-30 18:27                             ` Alper Nebi Yasak
  0 siblings, 1 reply; 56+ messages in thread
From: Alexander Graf @ 2023-08-29  6:27 UTC (permalink / raw)
  To: Simon Glass
  Cc: Alper Nebi Yasak, u-boot, Kever Yang, Jagan Teki, Andre Przywara,
	Svyatoslav Ryhel, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

Hi Simon,

On 29.08.23 00:08, Simon Glass wrote:
> Hi Alex,
>
> On Mon, 28 Aug 2023 at 14:24, Alexander Graf <agraf@csgraf.de> wrote:
>>
>> On 28.08.23 19:54, Simon Glass wrote:
>>> Hi Alex,
>>>
>>> On Wed, 23 Aug 2023 at 02:56, Alexander Graf <agraf@csgraf.de> wrote:
>>>> Hey Simon,
>>>>
>>>> On 22.08.23 20:56, Simon Glass wrote:
>>>>> Hi Alex,
>>>>>
>>>>> On Tue, 22 Aug 2023 at 01:47, Alexander Graf <agraf@csgraf.de> wrote:
>>>>>> On 22.08.23 01:03, Simon Glass wrote:
>>>>>>> Hi Alex,
>>>>>>>
>>>>>>> On Mon, 21 Aug 2023 at 16:40, Alexander Graf <agraf@csgraf.de> wrote:
>>>>>>>> On 22.08.23 00:10, Simon Glass wrote:
>>>>>>>>> Hi Alex,
>>>>>>>>>
>>>>>>>>> On Mon, 21 Aug 2023 at 14:20, Alexander Graf <agraf@csgraf.de> wrote:
>>>>>>>>>> On 21.08.23 21:57, Simon Glass wrote:
>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf <agraf@csgraf.de> wrote:
>>>>>>>>>>>> On 21.08.23 21:11, Simon Glass wrote:
>>>>>>>>>>>>> Hi Alper,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>>>>>>>>>>>>>> This is a rebase of Alexander Graf's video damage tracking series, with
>>>>>>>>>>>>>> some tests and other changes. The original cover letter is as follows:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This patch set speeds up graphics output on ARM by a factor of 60x.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
>>>>>>>>>>>>>>> but need it accessible by the display controller which reads directly
>>>>>>>>>>>>>>> from a later point of consistency. Hence, we flush the frame buffer to
>>>>>>>>>>>>>>> DRAM on every change. The full frame buffer.
>>>>>>>>>>>>> It should not, see below.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
>>>>>>>>>>>>>>> that can take a while to flush out. This was reported by Da Xue with grub,
>>>>>>>>>>>>>>> which happily prints 1000s of spaces on the screen to draw a menu. Every
>>>>>>>>>>>>>>> printed space triggers a cache flush.
>>>>>>>>>>>>> That is a bug somewhere in EFI.
>>>>>>>>>>>> Unfortunately not :). You may call it a bug in grub: It literally prints
>>>>>>>>>>>> over space characters for every character in its menu that it wants
>>>>>>>>>>>> cleared. On every text screen draw.
>>>>>>>>>>>>
>>>>>>>>>>>> This wouldn't be a big issue if we only flush the rectangle that gets
>>>>>>>>>>>> modified. But without this patch set, we're flushing the full DRAM
>>>>>>>>>>>> buffer on every u-boot text console character write, which means for
>>>>>>>>>>>> every character (as that's the only API UEFI has).
>>>>>>>>>>>>
>>>>>>>>>>>> As a nice side effect, we speed up the normal U-Boot text console as
>>>>>>>>>>>> well with this patch set, because even "normal" text prints that write
>>>>>>>>>>>> for example a single line of text on the screen today flush the full
>>>>>>>>>>>> frame buffer to DRAM.
>>>>>>>>>>> No, I mean that it is a bug that U-Boot (apparently) flushes the cache
>>>>>>>>>>> after every character. It doesn't do that for normal character output
>>>>>>>>>>> and I don't think it makes sense to do it for EFI either.
>>>>>>>>>> I see. Let's trace the calls:
>>>>>>>>>>
>>>>>>>>>> efi_cout_output_string()
>>>>>>>>>> -> fputs()
>>>>>>>>>> -> vidconsole_puts()
>>>>>>>>>> -> video_sync()
>>>>>>>>>> -> flush_dcache_range()
>>>>>>>>>>
>>>>>>>>>> Unfortunately grub abstracts character backends down to the "print
>>>>>>>>>> character" level, so it calls UEFI's sophisticated "output_string"
>>>>>>>>>> callback with single characters at a time, which means we do a full
>>>>>>>>>> dcache flush for every character that we print:
>>>>>>>>>>
>>>>>>>>>> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>> This patch set implements the easiest mitigation against this problem:
>>>>>>>>>>>>>>> Damage tracking. We remember the lowest common denominator region that was
>>>>>>>>>>>>>>> touched since the last video_sync() call and only flush that. The most
>>>>>>>>>>>>>>> typical writer to the frame buffer is the video console, which always
>>>>>>>>>>>>>>> writes rectangles of characters on the screen and syncs afterwards.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> With this patch set applied, we reduce drawing a large grub menu (with
>>>>>>>>>>>>>>> serial console attached for size information) on an RK3399-ROC system
>>>>>>>>>>>>>>> at 1440p from 55 seconds to less than 1 second.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Version 2 also implements VIDEO_COPY using this mechanism, reducing its
>>>>>>>>>>>>>>> overhead compared to before as well. So even x86 systems should be faster
>>>>>>>>>>>>>>> with this now :).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Alternatives considered:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>          1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
>>>>>>>>>>>>>>>             so often. We are missing timers to do this generically.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>          2) Double buffering - We could try to identify whether anything changed
>>>>>>>>>>>>>>>             at all and only draw to the FB if it did. That would require
>>>>>>>>>>>>>>>             maintaining a second buffer that we need to scan.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>          3) Text buffer - Maintain a buffer of all text printed on the screen with
>>>>>>>>>>>>>>>             respective location. Don't write if the old and new character are
>>>>>>>>>>>>>>>             identical. This would limit applicability to text only and is an
>>>>>>>>>>>>>>>             optimization on top of this patch set.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>          4) Hash screen lines - Create a hash (sha256?) over every line when it
>>>>>>>>>>>>>>>             changes. Only flush when it does. I'm not sure if this would waste
>>>>>>>>>>>>>>>             more time, memory and cache than the current approach. It would make
>>>>>>>>>>>>>>>             full screen updates much more expensive.
>>>>>>>>>>>>> 5) Fix the bug mentioned above?
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Changes in v5:
>>>>>>>>>>>>>> - Add patch "video: test: Split copy frame buffer check into a function"
>>>>>>>>>>>>>> - Add patch "video: test: Support checking copy frame buffer contents"
>>>>>>>>>>>>>> - Add patch "video: test: Test partial updates of hardware frame buffer"
>>>>>>>>>>>>>> - Use xstart, ystart, xend, yend as names for damage region
>>>>>>>>>>>>>> - Document damage struct and fields in struct video_priv comment
>>>>>>>>>>>>>> - Return void from video_damage()
>>>>>>>>>>>>>> - Fix undeclared priv error in video_sync()
>>>>>>>>>>>>>> - Drop unused headers from video-uclass.c
>>>>>>>>>>>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>>>>>>>>>>>>>> - Call video_damage() also in video_fill_part()
>>>>>>>>>>>>>> - Use met->baseline instead of priv->baseline
>>>>>>>>>>>>>> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
>>>>>>>>>>>>>> - Update console_rotate.c video_damage() calls to pass video tests
>>>>>>>>>>>>>> - Remove mention about not having minimal damage for console_rotate.c
>>>>>>>>>>>>>> - Add patch "video: test: Test video damage tracking via vidconsole"
>>>>>>>>>>>>>> - Document new vdev field in struct efi_gop_obj comment
>>>>>>>>>>>>>> - Remove video_sync_copy() also from video_fill(), video_fill_part()
>>>>>>>>>>>>>> - Fix memmove() calls by removing the extra dev argument
>>>>>>>>>>>>>> - Call video_sync() before checking copy_fb in video tests
>>>>>>>>>>>>>> - Imply VIDEO_DAMAGE for video drivers instead of selecting it
>>>>>>>>>>>>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> v4: https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Changes in v4:
>>>>>>>>>>>>>> - Move damage clear to patch "dm: video: Add damage tracking API"
>>>>>>>>>>>>>> - Simplify first damage logic
>>>>>>>>>>>>>> - Remove VIDEO_DAMAGE default for ARM
>>>>>>>>>>>>>> - Skip damage on EfiBltVideoToBltBuffer
>>>>>>>>>>>>>> - Add patch "video: Always compile cache flushing code"
>>>>>>>>>>>>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> v3: https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Changes in v3:
>>>>>>>>>>>>>> - Adapt to always assume DM is used
>>>>>>>>>>>>>> - Make VIDEO_COPY always select VIDEO_DAMAGE
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> v2: https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Changes in v2:
>>>>>>>>>>>>>> - Remove ifdefs
>>>>>>>>>>>>>> - Fix ranges in truetype target
>>>>>>>>>>>>>> - Limit rotate to necessary damage
>>>>>>>>>>>>>> - Remove ifdefs from gop
>>>>>>>>>>>>>> - Fix dcache range; we were flushing too much before
>>>>>>>>>>>>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> v1: https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alexander Graf (9):
>>>>>>>>>>>>>>          dm: video: Add damage tracking API
>>>>>>>>>>>>>>          dm: video: Add damage notification on display fills
>>>>>>>>>>>>>>          vidconsole: Add damage notifications to all vidconsole drivers
>>>>>>>>>>>>>>          video: Add damage notification on bmp display
>>>>>>>>>>>>>>          efi_loader: GOP: Add damage notification on BLT
>>>>>>>>>>>>>>          video: Only dcache flush damaged lines
>>>>>>>>>>>>>>          video: Use VIDEO_DAMAGE for VIDEO_COPY
>>>>>>>>>>>>>>          video: Always compile cache flushing code
>>>>>>>>>>>>>>          video: Enable VIDEO_DAMAGE for drivers that need it
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alper Nebi Yasak (4):
>>>>>>>>>>>>>>          video: test: Split copy frame buffer check into a function
>>>>>>>>>>>>>>          video: test: Support checking copy frame buffer contents
>>>>>>>>>>>>>>          video: test: Test partial updates of hardware frame buffer
>>>>>>>>>>>>>>          video: test: Test video damage tracking via vidconsole
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         arch/arm/mach-omap2/omap3/Kconfig |   1 +
>>>>>>>>>>>>>>         arch/arm/mach-sunxi/Kconfig       |   1 +
>>>>>>>>>>>>>>         drivers/video/Kconfig             |  26 +++
>>>>>>>>>>>>>>         drivers/video/console_normal.c    |  27 ++--
>>>>>>>>>>>>>>         drivers/video/console_rotate.c    |  94 +++++++----
>>>>>>>>>>>>>>         drivers/video/console_truetype.c  |  37 +++--
>>>>>>>>>>>>>>         drivers/video/exynos/Kconfig      |   1 +
>>>>>>>>>>>>>>         drivers/video/imx/Kconfig         |   1 +
>>>>>>>>>>>>>>         drivers/video/meson/Kconfig       |   1 +
>>>>>>>>>>>>>>         drivers/video/rockchip/Kconfig    |   1 +
>>>>>>>>>>>>>>         drivers/video/stm32/Kconfig       |   1 +
>>>>>>>>>>>>>>         drivers/video/tegra20/Kconfig     |   1 +
>>>>>>>>>>>>>>         drivers/video/tidss/Kconfig       |   1 +
>>>>>>>>>>>>>>         drivers/video/vidconsole-uclass.c |  16 --
>>>>>>>>>>>>>>         drivers/video/video-uclass.c      | 190 ++++++++++++----------
>>>>>>>>>>>>>>         drivers/video/video_bmp.c         |   7 +-
>>>>>>>>>>>>>>         include/video.h                   |  59 +++----
>>>>>>>>>>>>>>         include/video_console.h           |  52 ------
>>>>>>>>>>>>>>         lib/efi_loader/efi_gop.c          |   7 +
>>>>>>>>>>>>>>         test/dm/video.c                   | 256 ++++++++++++++++++++++++------
>>>>>>>>>>>>>>         20 files changed, 483 insertions(+), 297 deletions(-)
>>>>>>>>>>>>> It is good to see this tidied up into something that can be applied!
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am unsure what is going on with the EFI performance, though. It
>>>>>>>>>>>>> should not flush the cache after every character, only after a new
>>>>>>>>>>>>> line. Is there something wrong in here? If so, we should fix that bug
>>>>>>>>>>>>> first and it should be patch 1 of this series.
>>>>>>>>>>>> Before I came up with this series, I was trying to identify the UEFI bug
>>>>>>>>>>>> in question as well, because intuition told me surely this is a bug in
>>>>>>>>>>>> UEFI :). Turns out it really isn't this time around.
>>>>>>>>>>> I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI
>>>>>>>>>>> implementation. Where did you look for the bug?
>>>>>>>>>> The "real" bug is in grub. But given that it's reasonably simple to work
>>>>>>>>>> around in U-Boot and even with it "fixed" in grub we would still see
>>>>>>>>>> performance benefits from flushing only parts of the screen, I think
>>>>>>>>>> it's worth living with the grub deficiency.
>>>>>>>>> OK thanks for digging into it. I suggest we add a param to
>>>>>>>>> vidconsole_puts() to tell it whether to sync or not, then the EFI code
>>>>>>>>> can indicate this and try to be a bit smarter about it.
>>>>>>>> It doesn't know when to sync either. From its point of view, any
>>>>>>>> "console output" could be the last one. There is no API in UEFI that
>>>>>>>> says "please flush console output now".
>>>>>>> Yes, I understand. I was not suggesting we were missing an API. But
>>>>>>> some sort of heuristic would do, e.g. only flush on a newline, flush
>>>>>>> every 50 chars, etc.
>>>>>> I can't think of any heuristic that would reliably work. Relevant for
>>>>>> this conversation, UEFI provides 2 calls:
>>>>>>
>>>>>>       * Write string to screen (efi_cout_output_string)
>>>>>>       * Set text cursor position to X, Y (efi_cout_set_cursor_position)
>>>>>>
>>>>>> It's perfectly legal for a UEFI application to do something like
>>>>>>
>>>>>> efi_cout_set_cursor_position(10, 10);
>>>>>> efi_cout_output_string("f");
>>>>>> efi_cout_output_string("o");
>>>>>> efi_cout_output_string("o");
>>>>>>
>>>>>> to update contents of a virtual text box on the screen. Where in this
>>>>>> chain of events would we call video_sync(), other than on every call to
>>>>>> efi_cout_output_string()?
>>>>> Actually U-Boot has the same problem, but we have managed to work out something.
>>>> U-Boot as a code base has a much easier stance: It can add APIs when it
>>>> needs them in places that require them. With UEFI (as well as the U-Boot
>>>> native API), we're stuck with what's there.
>>>>
>>>> I also don't understand what you mean by "we have managed to work out
>>>> something". This patch set is not a UEFI fix - it fixes generic U-Boot
>>>> behavior and speeds up non-UEFI boots as well. The improvement there is
>>>> just not as impressive as with grub :).
>>> We are still not quite on the same page...
>>>
>>> U-Boot does have video_sync() but it doesn't know when to call it. If
>>> it does not call it, then any amount of single-threaded code can run
>>> after that, which may update the framebuffer. In other words, U-Boot
>>> is in exactly the same boat as UEFI. It has to decide whether to call
>>> video_sync() based on some sort of heuristic.
>>>
>>> That is the only point I am trying to make here. Does that make sense?
>>
>> Oh, I thought you mentioned above that U-Boot is in a better spot or
>> "has it solved already". I agree - it's in the same boat and the only
>> safe thing it can really do today that is fully cross-platform
>> compatible is to call video_sync() after every character.
>>
>> I don't understand what you mean by "any amount of single-threaded code
>> can run after that, which may update the framebuffer". Any framebuffer
>> modification is U-Boot internal code which then again can apply
>> video_sync() to tell the system "I want what I wrote to screen actually
>> be on screen now". I don't think that's necessarily bad design. A bit
>> clunky, but we're in a pre-boot environment after all.
>>
>> Since we're aligned now: What exactly did you refer to with "but we have
>> managed to work out something"?
> So now we are on the same page about that point. The next step is my
> heuristic point:
>
> vidconsole_putc_xy() - does not call video_sync()
> vidconsole_newline() - does
>
> I am simply suggesting that UEFI should do the same thing.


I think the better analogy is with

vidconsole_puts() - does

and that's exactly the function that the UEFI code calls. The UEFI 
interface is "write this long string to screen". All the UEFI code does 
is call vidconsole_puts() to do that which then (rightfully) calls 
video_sync().

The reason we flush after every character with grub is grub: Grub abuses 
the "write long string to screen" function and instead only writes a 
single character on each call, which then means we flush on every 
character write.
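
To make the cost difference concrete, here is a minimal sketch of the
damage tracking idea from the cover letter: every draw grows one bounding
rectangle, and the sync flushes only the lines that rectangle covers. The
field names xstart, ystart, xend, yend follow the v5 changelog; everything
else (struct damage_rect, damage_add(), damage_sync_bytes()) is
hypothetical and is not the code from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the damage state, not U-Boot's video_priv. */
struct damage_rect {
	int xstart, ystart;	/* inclusive top-left corner */
	int xend, yend;		/* exclusive bottom-right corner */
	bool dirty;		/* false until something was drawn */
};

/* Grow the tracked rectangle so it also covers [x, x+w) x [y, y+h). */
static void damage_add(struct damage_rect *d, int x, int y, int w, int h)
{
	assert(w > 0 && h > 0);

	if (!d->dirty) {
		d->xstart = x;
		d->ystart = y;
		d->xend = x + w;
		d->yend = y + h;
		d->dirty = true;
		return;
	}
	if (x < d->xstart)
		d->xstart = x;
	if (y < d->ystart)
		d->ystart = y;
	if (x + w > d->xend)
		d->xend = x + w;
	if (y + h > d->yend)
		d->yend = y + h;
}

/*
 * Return how many bytes a sync would flush when it covers whole damaged
 * lines, then clear the damage. line_length is the frame buffer pitch in
 * bytes.
 */
static size_t damage_sync_bytes(struct damage_rect *d, size_t line_length)
{
	size_t bytes;

	if (!d->dirty)
		return 0;
	bytes = (size_t)(d->yend - d->ystart) * line_length;
	d->dirty = false;
	return bytes;
}
```

Assuming a 2560x1440 frame buffer at 32 bpp and an 8x16 glyph (both
assumptions for the sake of arithmetic), one character damages 16 lines,
i.e. 16 * 10240 = 163840 bytes, versus 14745600 bytes for the full frame
buffer. So even when grub forces one sync per character, each sync touches
roughly 1/90 of the memory.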


>
>>
>>>>> I do think it is still to flush the cache on every char. I suspect you
>>>>> will find that even a simple heuristic like I mentioned would be good
>>>>> enough.
>>>>>
>>>>> Also I notice that EFI calls notify? all the time, so U-Boot probably
>>>>> does have the ability to sync the video every 10ms if we wanted to.
>>>> I fail to see how invalidating the frame buffer for the screen every
>>>> 10ms is any better than doing flush+invalidate operations only on screen
>>>> areas that changed? It's more fragile, more difficult to understand and
>>>> also significantly more expensive given that most of the time very
>>>> little on the screen actually changes.
>>> I am not suggesting it is better, at all. I am just trying to explain
>>> that the U-Boot EFI implementation should not be calling video_sync()
>>> after every character, before or after this series.
>>
>> Ah, it doesn't :). It just calls the normal U-Boot "write character on
>> screen" function. What that does is up to U-Boot internals - and those
>> basically today dictate that something needs to call video_sync() to
>> make FB modifications actually pop up on screen.
> Hmmm, so what function is it calling, then?  I think we are getting
> closer to the 'fix' I am trying to tease out.


It literally calls vidconsole_puts():

https://github.com/u-boot/u-boot/blob/master/lib/efi_loader/efi_console.c#L185


Alex




* Re: [PATCH v5 00/13] Add video damage tracking
  2023-08-29  6:20                           ` Alexander Graf
@ 2023-08-29  9:19                             ` Mark Kettenis
  2023-08-30 19:55                               ` Alexander Graf
  0 siblings, 1 reply; 56+ messages in thread
From: Mark Kettenis @ 2023-08-29  9:19 UTC (permalink / raw)
  To: Alexander Graf
  Cc: xypron.glpk, sjg, alpernebiyasak, u-boot, kever.yang, jagan,
	andre.przywara, clamor95, philipp.tomsich, afd, da,
	patrice.chotard, patrick.delaunay, woods.technical, agust,
	uboot-stm32, mbrugger, u-boot-amlogic, ilias.apalodimas,
	neil.armstrong

> Date: Tue, 29 Aug 2023 08:20:49 +0200
> From: Alexander Graf <agraf@csgraf.de>
> 
> On 28.08.23 23:54, Heinrich Schuchardt wrote:
> > On 8/28/23 22:24, Alexander Graf wrote:
> >>
> >> On 28.08.23 19:54, Simon Glass wrote:
> >>> Hi Alex,
> >>>
> >>> On Wed, 23 Aug 2023 at 02:56, Alexander Graf <agraf@csgraf.de> wrote:
> >>>> Hey Simon,
> >>>>
> >>>> On 22.08.23 20:56, Simon Glass wrote:
> >>>>> Hi Alex,
> >>>>>
> >>>>> On Tue, 22 Aug 2023 at 01:47, Alexander Graf <agraf@csgraf.de> wrote:
> >>>>>> On 22.08.23 01:03, Simon Glass wrote:
> >>>>>>> Hi Alex,
> >>>>>>>
> >>>>>>> On Mon, 21 Aug 2023 at 16:40, Alexander Graf <agraf@csgraf.de> 
> >>>>>>> wrote:
> >>>>>>>> On 22.08.23 00:10, Simon Glass wrote:
> >>>>>>>>> Hi Alex,
> >>>>>>>>>
> >>>>>>>>> On Mon, 21 Aug 2023 at 14:20, Alexander Graf <agraf@csgraf.de>
> >>>>>>>>> wrote:
> >>>>>>>>>> On 21.08.23 21:57, Simon Glass wrote:
> >>>>>>>>>>> Hi Alex,
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf <agraf@csgraf.de>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>> On 21.08.23 21:11, Simon Glass wrote:
> >>>>>>>>>>>>> Hi Alper,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak
> >>>>>>>>>>>>> <alpernebiyasak@gmail.com> wrote:
> >>>>>>>>>>>>>> This is a rebase of Alexander Graf's video damage tracking
> >>>>>>>>>>>>>> series, with
> >>>>>>>>>>>>>> some tests and other changes. The original cover letter is
> >>>>>>>>>>>>>> as follows:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This patch set speeds up graphics output on ARM by a
> >>>>>>>>>>>>>>> factor of 60x.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map
> >>>>>>>>>>>>>>> it as cached,
> >>>>>>>>>>>>>>> but need it accessible by the display controller which
> >>>>>>>>>>>>>>> reads directly
> >>>>>>>>>>>>>>> from a later point of consistency. Hence, we flush the
> >>>>>>>>>>>>>>> frame buffer to
> >>>>>>>>>>>>>>> DRAM on every change. The full frame buffer.
> >>>>>>>>>>>>> It should not, see below.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Unfortunately, with the advent of 4k displays, we are
> >>>>>>>>>>>>>>> seeing frame buffers
> >>>>>>>>>>>>>>> that can take a while to flush out. This was reported by
> >>>>>>>>>>>>>>> Da Xue with grub,
> >>>>>>>>>>>>>>> which happily prints 1000s of spaces on the screen to draw
> >>>>>>>>>>>>>>> a menu. Every
> >>>>>>>>>>>>>>> printed space triggers a cache flush.
> >>>>>>>>>>>>> That is a bug somewhere in EFI.
> >>>>>>>>>>>> Unfortunately not :). You may call it a bug in grub: It
> >>>>>>>>>>>> literally prints
> >>>>>>>>>>>> over space characters for every character in its menu that it
> >>>>>>>>>>>> wants
> >>>>>>>>>>>> cleared. On every text screen draw.
> >>>>>>>>>>>>
> >>>>>>>>>>>> This wouldn't be a big issue if we only flush the rectangle
> >>>>>>>>>>>> that gets
> >>>>>>>>>>>> modified. But without this patch set, we're flushing the full
> >>>>>>>>>>>> DRAM
> >>>>>>>>>>>> buffer on every u-boot text console character write, which
> >>>>>>>>>>>> means for
> >>>>>>>>>>>> every character (as that's the only API UEFI has).
> >>>>>>>>>>>>
> >>>>>>>>>>>> As a nice side effect, we speed up the normal U-Boot text
> >>>>>>>>>>>> console as
> >>>>>>>>>>>> well with this patch set, because even "normal" text prints
> >>>>>>>>>>>> that write
> >>>>>>>>>>>> for example a single line of text on the screen today flush
> >>>>>>>>>>>> the full
> >>>>>>>>>>>> frame buffer to DRAM.
> >>>>>>>>>>> No, I mean that it is a bug that U-Boot (apparently) flushes
> >>>>>>>>>>> the cache
> >>>>>>>>>>> after every character. It doesn't do that for normal character
> >>>>>>>>>>> output
> >>>>>>>>>>> and I don't think it makes sense to do it for EFI either.
> >>>>>>>>>> I see. Let's trace the calls:
> >>>>>>>>>>
> >>>>>>>>>> efi_cout_output_string()
> >>>>>>>>>> -> fputs()
> >>>>>>>>>> -> vidconsole_puts()
> >>>>>>>>>> -> video_sync()
> >>>>>>>>>> -> flush_dcache_range()
> >>>>>>>>>>
> >>>>>>>>>> Unfortunately grub abstracts character backends down to the 
> >>>>>>>>>> "print
> >>>>>>>>>> character" level, so it calls UEFI's sophisticated 
> >>>>>>>>>> "output_string"
> >>>>>>>>>> callback with single characters at a time, which means we do a
> >>>>>>>>>> full
> >>>>>>>>>> dcache flush for every character that we print:
> >>>>>>>>>>
> >>>>>>>>>> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165 
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>>>>> This patch set implements the easiest mitigation against
> >>>>>>>>>>>>>>> this problem:
> >>>>>>>>>>>>>>> Damage tracking. We remember the lowest common denominator
> >>>>>>>>>>>>>>> region that was
> >>>>>>>>>>>>>>> touched since the last video_sync() call and only flush
> >>>>>>>>>>>>>>> that. The most
> >>>>>>>>>>>>>>> typical writer to the frame buffer is the video console,
> >>>>>>>>>>>>>>> which always
> >>>>>>>>>>>>>>> writes rectangles of characters on the screen and syncs
> >>>>>>>>>>>>>>> afterwards.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> With this patch set applied, we reduce drawing a large
> >>>>>>>>>>>>>>> grub menu (with
> >>>>>>>>>>>>>>> serial console attached for size information) on an
> >>>>>>>>>>>>>>> RK3399-ROC system
> >>>>>>>>>>>>>>> at 1440p from 55 seconds to less than 1 second.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Version 2 also implements VIDEO_COPY using this mechanism,
> >>>>>>>>>>>>>>> reducing its
> >>>>>>>>>>>>>>> overhead compared to before as well. So even x86 systems
> >>>>>>>>>>>>>>> should be faster
> >>>>>>>>>>>>>>> with this now :).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Alternatives considered:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>         1) Lazy sync - Sandbox does this. It only calls
> >>>>>>>>>>>>>>> video_sync(true) ever
> >>>>>>>>>>>>>>>            so often. We are missing timers to do this
> >>>>>>>>>>>>>>> generically.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>         2) Double buffering - We could try to identify
> >>>>>>>>>>>>>>> whether anything changed
> >>>>>>>>>>>>>>>            at all and only draw to the FB if it did. That
> >>>>>>>>>>>>>>> would require
> >>>>>>>>>>>>>>>            maintaining a second buffer that we need to 
> >>>>>>>>>>>>>>> scan.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>         3) Text buffer - Maintain a buffer of all text
> >>>>>>>>>>>>>>> printed on the screen with
> >>>>>>>>>>>>>>>            respective location. Don't write if the old and
> >>>>>>>>>>>>>>> new character are
> >>>>>>>>>>>>>>>            identical. This would limit applicability to
> >>>>>>>>>>>>>>> text only and is an
> >>>>>>>>>>>>>>>            optimization on top of this patch set.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>         4) Hash screen lines - Create a hash (sha256?)
> >>>>>>>>>>>>>>> over every line when it
> >>>>>>>>>>>>>>>            changes. Only flush when it does. I'm not sure
> >>>>>>>>>>>>>>> if this would waste
> >>>>>>>>>>>>>>>            more time, memory and cache than the current
> >>>>>>>>>>>>>>> approach. It would make
> >>>>>>>>>>>>>>>            full screen updates much more expensive.
> >>>>>>>>>>>>> 5) Fix the bug mentioned above?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Changes in v5:
> >>>>>>>>>>>>>> - Add patch "video: test: Split copy frame buffer check
> >>>>>>>>>>>>>> into a function"
> >>>>>>>>>>>>>> - Add patch "video: test: Support checking copy frame
> >>>>>>>>>>>>>> buffer contents"
> >>>>>>>>>>>>>> - Add patch "video: test: Test partial updates of hardware
> >>>>>>>>>>>>>> frame buffer"
> >>>>>>>>>>>>>> - Use xstart, ystart, xend, yend as names for damage region
> >>>>>>>>>>>>>> - Document damage struct and fields in struct video_priv
> >>>>>>>>>>>>>> comment
> >>>>>>>>>>>>>> - Return void from video_damage()
> >>>>>>>>>>>>>> - Fix undeclared priv error in video_sync()
> >>>>>>>>>>>>>> - Drop unused headers from video-uclass.c
> >>>>>>>>>>>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
> >>>>>>>>>>>>>> - Call video_damage() also in video_fill_part()
> >>>>>>>>>>>>>> - Use met->baseline instead of priv->baseline
> >>>>>>>>>>>>>> - Use fontdata->height/width instead of
> >>>>>>>>>>>>>> VIDEO_FONT_HEIGHT/WIDTH
> >>>>>>>>>>>>>> - Update console_rotate.c video_damage() calls to pass
> >>>>>>>>>>>>>> video tests
> >>>>>>>>>>>>>> - Remove mention about not having minimal damage for
> >>>>>>>>>>>>>> console_rotate.c
> >>>>>>>>>>>>>> - Add patch "video: test: Test video damage tracking via
> >>>>>>>>>>>>>> vidconsole"
> >>>>>>>>>>>>>> - Document new vdev field in struct efi_gop_obj comment
> >>>>>>>>>>>>>> - Remove video_sync_copy() also from video_fill(),
> >>>>>>>>>>>>>> video_fill_part()
> >>>>>>>>>>>>>> - Fix memmove() calls by removing the extra dev argument
> >>>>>>>>>>>>>> - Call video_sync() before checking copy_fb in video tests
> >>>>>>>>>>>>>> - Imply VIDEO_DAMAGE for video drivers instead of 
> >>>>>>>>>>>>>> selecting it
> >>>>>>>>>>>>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> v4:
> >>>>>>>>>>>>>> https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/ 
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Changes in v4:
> >>>>>>>>>>>>>> - Move damage clear to patch "dm: video: Add damage
> >>>>>>>>>>>>>> tracking API"
> >>>>>>>>>>>>>> - Simplify first damage logic
> >>>>>>>>>>>>>> - Remove VIDEO_DAMAGE default for ARM
> >>>>>>>>>>>>>> - Skip damage on EfiBltVideoToBltBuffer
> >>>>>>>>>>>>>> - Add patch "video: Always compile cache flushing code"
> >>>>>>>>>>>>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that
> >>>>>>>>>>>>>> need it"
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> v3:
> >>>>>>>>>>>>>> https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/ 
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Changes in v3:
> >>>>>>>>>>>>>> - Adapt to always assume DM is used
> >>>>>>>>>>>>>> - Make VIDEO_COPY always select VIDEO_DAMAGE
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> v2:
> >>>>>>>>>>>>>> https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/ 
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Changes in v2:
> >>>>>>>>>>>>>> - Remove ifdefs
> >>>>>>>>>>>>>> - Fix ranges in truetype target
> >>>>>>>>>>>>>> - Limit rotate to necessary damage
> >>>>>>>>>>>>>> - Remove ifdefs from gop
> >>>>>>>>>>>>>> - Fix dcache range; we were flushing too much before
> >>>>>>>>>>>>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> v1:
> >>>>>>>>>>>>>> https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/ 
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Alexander Graf (9):
> >>>>>>>>>>>>>>         dm: video: Add damage tracking API
> >>>>>>>>>>>>>>         dm: video: Add damage notification on display fills
> >>>>>>>>>>>>>>         vidconsole: Add damage notifications to all
> >>>>>>>>>>>>>> vidconsole drivers
> >>>>>>>>>>>>>>         video: Add damage notification on bmp display
> >>>>>>>>>>>>>>         efi_loader: GOP: Add damage notification on BLT
> >>>>>>>>>>>>>>         video: Only dcache flush damaged lines
> >>>>>>>>>>>>>>         video: Use VIDEO_DAMAGE for VIDEO_COPY
> >>>>>>>>>>>>>>         video: Always compile cache flushing code
> >>>>>>>>>>>>>>         video: Enable VIDEO_DAMAGE for drivers that need it
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Alper Nebi Yasak (4):
> >>>>>>>>>>>>>>         video: test: Split copy frame buffer check into a
> >>>>>>>>>>>>>> function
> >>>>>>>>>>>>>>         video: test: Support checking copy frame buffer
> >>>>>>>>>>>>>> contents
> >>>>>>>>>>>>>>         video: test: Test partial updates of hardware frame
> >>>>>>>>>>>>>> buffer
> >>>>>>>>>>>>>>         video: test: Test video damage tracking via 
> >>>>>>>>>>>>>> vidconsole
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>        arch/arm/mach-omap2/omap3/Kconfig |   1 +
> >>>>>>>>>>>>>>        arch/arm/mach-sunxi/Kconfig |   1 +
> >>>>>>>>>>>>>>        drivers/video/Kconfig |  26 +++
> >>>>>>>>>>>>>>        drivers/video/console_normal.c |  27 ++--
> >>>>>>>>>>>>>>        drivers/video/console_rotate.c |  94 +++++++----
> >>>>>>>>>>>>>>        drivers/video/console_truetype.c |  37 +++--
> >>>>>>>>>>>>>>        drivers/video/exynos/Kconfig |   1 +
> >>>>>>>>>>>>>>        drivers/video/imx/Kconfig |   1 +
> >>>>>>>>>>>>>>        drivers/video/meson/Kconfig |   1 +
> >>>>>>>>>>>>>>        drivers/video/rockchip/Kconfig |   1 +
> >>>>>>>>>>>>>>        drivers/video/stm32/Kconfig |   1 +
> >>>>>>>>>>>>>>        drivers/video/tegra20/Kconfig |   1 +
> >>>>>>>>>>>>>>        drivers/video/tidss/Kconfig |   1 +
> >>>>>>>>>>>>>>        drivers/video/vidconsole-uclass.c |  16 --
> >>>>>>>>>>>>>>        drivers/video/video-uclass.c | 190
> >>>>>>>>>>>>>> ++++++++++++----------
> >>>>>>>>>>>>>>        drivers/video/video_bmp.c |   7 +-
> >>>>>>>>>>>>>>        include/video.h |  59 +++----
> >>>>>>>>>>>>>>        include/video_console.h |  52 ------
> >>>>>>>>>>>>>>        lib/efi_loader/efi_gop.c |   7 +
> >>>>>>>>>>>>>>        test/dm/video.c | 256
> >>>>>>>>>>>>>> ++++++++++++++++++++++++------
> >>>>>>>>>>>>>>        20 files changed, 483 insertions(+), 297 deletions(-)
> >>>>>>>>>>>>> It is good to see this tidied up into something that can be
> >>>>>>>>>>>>> applied!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I am unsure what is going on with the EFI performance,
> >>>>>>>>>>>>> though. It
> >>>>>>>>>>>>> should not flush the cache after every character, only after
> >>>>>>>>>>>>> a new
> >>>>>>>>>>>>> line. Is there something wrong in here? If so, we should fix
> >>>>>>>>>>>>> that bug
> >>>>>>>>>>>>> first and it should be patch 1 of this series.
> >>>>>>>>>>>> Before I came up with this series, I was trying to identify
> >>>>>>>>>>>> the UEFI bug
> >>>>>>>>>>>> in question as well, because intuition told me surely this is
> >>>>>>>>>>>> a bug in
> >>>>>>>>>>>> UEFI :). Turns out it really isn't this time around.
> >>>>>>>>>>> I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI
> >>>>>>>>>>> implementation. Where did you look for the bug?
> >>>>>>>>>> The "real" bug is in grub. But given that it's reasonably
> >>>>>>>>>> simple to work
> >>>>>>>>>> around in U-Boot and even with it "fixed" in grub we would
> >>>>>>>>>> still see
> >>>>>>>>>> performance benefits from flushing only parts of the screen, I
> >>>>>>>>>> think
> >>>>>>>>>> it's worth living with the grub deficiency.
> >>>>>>>>> OK thanks for digging into it. I suggest we add a param to
> >>>>>>>>> vidconsole_puts() to tell it whether to sync or not, then the
> >>>>>>>>> EFI code
> >>>>>>>>> can indicate this and try to be a bit smarter about it.
> >>>>>>>> It doesn't know when to sync either. From its point of view, any
> >>>>>>>> "console output" could be the last one. There is no API in UEFI 
> >>>>>>>> that
> >>>>>>>> says "please flush console output now".
> >>>>>>> Yes, I understand. I was not suggesting we were missing an API. But
> >>>>>>> some sort of heuristic would do, e.g. only flush on a newline, 
> >>>>>>> flush
> >>>>>>> every 50 chars, etc.
> >>>>>> I can't think of any heuristic that would reliably work. Relevant 
> >>>>>> for
> >>>>>> this conversation, UEFI provides 2 calls:
> >>>>>>
> >>>>>>      * Write string to screen (efi_cout_output_string)
> >>>>>>      * Set text cursor position to X, Y 
> >>>>>> (efi_cout_set_cursor_position)
> >>>>>>
> >>>>>> It's perfectly legal for a UEFI application to do something like
> >>>>>>
> >>>>>> efi_cout_set_cursor_position(10, 10);
> >>>>>> efi_cout_output_string("f");
> >>>>>> efi_cout_output_string("o");
> >>>>>> efi_cout_output_string("o");
> >>>>>>
> >>>>>> to update the contents of a virtual text box on the screen. Where in
> >>>>>> this chain of events would we call video_sync(), other than on every
> >>>>>> call to efi_cout_output_string()?
> >>>>> Actually U-Boot has the same problem, but we have managed to work
> >>>>> out something.
> >>>>
> >>>> U-Boot as a code base has a much easier stance: It can add APIs 
> >>>> when it
> >>>> needs them in places that require them. With UEFI (as well as the 
> >>>> U-Boot
> >>>> native API), we're stuck with what's there.
> >>>>
> >>>> I also don't understand what you mean by "we have managed to work out
> >>>> something". This patch set is not a UEFI fix - it fixes generic U-Boot
> >>>> behavior and speeds up non-UEFI boots as well. The improvement 
> >>>> there is
> >>>> just not as impressive as with grub :).
> >>> We are still not quite on the same page...
> >>>
> >>> U-Boot does have video_sync() but it doesn't know when to call it. If
> >>> it does not call it, then any amount of single-threaded code can run
> >>> after that, which may update the framebuffer. In other words, U-Boot
> >>> is in exactly the same boat as UEFI. It has to decide whether to call
> >>> video_sync() based on some sort of heuristic.
> >>>
> >>> That is the only point I am trying to make here. Does that make sense?
> >>
> >>
> >> Oh, I thought you mentioned above that U-Boot is in a better spot or
> >> "has it solved already". I agree - it's in the same boat and the only
> >> safe thing it can really do today that is fully cross-platform
> >> compatible is to call video_sync() after every character.
> >>
> >> I don't understand what you mean by "any amount of single-threaded code
> >> can run after that, which may update the framebuffer". Any framebuffer
> >> modification is U-Boot internal code which then again can apply
> >> video_sync() to tell the system "I want what I wrote to screen actually
> >> be on screen now". I don't think that's necessarily bad design. A bit
> >> clunky, but we're in a pre-boot environment after all.
> >>
> >> Since we're aligned now: What exactly did you refer to with "but we have
> >> managed to work out something"?
> >
> > Should we set PixelBltOnly to indicate to UEFI applications that they
> > are not allowed to directly write to the framebuffer but always have to
> > use BitBlt? GRUB seems to be using a shadow buffer by default which it
> > copies via BitBlt.
> 
> 
> If we do that, OSs will no longer be able to carry the frame buffer 
> address over and continue to use it to draw on the screen natively
> (like Linux's efifb).
> 
> So no, I don't think we should indicate PixelBltOnly. The frame buffer 
> is usually available to applications, you just need to adhere to the 
> architecture's caching constraints.

Right.  That would probably kill any reasonable way we can have an
early framebuffer console in most OSes after ExitBootServices() has
been called.

I'm late to the game but isn't the real solution to have U-Boot map
the framebuffer in a cache-coherent way?  This is what typically
happens on x86 where VRAM is mapped as "write-combining".  That is,
uncached but going through the store buffer to speed up writes.
Reading from the framebuffer will be slow in that case (which probably
is the real reason why grub uses a shadow framebuffer).  So U-Boot
still needs some cleverness to make sure it only ever writes to the
framebuffer.


* Re: [PATCH v5 00/13] Add video damage tracking
  2023-08-29  6:27                           ` Alexander Graf
@ 2023-08-30 18:27                             ` Alper Nebi Yasak
  2023-08-30 19:52                               ` Alexander Graf
  0 siblings, 1 reply; 56+ messages in thread
From: Alper Nebi Yasak @ 2023-08-30 18:27 UTC (permalink / raw)
  To: Alexander Graf, Simon Glass
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Philipp Tomsich, Andrew Davis, Da Xue, Heinrich Schuchardt,
	Patrice Chotard, Patrick Delaunay, Derald Woods,
	Anatolij Gustschin, uboot-stm32, Matthias Brugger, u-boot-amlogic,
	Ilias Apalodimas, Neil Armstrong

[-- Attachment #1: Type: text/plain, Size: 6030 bytes --]

Hi all,

On 2023-08-29 09:27 +03:00, Alexander Graf wrote:
> On 29.08.23 00:08, Simon Glass wrote:
>> On Mon, 28 Aug 2023 at 14:24, Alexander Graf <agraf@csgraf.de> wrote:
>>> On 28.08.23 19:54, Simon Glass wrote:
>>>> U-Boot does have video_sync() but it doesn't know when to call it. If
>>>> it does not call it, then any amount of single-threaded code can run
>>>> after that, which may update the framebuffer. In other words, U-Boot
>>>> is in exactly the same boat as UEFI. It has to decide whether to call
>>>> video_sync() based on some sort of heuristic.
>>>>
>>>> That is the only point I am trying to make here. Does that make sense?
>>>
>>> Oh, I thought you mentioned above that U-Boot is in a better spot or
>>> "has it solved already". I agree - it's in the same boat and the only
>>> safe thing it can really do today that is fully cross-platform
>>> compatible is to call video_sync() after every character.
>>>
>>> I don't understand what you mean by "any amount of single-threaded code
>>> can run after that, which may update the framebuffer". Any framebuffer
>>> modification is U-Boot internal code which then again can apply
>>> video_sync() to tell the system "I want what I wrote to screen actually
>>> be on screen now". I don't think that's necessarily bad design. A bit
>>> clunky, but we're in a pre-boot environment after all.
>>>
>>> Since we're aligned now: What exactly did you refer to with "but we have
>>> managed to work out something"?
>>>
>> So now we are on the same page about that point. The next step is my
>> heuristic point:
>>
>> vidconsole_putc_xy() - does not call video_sync()
>> vidconsole_newline() - does
>>
>> I am simply suggesting that UEFI should do the same thing.
> 
> 
> I think the better analogy is with
> 
> vidconsole_puts() - does
> 
> and that's exactly the function that the UEFI code calls. The UEFI 
> interface is "write this long string to screen". All the UEFI code does 
> is call vidconsole_puts() to do that which then (rightfully) calls 
> video_sync().
> 
> The reason we flush after every character with grub is grub: Grub abuses 
> the "write long string to screen" function and instead only writes a 
> single character on each call, which then means we flush on every 
> character write.

There's another reason I discovered empirically as I tried to implement
cyclic video_sync()s instead of calling it whenever. The writes will go
through eventually (as the cache is filled by other work?) even if *we*
don't explicitly flush it. That means partial data gets pushed to the
display in an uncontrolled way, resulting in bad graphical artifacts.

And I mean very noticeable things like a blocky waterfall effect when
filling a blue rectangle background in GRUB menu, or half-rendered
letters in U-Boot console (very obvious if I get it to panic-and-hang).

I think that locks it down, and there are two reasonable things we can do:

- Write and immediately flush to fb (hardware) every time
- Batch writes in fb, periodically write-and-flush to copy_fb (hardware)

Both can utilize a damage tracking feature to minimize the amount of
copy/flush done. And maybe we can implement the two simultaneously if we
skip flushing when the damaged region is zero-sized (already flushed).
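
To make the "skip flushing when nothing is damaged" idea concrete, here is
a minimal standalone sketch. It is not the code from this series: struct
damage and the damage_*() helpers are hypothetical names, and the emptiness
check (end <= start) is my assumption rather than what video-uclass.c does.
Only the xstart/ystart/xend/yend field names follow the v5 changelog.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the damage fields kept in struct video_priv */
struct damage {
	int xstart, ystart;	/* inclusive top-left corner */
	int xend, yend;		/* exclusive bottom-right corner */
};

/* A region without area means nothing was written since the last flush */
bool damage_is_empty(const struct damage *d)
{
	return d->xend <= d->xstart || d->yend <= d->ystart;
}

/* Grow the tracked region to the bounding box of itself and a new write */
void damage_add(struct damage *d, int x, int y, int w, int h)
{
	assert(d && w >= 0 && h >= 0);

	if (w == 0 || h == 0)
		return;

	if (damage_is_empty(d)) {
		d->xstart = x;
		d->ystart = y;
		d->xend = x + w;
		d->yend = y + h;
		return;
	}

	if (x < d->xstart)
		d->xstart = x;
	if (y < d->ystart)
		d->ystart = y;
	if (x + w > d->xend)
		d->xend = x + w;
	if (y + h > d->yend)
		d->yend = y + h;
}

/* Returns true if a flush was needed, and clears the region afterwards */
bool damage_flush(struct damage *d)
{
	if (damage_is_empty(d))
		return false;

	/* A real implementation would copy or cache-flush the rectangle here */
	d->xstart = d->ystart = d->xend = d->yend = 0;
	return true;
}
```

With something like this, an immediate flush and a later periodic flush can
coexist: whichever runs second sees an empty region and does no work.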

There's a flaw with the second approach though: EFI can have direct
access to the hardware copy_fb. When it has directly written to the
framebuffer, we would need to at least stop overwriting that, and
ideally copy backwards to the non-hardware fb. Is there some kind of a
locking API that EFI apps use to get/release the framebuffer? We could
hook into something like that.

Note that I've been using "flush" and not "sync" to avoid talking about
when a driver ops->video_sync() should be called. Is that fundamentally
incompatible with EFI, can we even call that after e.g. ExitBootServices
where the OS wants to use it as efifb? Maybe we should always call that
periodically at 60Hz (16666us) or so?

(I'm testing on rk3399-gru-kevin with a 2400x1600 eDP screen by the way,
and attaching some WIP patches if you want to test. Debian arm64 netinst
iso uses text-mode GRUB by default, in case you need a payload to test.)

>>>>>> Also I notice that EFI calls notify? all the time, so U-Boot probably
>>>>>> does have the ability to sync the video every 10ms if we wanted to.

BTW, with attached cyclic patch on chromebook_kevin, I immediately get a
warning that it takes too long, at 2-8ms without/with VIDEO_COPY. It's
about 11ms for both on sandbox on my PC.
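
For scale, the arithmetic behind those numbers can be sketched as below.
The helper names are made up for illustration; the only inputs taken from
this thread are the 60Hz rate and the measured 2ms, 8ms and 11ms sync times.

```c
#include <assert.h>
#include <stdint.h>

/* Period in microseconds for a given refresh rate, rounded down */
uint64_t refresh_period_us(unsigned int hz)
{
	assert(hz > 0);
	return 1000000ULL / hz;
}

/* Share of each period spent inside the sync callback, in percent */
uint64_t sync_load_percent(uint64_t sync_us, unsigned int hz)
{
	return sync_us * 100 / refresh_period_us(hz);
}
```

At 60Hz the period is 16666us, so a sync that takes 2ms to 8ms would eat
roughly 12% to 48% of every period, and the 11ms seen on sandbox about 66%.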

>>>>> I fail to see how invalidating the frame buffer for the screen every
>>>>> 10ms is any better than doing flush+invalidate operations only on screen
>>>>> areas that changed? It's more fragile, more difficult to understand and
>>>>> also significantly more expensive given that most of the time very
>>>>> little on the screen actually changes.
>>>>>
>>>> I am not suggesting it is better, at all. I am just trying to explain
>>>> that the U-Boot EFI implementation should not be calling video_sync()
>>>> after every character, before or after this series.
>>>
>>> Ah, it doesn't :). It just calls the normal U-Boot "write character on
>>> screen" function. What that does is up to U-Boot internals - and those
>>> basically today dictate that something needs to call video_sync() to
>>> make FB modifications actually pop up on screen.
>>>
>> Hmmm, so what function is it calling, then?  I think we are getting
>> closer to the 'fix' I am trying to tease out.
> 
> It literally calls vidconsole_puts():
> 
> https://github.com/u-boot/u-boot/blob/master/lib/efi_loader/efi_console.c#L185

I'd think "sync after a whole string is printed" is an OK heuristic, and
GRUB is abusing it... But maybe GRUB is doing these things as an ad-hoc
double buffering implementation with forced syncs at the expense of
performance to avoid buggy firmware causing graphical artifacts.
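
As a toy model of why that heuristic degrades with GRUB's call pattern
(struct toy_console and toy_puts() are invented for illustration and only
count calls, they are not the vidconsole API):

```c
#include <assert.h>

/* Hypothetical counters standing in for real drawing and video_sync() */
struct toy_console {
	unsigned int chars_drawn;
	unsigned int syncs;
};

/* Model of "draw the whole string, then sync once at the end" */
void toy_puts(struct toy_console *con, const char *s)
{
	assert(con && s);

	for (; *s; s++)
		con->chars_drawn++;

	con->syncs++;
}
```

Printing "foo" in one call costs one sync, while printing "f", "o", "o" in
three calls draws the same three characters but costs three syncs.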

In any case, apologies but I'll be unable to work on U-Boot things until
late September, may also be unable to respond. (Going to DebConf soon)

[-- Attachment #2: 0001-video-Replace-all-video_sync-calls-with-a-cyclic.patch --]
[-- Type: text/x-patch, Size: 7338 bytes --]

From 5144e9479c6b31cac33e98b5ae00a6d103f19462 Mon Sep 17 00:00:00 2001
From: Alper Nebi Yasak <alpernebiyasak@gmail.com>
Date: Tue, 22 Aug 2023 18:05:29 +0300
Subject: [PATCH] WIP: video: Replace all video_sync calls with a cyclic
 function

---
 boot/expo.c                       |  2 --
 cmd/video.c                       |  2 --
 drivers/serial/sandbox.c          |  2 --
 drivers/video/Kconfig             |  1 +
 drivers/video/vidconsole-uclass.c | 35 +++----------------------------
 drivers/video/video-uclass.c      | 28 ++++++++++++++++++++++++-
 drivers/video/video_bmp.c         |  2 +-
 include/video.h                   |  2 ++
 lib/efi_loader/efi_gop.c          |  2 --
 9 files changed, 34 insertions(+), 42 deletions(-)

diff --git a/boot/expo.c b/boot/expo.c
index 139d684f8e6e..03e858c03c80 100644
--- a/boot/expo.c
+++ b/boot/expo.c
@@ -208,8 +208,6 @@ int expo_render(struct expo *exp)
 			return log_msg_ret("ren", ret);
 	}
 
-	video_sync(dev, true);
-
 	return scn ? 0 : -ECHILD;
 }
 
diff --git a/cmd/video.c b/cmd/video.c
index 942f81c16336..98de2c9f1b8d 100644
--- a/cmd/video.c
+++ b/cmd/video.c
@@ -42,8 +42,6 @@ static int do_video_puts(struct cmd_tbl *cmdtp, int flag, int argc,
 	if (uclass_first_device_err(UCLASS_VIDEO_CONSOLE, &dev))
 		return CMD_RET_FAILURE;
 	ret = vidconsole_put_string(dev, argv[1]);
-	if (!ret)
-		ret = video_sync(dev->parent, false);
 
 	return ret ? CMD_RET_FAILURE : 0;
 }
diff --git a/drivers/serial/sandbox.c b/drivers/serial/sandbox.c
index f4003811ee75..586e1154a327 100644
--- a/drivers/serial/sandbox.c
+++ b/drivers/serial/sandbox.c
@@ -139,8 +139,6 @@ static int sandbox_serial_pending(struct udevice *dev, bool input)
 		return 0;
 
 	os_usleep(100);
-	if (IS_ENABLED(CONFIG_VIDEO) && !IS_ENABLED(CONFIG_SPL_BUILD))
-		video_sync_all();
 	avail = membuff_putraw(&priv->buf, 100, false, &data);
 	if (!avail)
 		return 1;	/* buffer full */
diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
index 09f2cb1a7321..530485102c78 100644
--- a/drivers/video/Kconfig
+++ b/drivers/video/Kconfig
@@ -7,6 +7,7 @@ menu "Graphics support"
 config VIDEO
 	bool "Enable driver model support for LCD/video"
 	depends on DM
+	select CYCLIC
 	help
 	  This enables driver model for LCD and video devices. These support
 	  a bitmap display of various sizes and depths which can be drawn on
diff --git a/drivers/video/vidconsole-uclass.c b/drivers/video/vidconsole-uclass.c
index b5b3b6625902..876e80f9ebe5 100644
--- a/drivers/video/vidconsole-uclass.c
+++ b/drivers/video/vidconsole-uclass.c
@@ -77,7 +77,7 @@ static int vidconsole_back(struct udevice *dev)
 		if (priv->ycur < 0)
 			priv->ycur = 0;
 	}
-	return video_sync(dev->parent, false);
+	return 0;
 }
 
 /* Move to a newline, scrolling the display if necessary */
@@ -87,7 +87,7 @@ static void vidconsole_newline(struct udevice *dev)
 	struct udevice *vid_dev = dev->parent;
 	struct video_priv *vid_priv = dev_get_uclass_priv(vid_dev);
 	const int rows = CONFIG_VAL(CONSOLE_SCROLL_LINES);
-	int i, ret;
+	int i;
 
 	priv->xcur_frac = priv->xstart_frac;
 	priv->ycur += priv->y_charsize;
@@ -101,13 +101,6 @@ static void vidconsole_newline(struct udevice *dev)
 		priv->ycur -= rows * priv->y_charsize;
 	}
 	priv->last_ch = 0;
-
-	ret = video_sync(dev->parent, false);
-	if (ret) {
-#ifdef DEBUG
-		console_puts_select_stderr(true, "[vc err: video_sync]");
-#endif
-	}
 }
 
 static char *parsenum(char *s, int *num)
@@ -298,15 +291,7 @@ static void vidconsole_escape_char(struct udevice *dev, char ch)
 		parsenum(priv->escape_buf + 1, &mode);
 
 		if (mode == 2) {
-			int ret;
-
 			video_clear(dev->parent);
-			ret = video_sync(dev->parent, false);
-			if (ret) {
-#ifdef DEBUG
-				console_puts_select_stderr(true, "[vc err: video_sync]");
-#endif
-			}
 			priv->ycur = 0;
 			priv->xcur_frac = priv->xstart_frac;
 		} else {
@@ -520,12 +505,6 @@ static void vidconsole_putc(struct stdio_dev *sdev, const char ch)
 	if (ret) {
 #ifdef DEBUG
 		console_puts_select_stderr(true, "[vc err: putc]");
-#endif
-	}
-	ret = video_sync(dev->parent, false);
-	if (ret) {
-#ifdef DEBUG
-		console_puts_select_stderr(true, "[vc err: video_sync]");
 #endif
 	}
 }
@@ -542,12 +521,6 @@ static void vidconsole_puts(struct stdio_dev *sdev, const char *s)
 
 		snprintf(str, sizeof(str), "[vc err: puts %d]", ret);
 		console_puts_select_stderr(true, str);
-#endif
-	}
-	ret = video_sync(dev->parent, false);
-	if (ret) {
-#ifdef DEBUG
-		console_puts_select_stderr(true, "[vc err: video_sync]");
 #endif
 	}
 }
@@ -685,9 +658,7 @@ UCLASS_DRIVER(vidconsole) = {
 #ifdef CONFIG_VIDEO_COPY
 int vidconsole_sync_copy(struct udevice *dev, void *from, void *to)
 {
-	struct udevice *vid = dev_get_parent(dev);
-
-	return video_sync_copy(vid, from, to);
+	return 0;
 }
 
 int vidconsole_memmove(struct udevice *dev, void *dst, const void *src,
diff --git a/drivers/video/video-uclass.c b/drivers/video/video-uclass.c
index f743ed74c818..099ba5acaa9b 100644
--- a/drivers/video/video-uclass.c
+++ b/drivers/video/video-uclass.c
@@ -6,6 +6,7 @@
 #define LOG_CATEGORY UCLASS_VIDEO
 
 #include <common.h>
+#include <cyclic.h>
 #include <bloblist.h>
 #include <console.h>
 #include <cpu_func.h>
@@ -249,7 +250,7 @@ int video_fill(struct udevice *dev, u32 colour)
 	if (ret)
 		return ret;
 
-	return video_sync(dev, false);
+	return 0;
 }
 
 int video_clear(struct udevice *dev)
@@ -351,6 +352,24 @@ void video_set_default_colors(struct udevice *dev, bool invert)
 	priv->colour_bg = video_index_to_colour(priv, back);
 }
 
+static void video_cyclic(void *ctx)
+{
+	struct udevice *dev = ctx;
+	int ret;
+
+	if (!device_active(dev))
+		return;
+
+	video_sync_copy_all(dev);
+
+	ret = video_sync(dev, true);
+	if (ret) {
+#ifdef DEBUG
+		console_puts_select_stderr(true, "[vc err: video_sync]");
+#endif
+	}
+}
+
 /* Flush video activity to the caches */
 int video_sync(struct udevice *vid, bool force)
 {
@@ -603,6 +622,13 @@ static int video_post_probe(struct udevice *dev)
 		}
 	}
 
+	/* Register video sync as a cyclic function at 60Hz */
+	priv->cyclic = cyclic_register(video_cyclic, 16666, dev->name, dev);
+	if (!priv->cyclic) {
+		log_err("cyclic_register for %s video sync failed\n", dev->name);
+		return -ENODEV;
+	}
+
 	return 0;
 };
 
diff --git a/drivers/video/video_bmp.c b/drivers/video/video_bmp.c
index 45f003c8251a..90a0e4e98b11 100644
--- a/drivers/video/video_bmp.c
+++ b/drivers/video/video_bmp.c
@@ -466,5 +466,5 @@ int video_bmp_display(struct udevice *dev, ulong bmp_image, int x, int y,
 	if (ret)
 		return log_ret(ret);
 
-	return video_sync(dev, false);
+	return 0;
 }
diff --git a/include/video.h b/include/video.h
index 66d109ca5da6..31aa8df171fa 100644
--- a/include/video.h
+++ b/include/video.h
@@ -8,6 +8,7 @@
 #define _VIDEO_H_
 
 #include <stdio_dev.h>
+#include <cyclic.h>
 
 struct udevice;
 
@@ -118,6 +119,7 @@ struct video_priv {
 	bool flush_dcache;
 	u8 fg_col_idx;
 	u8 bg_col_idx;
+	struct cyclic_info *cyclic;
 };
 
 /**
diff --git a/lib/efi_loader/efi_gop.c b/lib/efi_loader/efi_gop.c
index 778b693f983a..56ca0482d871 100644
--- a/lib/efi_loader/efi_gop.c
+++ b/lib/efi_loader/efi_gop.c
@@ -451,8 +451,6 @@ static efi_status_t EFIAPI gop_blt(struct efi_gop *this,
 	if (ret != EFI_SUCCESS)
 		return EFI_EXIT(ret);
 
-	video_sync_all();
-
 	return EFI_EXIT(EFI_SUCCESS);
 }
 
-- 
2.40.1


[-- Attachment #3: 0002-video-rockchip-Implement-copy-framebuffer-suppor.patch --]
[-- Type: text/x-patch, Size: 2107 bytes --]

From 87771cf765e6cb1096d6487e4dcc3c6972df0c8e Mon Sep 17 00:00:00 2001
From: Alper Nebi Yasak <alpernebiyasak@gmail.com>
Date: Fri, 25 Aug 2023 21:29:52 +0300
Subject: [PATCH] WIP: video: rockchip: Implement copy framebuffer support

---
 configs/chromebook_kevin_defconfig |  3 +--
 drivers/video/rockchip/rk_vop.c    | 11 ++++++++++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/configs/chromebook_kevin_defconfig b/configs/chromebook_kevin_defconfig
index 6ce69f419e17..0baa73cc62db 100644
--- a/configs/chromebook_kevin_defconfig
+++ b/configs/chromebook_kevin_defconfig
@@ -5,8 +5,6 @@ CONFIG_ARCH_ROCKCHIP=y
 CONFIG_TEXT_BASE=0x00200000
 CONFIG_SPL_GPIO=y
 CONFIG_NR_DRAM_BANKS=1
-CONFIG_HAS_CUSTOM_SYS_INIT_SP_ADDR=y
-CONFIG_CUSTOM_SYS_INIT_SP_ADDR=0x300000
 CONFIG_SF_DEFAULT_SPEED=20000000
 CONFIG_ENV_SIZE=0x8000
 CONFIG_DEFAULT_DEVICE_TREE="rk3399-gru-kevin"
@@ -129,6 +127,7 @@ CONFIG_USB_ETHER_SMSC95XX=y
 CONFIG_VIDEO=y
 # CONFIG_VIDEO_FONT_8X16 is not set
 CONFIG_VIDEO_FONT_16X32=y
+CONFIG_VIDEO_COPY=y
 CONFIG_DISPLAY=y
 CONFIG_VIDEO_ROCKCHIP=y
 CONFIG_VIDEO_ROCKCHIP_MAX_XRES=2400
diff --git a/drivers/video/rockchip/rk_vop.c b/drivers/video/rockchip/rk_vop.c
index c514e2a0e449..e70689851a61 100644
--- a/drivers/video/rockchip/rk_vop.c
+++ b/drivers/video/rockchip/rk_vop.c
@@ -464,10 +464,16 @@ int rk_vop_probe(struct udevice *dev)
 		return -EINVAL;
 	}
 
+	if (IS_ENABLED(CONFIG_VIDEO_COPY))
+		plat->copy_base = plat->base + plat->size / 2;
+
 	for (node = ofnode_first_subnode(port);
 	     ofnode_valid(node);
 	     node = dev_read_next_subnode(node)) {
-		ret = rk_display_init(dev, plat->base, node);
+		if (IS_ENABLED(CONFIG_VIDEO_COPY))
+			ret = rk_display_init(dev, plat->copy_base, node);
+		else
+			ret = rk_display_init(dev, plat->base, node);
 		if (ret)
 			debug("Device failed: ret=%d\n", ret);
 		if (!ret)
@@ -485,5 +491,8 @@ int rk_vop_bind(struct udevice *dev)
 	plat->size = 4 * (CONFIG_VIDEO_ROCKCHIP_MAX_XRES *
 			  CONFIG_VIDEO_ROCKCHIP_MAX_YRES);
 
+	if (IS_ENABLED(CONFIG_VIDEO_COPY))
+		plat->size *= 2;
+
 	return 0;
 }
-- 
2.40.1



* Re: [PATCH v5 11/13] video: Use VIDEO_DAMAGE for VIDEO_COPY
  2023-08-21 20:06     ` Alexander Graf
@ 2023-08-30 19:07       ` Alper Nebi Yasak
  2023-08-31  2:49         ` Simon Glass
  0 siblings, 1 reply; 56+ messages in thread
From: Alper Nebi Yasak @ 2023-08-30 19:07 UTC (permalink / raw)
  To: Alexander Graf, Simon Glass
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Philipp Tomsich, Andrew Davis, Da Xue, Heinrich Schuchardt,
	Patrice Chotard, Patrick Delaunay, Derald Woods,
	Anatolij Gustschin, uboot-stm32, Matthias Brugger, u-boot-amlogic,
	Ilias Apalodimas, Neil Armstrong

On 2023-08-21 23:06 +03:00, Alexander Graf wrote:
> 
> On 21.08.23 21:11, Simon Glass wrote:
>> Hi Alper,
>>
>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>>> From: Alexander Graf <agraf@csgraf.de>
>>>
>>> CONFIG_VIDEO_COPY implemented a range-based copying mechanism: If we
>>> print a single character, it will always copy the full range of bytes
>>> from the top left corner of the character to the lower right onto the
>>> uncached frame buffer. This includes pretty much the full line contents
>>> of the printed character.
>>>
>>> Since we now have proper damage tracking, let's make use of that to reduce
>>> the amount of data we need to copy. With this patch applied, we will only
>>> copy the tiny rectangle surrounding characters when we print them,
>>> speeding up the video console.
>> I suppose for rotated consoles it copies whole lines, but otherwise it
>> does a lot of small copies?
> 
> 
> I tried to keep the code as simple as possible and only track an "upper 
> left" and "lower right" corner of modifications. So sync will always 
> copy/flush a single rectangle.

Yep, see patch 06/13 for size of the regions. E.g. for putc_xy()s it's
fontdata->height * fontdata->width, for rows it's like fontdata->height
* vid_priv->xsize * count...
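
To put rough numbers on that, here is a small sketch of the byte counts
involved. The helper names are hypothetical, and the 4 bytes per pixel is
my assumption; the 16x32 font and the 2400x1600 panel are the kevin setup
mentioned elsewhere in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes touched when a single glyph cell is drawn */
size_t glyph_damage_bytes(size_t font_w, size_t font_h, size_t bytes_per_pixel)
{
	assert(bytes_per_pixel > 0);
	return font_w * font_h * bytes_per_pixel;
}

/* Bytes touched when `count` full-width text rows are changed */
size_t rows_damage_bytes(size_t xsize, size_t font_h, size_t count,
			 size_t bytes_per_pixel)
{
	assert(bytes_per_pixel > 0);
	return xsize * font_h * count * bytes_per_pixel;
}
```

Under those assumptions one character is 2048 bytes and one text row is
307200 bytes, against 15360000 bytes for all 50 rows of the screen.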

>>
>>> After this, changes to the main frame buffer are not immediately copied
>>> to the copy frame buffer, but postponed until the next video device
>>> sync. So issue an explicit sync before inspecting the copy frame buffer
>>> contents for the video tests.
>> So how does the sync get done in this case?
> 
> It gets called as part of video_sync():
> 
> +static void video_flush_copy(struct udevice *vid)
> +{
> +	struct video_priv *priv = dev_get_uclass_priv(vid);
> +
> +	if (!priv->copy_fb)
> +		return;
> +
> +	if (priv->damage.xend && priv->damage.yend) {
> +		int lstart = priv->damage.xstart * VNBYTES(priv->bpix);
> +		int lend = priv->damage.xend * VNBYTES(priv->bpix);
> +		int y;
> +
> +		for (y = priv->damage.ystart; y < priv->damage.yend; y++) {
> +			ulong offset = (y * priv->line_length) + lstart;
> +			ulong len = lend - lstart;
> +
> +			memcpy(priv->copy_fb + offset, priv->fb + offset, len);
> +		}
> +	}
> +}

I think Simon was asking how and when video_sync() is called outside the
tests. The tests use lower-level functions, i.e. ops->putc_xy() in each
console. Normally the vidconsole functions higher up the call chain may
also do a video_sync() when they think it's worth updating the display.
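
For anyone who wants to play with the copy step in isolation, the quoted
video_flush_copy() boils down to a per-scanline memcpy() of one rectangle.
Below is a standalone sketch of the same idea; copy_damaged_rect() and its
parameters are my own naming, and the bounds checks are additions that the
quoted code does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy only the damaged rectangle from src to dst, one scanline at a time */
void copy_damaged_rect(unsigned char *dst, const unsigned char *src,
		       size_t line_length, size_t bytes_per_pixel,
		       size_t xstart, size_t ystart, size_t xend, size_t yend)
{
	size_t lstart = xstart * bytes_per_pixel;
	size_t len;
	size_t y;

	assert(dst && src);

	/* Nothing to do for an empty region */
	if (xend <= xstart || yend <= ystart)
		return;

	len = (xend - xstart) * bytes_per_pixel;
	assert(lstart + len <= line_length);

	for (y = ystart; y < yend; y++) {
		size_t offset = y * line_length + lstart;

		memcpy(dst + offset, src + offset, len);
	}
}
```

Both buffers are assumed to share the same line length, as fb and copy_fb
do in the quoted function.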

>>
>>> Signed-off-by: Alexander Graf <agraf@csgraf.de>
>>> [Alper: Rebase for fontdata->height/w, fill_part(), fix memmove(dev),
>>>          drop from defconfig, use damage.xstart/yend, use IS_ENABLED(),
>>>          call video_sync() before copy_fb check, update video_copy test]
>>> Co-developed-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
>>> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
>>> ---
>>>
>>> Changes in v5:
>>> - Remove video_sync_copy() also from video_fill(), video_fill_part()
>>> - Fix memmove() calls by removing the extra dev argument
>>> - Call video_sync() before checking copy_fb in video tests
>>> - Use xstart, ystart, xend, yend as names for damage region
>>> - Use met->baseline instead of priv->baseline
>>> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>>> - Drop VIDEO_DAMAGE from sandbox defconfig added in a new patch
>>> - Update dm_test_video_copy test added in a new patch
>>>
>>> Changes in v3:
>>> - Make VIDEO_COPY always select VIDEO_DAMAGE
>>>
>>> Changes in v2:
>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
>>>
>>>   configs/sandbox_defconfig         |  1 -
>>>   drivers/video/Kconfig             |  5 ++
>>>   drivers/video/console_normal.c    | 13 +----
>>>   drivers/video/console_rotate.c    | 44 +++-----------
>>>   drivers/video/console_truetype.c  | 16 +----
>>>   drivers/video/vidconsole-uclass.c | 16 -----
>>>   drivers/video/video-uclass.c      | 97 ++++++++-----------------------
>>>   drivers/video/video_bmp.c         |  7 ---
>>>   include/video.h                   | 37 ------------
>>>   include/video_console.h           | 52 -----------------
>>>   test/dm/video.c                   |  3 +-
>>>   11 files changed, 43 insertions(+), 248 deletions(-)
>>>
>>> diff --git a/configs/sandbox_defconfig b/configs/sandbox_defconfig
>>> index 51b820f13121..259f31f26cee 100644
>>> --- a/configs/sandbox_defconfig
>>> +++ b/configs/sandbox_defconfig
>>> @@ -307,7 +307,6 @@ CONFIG_USB_ETH_CDC=y
>>>   CONFIG_VIDEO=y
>>>   CONFIG_VIDEO_FONT_SUN12X22=y
>>>   CONFIG_VIDEO_COPY=y
>>> -CONFIG_VIDEO_DAMAGE=y
>>>   CONFIG_CONSOLE_ROTATION=y
>>>   CONFIG_CONSOLE_TRUETYPE=y
>>>   CONFIG_CONSOLE_TRUETYPE_CANTORAONE=y
>>> diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
>>> index 97f494a1340b..b3fbd9d7d9ca 100644
>>> --- a/drivers/video/Kconfig
>>> +++ b/drivers/video/Kconfig
>>> @@ -83,11 +83,14 @@ config VIDEO_PCI_DEFAULT_FB_SIZE
>>>
>>>   config VIDEO_COPY
>>>          bool "Enable copying the frame buffer to a hardware copy"
>>> +       select VIDEO_DAMAGE
>>>          help
>>>            On some machines (e.g. x86), reading from the frame buffer is very
>>>            slow because it is uncached. To improve performance, this feature
>>>            allows the frame buffer to be kept in cached memory (allocated by
>>>            U-Boot) and then copied to the hardware frame-buffer as needed.
>>> +         It uses the VIDEO_DAMAGE feature to keep track of regions to copy
>>> +         and will only copy actually touched regions.
>>>
>>>            To use this, your video driver must set @copy_base in
>>>            struct video_uc_plat.
>>> @@ -105,6 +108,8 @@ config VIDEO_DAMAGE
>>>            regions of the frame buffer that were modified before, speeding up
>>>            screen refreshes significantly.
>>>
>>> +         It is also used by VIDEO_COPY to identify which regions changed.
>>> +
>>>   config BACKLIGHT_PWM
>>>          bool "Generic PWM based Backlight Driver"
>>>          depends on BACKLIGHT && DM_PWM
>>> diff --git a/drivers/video/console_normal.c b/drivers/video/console_normal.c
>>> index a19ce6a2bc11..c44aa09473a3 100644
>>> --- a/drivers/video/console_normal.c
>>> +++ b/drivers/video/console_normal.c
>>> @@ -35,10 +35,6 @@ static int console_set_row(struct udevice *dev, uint row, int clr)
>>>                  fill_pixel_and_goto_next(&dst, clr, pbytes, pbytes);
>>>          end = dst;
>>>
>>> -       ret = vidconsole_sync_copy(dev, line, end);
>>> -       if (ret)
>>> -               return ret;
>>> -
>>>          video_damage(dev->parent,
>>>                       0,
>>>                       fontdata->height * row,
>>> @@ -57,14 +53,11 @@ static int console_move_rows(struct udevice *dev, uint rowdst,
>>>          void *dst;
>>>          void *src;
>>>          int size;
>>> -       int ret;
>>>
>>>          dst = vid_priv->fb + rowdst * fontdata->height * vid_priv->line_length;
>>>          src = vid_priv->fb + rowsrc * fontdata->height * vid_priv->line_length;
>>>          size = fontdata->height * vid_priv->line_length * count;
>>> -       ret = vidconsole_memmove(dev, dst, src, size);
>>> -       if (ret)
>>> -               return ret;
>>> +       memmove(dst, src, size);
>> Why are you making that change?
> 
> There is no point in keeping a special vidconsole_memmove() around 
> anymore, since we don't actually need to call vidconsole_sync_copy() 
> after the move. The damage call that we introduced to all call sites in 
> combination with a video_sync() call takes over the job of the sync copy.

More specifically, this batches the copying work video_sync_copy() does
per console-op into video_flush_copy() called once per video_sync().
Then, since vidconsole_memmove() is only used to memmove() and invoke
that copy mechanism, we can also reduce it to just memmove().
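To illustrate the batching described above, here is a minimal standalone
sketch, not the actual U-Boot code. struct fb_model, model_damage() and
model_flush_copy() are hypothetical names; only the xstart/ystart/xend/yend
field names come from the series. Each drawing operation just grows one
damage rectangle, and a single flush per sync copies the damaged rows from
the cached buffer to the hardware copy.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the relevant parts of struct video_priv */
struct fb_model {
	unsigned char *fb;	/* cached buffer that drawing code writes to */
	unsigned char *copy_fb;	/* buffer the hardware scans out */
	int xsize, ysize;	/* size in pixels */
	int bpp;		/* bytes per pixel */
	int line_length;	/* bytes per line */
	struct {
		int xstart, ystart;	/* inclusive */
		int xend, yend;		/* exclusive; all zero means "no damage" */
	} damage;
};

/* Grow the single damage rectangle so that it also covers the given area */
static void model_damage(struct fb_model *m, int x, int y, int w, int h)
{
	int xend = x + w, yend = y + h;

	/* Clamp to the screen and ignore empty areas */
	if (x < 0)
		x = 0;
	if (y < 0)
		y = 0;
	if (xend > m->xsize)
		xend = m->xsize;
	if (yend > m->ysize)
		yend = m->ysize;
	if (xend <= x || yend <= y)
		return;

	if (m->damage.xend == 0) {
		/* First damage since the last flush */
		m->damage.xstart = x;
		m->damage.ystart = y;
		m->damage.xend = xend;
		m->damage.yend = yend;
		return;
	}

	if (x < m->damage.xstart)
		m->damage.xstart = x;
	if (y < m->damage.ystart)
		m->damage.ystart = y;
	if (xend > m->damage.xend)
		m->damage.xend = xend;
	if (yend > m->damage.yend)
		m->damage.yend = yend;
}

/* Copy only the damaged part of each damaged row; returns rows copied */
static int model_flush_copy(struct fb_model *m)
{
	int rows = 0;

	assert(m->bpp > 0);
	if (m->damage.xend > m->damage.xstart) {
		size_t off = (size_t)m->damage.xstart * m->bpp;
		size_t len = (size_t)(m->damage.xend - m->damage.xstart) * m->bpp;
		int y;

		for (y = m->damage.ystart; y < m->damage.yend; y++) {
			size_t line = (size_t)y * m->line_length + off;

			memcpy(m->copy_fb + line, m->fb + line, len);
			rows++;
		}
	}

	/* Reset so the next flush starts from an empty region */
	memset(&m->damage, 0, sizeof(m->damage));
	return rows;
}
```

In this model a plain memmove() on fb followed by model_damage() on the
destination area is enough, which is the same reason the dedicated
vidconsole_memmove() wrapper is no longer needed.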

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 10/13] video: Only dcache flush damaged lines
  2023-08-21 23:03           ` Simon Glass
@ 2023-08-30 19:12             ` Alper Nebi Yasak
  2023-08-30 19:57               ` Alexander Graf
  0 siblings, 1 reply; 56+ messages in thread
From: Alper Nebi Yasak @ 2023-08-30 19:12 UTC (permalink / raw)
  To: Simon Glass, Alexander Graf
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Philipp Tomsich, Andrew Davis, Da Xue, Heinrich Schuchardt,
	Patrice Chotard, Patrick Delaunay, Derald Woods,
	Anatolij Gustschin, uboot-stm32, Matthias Brugger, u-boot-amlogic,
	Ilias Apalodimas, Neil Armstrong

On 2023-08-22 02:03 +03:00, Simon Glass wrote:
> Hi Alex,
> 
> On Mon, 21 Aug 2023 at 16:44, Alexander Graf <agraf@csgraf.de> wrote:
>>
>>
>> On 22.08.23 00:10, Simon Glass wrote:
>>> Hi Alex,
>>>
>>> On Mon, 21 Aug 2023 at 13:59, Alexander Graf <agraf@csgraf.de> wrote:
>>>>
>>>> On 21.08.23 21:11, Simon Glass wrote:
>>>>> Hi Alper,
>>>>>
>>>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>>>>>> From: Alexander Graf <agraf@csgraf.de>
>>>>>>
>>>>>> Now that we have a damage area that tells us which parts of the frame buffer
>>>>>> actually need updating, let's only dcache flush those on video_sync()
>>>>>> calls. With this optimization in place, frame buffer updates - especially
>>>>>> on large screens such as 4k displays - speed up significantly.
>>>>>>
>>>>>> Signed-off-by: Alexander Graf <agraf@csgraf.de>
>>>>>> Reported-by: Da Xue <da@libre.computer>
>>>>>> [Alper: Use damage.xstart/yend, IS_ENABLED()]
>>>>>> Co-developed-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
>>>>>> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
>>>>>> ---
>>>>>>
>>>>>> Changes in v5:
>>>>>> - Use xstart, ystart, xend, yend as names for damage region
>>>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>>>>>>
>>>>>> Changes in v2:
>>>>>> - Fix dcache range; we were flushing too much before
>>>>>> - Remove ifdefs
>>>>>>
>>>>>>    drivers/video/video-uclass.c | 41 +++++++++++++++++++++++++++++++-----
>>>>>>    1 file changed, 36 insertions(+), 5 deletions(-)
>>>>> This is a little strange, since flushing the whole cache will only
>>>>> actually write out data that was actually written (to the display). Is
>>>>> there a benefit to this patch, in terms of performance?
>>>>
>>>> I'm happy to see you go through the same thought process I went through
>>>> when writing these: "This surely can't be the problem, can it?". The
>>>> answer is "simple" in hindsight:
>>>>
>>>> Have a look at the ARMv8 cache flush function. It does the only "safe"
>>>> thing you can expect it to do: Clean+Invalidate to POC because we use it
>>>> for multiple things, clearing modified code among others:
>>>>
>>>> ENTRY(__asm_flush_dcache_range)
>>>>           mrs     x3, ctr_el0
>>>>           ubfx    x3, x3, #16, #4
>>>>           mov     x2, #4
>>>>           lsl     x2, x2, x3              /* cache line size */
>>>>
>>>>           /* x2 <- minimal cache line size in cache system */
>>>>           sub     x3, x2, #1
>>>>           bic     x0, x0, x3
>>>> 1:      dc      civac, x0       /* clean & invalidate data or unified cache */
>>>>           add     x0, x0, x2
>>>>           cmp     x0, x1
>>>>           b.lo    1b
>>>>           dsb     sy
>>>>           ret
>>>> ENDPROC(__asm_flush_dcache_range)
>>>>
>>>>
>>>> Looking at the "dc civac" call, we find this documentation page from
>>>> ARM:
>>>> https://developer.arm.com/documentation/ddi0601/2022-03/AArch64-Instructions/DC-CIVAC--Data-or-unified-Cache-line-Clean-and-Invalidate-by-VA-to-PoC
>>>>
>>>> This says we're writing any dirtiness of this cache line up to the POC
>>>> and then invalidate (remove the cache line) also up to POC. That means
>>>> when you look at a typical SBC, this will either be L2 or system level
>>>> cache. Every read afterwards needs to go and pull it all the way back to
>>>> L1 to modify it (or not) on the next character write and then flush it
>>>> again.
>>>>
>>>> Even worse: Because of the invalidate, we may even evict it from caches
>>>> that the display controller uses to read the frame buffer. So depending
>>>> on the SoC's cache topology and implementation, we may force the display
>>>> controller to refetch the full FB content on its next screen refresh cycle.
>>>>
>>>> I faintly remember that I tried to experiment with CVAC instead to only
>>>> flush without invalidating. I don't fully remember the results anymore
>>>> though. I believe CVAC just behaved identically to CIVAC on the A53
>>>> platform I was working on. And then I looked at Cortex-A53 errata like
>>>> [1] and just accepted that doing anything but restricting the flushing
>>>> range is a waste of time :)
>>> Yuck I didn't know it was invalidating too. That is horrible. Is there
>>> no way to fix it?
>>
>>
>> Before building all of this damage logic, I tried, but failed. I'd
>> welcome anyone else to try again :). I'm not even convinced yet that it
>> is actually fixable: Depending on the SoC's internal cache logic, it may
>> opt to always invalidate I think.
> 
> Wow, that is crazy! How is anyone supposed to make the system run well
> with logic like that??!
> 
>>
>> That said, this patch set really also makes sense outside of the
>> particular invalidate problem. It creates a generic abstraction between
>> the copy and non-copy code path and allows us to reduce the amount of
>> work spent for both, generically for any video sync operation.
> 
> Sure...my question was really why it helps so much, given what I
> understood the caches to be doing. If they are invalidating, then it
> is amazing anything gets done...

I don't really know cache mechanisms and terminology, but AFAIU there's
nothing actionable for this patch regarding this discussion, right?

Meanwhile I noticed this patch only flushes priv->fb, and think it also
needs to flush priv->copy_fb if VIDEO_COPY.
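
For a sense of scale, here is a small standalone sketch (not taken from the
patch) of how a "damaged lines only" flush range could be computed and
aligned to cache lines. struct flush_range and damaged_lines_range() are
hypothetical helpers, and the cache line size is passed in as an assumption
rather than read from the hardware. The same range would apply to copy_fb
as well as fb.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offsets into the frame buffer, end is exclusive */
struct flush_range {
	size_t start;
	size_t end;
};

/*
 * Range covering whole lines ystart..yend-1, widened to cache line
 * boundaries because a flush always operates on full cache lines.
 */
static struct flush_range damaged_lines_range(int ystart, int yend,
					      size_t line_length,
					      size_t cacheline)
{
	struct flush_range r;

	assert(cacheline > 0);
	r.start = (size_t)ystart * line_length;
	r.end = (size_t)yend * line_length;

	r.start -= r.start % cacheline;				/* round down */
	r.end += (cacheline - r.end % cacheline) % cacheline;	/* round up */

	return r;
}

/* Number of cache lines a flush of this range has to walk */
static size_t cachelines_in(struct flush_range r, size_t cacheline)
{
	return (r.end - r.start) / cacheline;
}
```

Assuming 64-byte cache lines and 4 bytes per pixel, a full 3840x2160 frame
buffer is 518400 cache lines per flush, while a 16-pixel-tall text row is
3840, so each per-character flush walks 135 times fewer lines.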

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 04/13] dm: video: Add damage tracking API
  2023-08-21 19:11   ` Simon Glass
@ 2023-08-30 19:15     ` Alper Nebi Yasak
  2023-08-31  2:49       ` Simon Glass
  0 siblings, 1 reply; 56+ messages in thread
From: Alper Nebi Yasak @ 2023-08-30 19:15 UTC (permalink / raw)
  To: Simon Glass
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong



On 2023-08-21 22:11 +03:00, Simon Glass wrote:
> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>>
>> From: Alexander Graf <agraf@csgraf.de>
>>
>> We are going to introduce image damage tracking to speed up screen
>> refresh on large displays. This patch adds damage tracking for up to
>> one rectangle of the screen which is typically enough to hold blt or
>> text print updates. Callers into this API and a reduced dcache flush
>> code path will follow in later patches.
>>
>> Signed-off-by: Alexander Graf <agraf@csgraf.de>
>> Reported-by: Da Xue <da@libre.computer>
>> [Alper: Use xstart/yend, document new fields, return void from
>>         video_damage(), declare priv, drop headers, use IS_ENABLED()]
>> Co-developed-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
>> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
>> ---
>>
>> Changes in v5:
>> - Use xstart, ystart, xend, yend as names for damage region
>> - Document damage struct and fields in struct video_priv comment
>> - Return void from video_damage()
>> - Fix undeclared priv error in video_sync()
>> - Drop unused headers from video-uclass.c
>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>>
>> Changes in v4:
>> - Move damage clear to patch "dm: video: Add damage tracking API"
>> - Simplify first damage logic
>> - Remove VIDEO_DAMAGE default for ARM
>>
>> Changes in v3:
>> - Adapt to always assume DM is used
>>
>> Changes in v2:
>> - Remove ifdefs
>>
>>  drivers/video/Kconfig        | 13 ++++++++++++
>>  drivers/video/video-uclass.c | 41 +++++++++++++++++++++++++++++++++---
>>  include/video.h              | 32 ++++++++++++++++++++++++++--
>>  3 files changed, 81 insertions(+), 5 deletions(-)
>>
> 
> Reviewed-by: Simon Glass <sjg@chromium.org>
> 
> But I suggest an empty static inline in the case where the feature is disabled?

You mean with something like #ifdef CONFIG_VIDEO_DAMAGE, right?
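
A minimal sketch of what such a stub might look like, as an illustration
only. The parameter names of video_damage() are assumptions based on how it
is called in the series, and draw_cell() is a hypothetical caller with made
up 8x16 cell dimensions:

```c
struct udevice;	/* opaque in this sketch */

#ifdef CONFIG_VIDEO_DAMAGE
/* Real implementation lives in the video uclass */
void video_damage(struct udevice *vid, int x, int y, int width, int height);
#else
/* Feature disabled: callers compile unchanged and the call is a no-op */
static inline void video_damage(struct udevice *vid, int x, int y,
				int width, int height)
{
}
#endif

/* Hypothetical caller: no #ifdef or IS_ENABLED() check needed here */
static int draw_cell(struct udevice *vid, int col, int row)
{
	/* ... write the glyph into the frame buffer ... */
	video_damage(vid, col * 8, row * 16, 8, 16);

	return 0;
}
```

The trade-off compared to IS_ENABLED() inside a single definition is that
the stub keeps call sites free of config checks, at the cost of an #ifdef
in the header.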

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 00/13] Add video damage tracking
  2023-08-30 18:27                             ` Alper Nebi Yasak
@ 2023-08-30 19:52                               ` Alexander Graf
  0 siblings, 0 replies; 56+ messages in thread
From: Alexander Graf @ 2023-08-30 19:52 UTC (permalink / raw)
  To: Alper Nebi Yasak, Simon Glass
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Philipp Tomsich, Andrew Davis, Da Xue, Heinrich Schuchardt,
	Patrice Chotard, Patrick Delaunay, Derald Woods,
	Anatolij Gustschin, uboot-stm32, Matthias Brugger, u-boot-amlogic,
	Ilias Apalodimas, Neil Armstrong


On 30.08.23 20:27, Alper Nebi Yasak wrote:
> Hi all,
>
> On 2023-08-29 09:27 +03:00, Alexander Graf wrote:
>> On 29.08.23 00:08, Simon Glass wrote:
>>> On Mon, 28 Aug 2023 at 14:24, Alexander Graf <agraf@csgraf.de> wrote:
>>>> On 28.08.23 19:54, Simon Glass wrote:
>>>>> U-Boot does have video_sync() but it doesn't know when to call it. If
>>>>> it does not call it, then any amount of single-threaded code can run
>>>>> after that, which may update the framebuffer. In other words, U-Boot
>>>>> is in exactly the same boat as UEFI. It has to decide whether to call
>>>>> video_sync() based on some sort of heuristic.
>>>>>
>>>>> That is the only point I am trying to make here. Does that make sense?
>>>> Oh, I thought you mentioned above that U-Boot is in a better spot or
>>>> "has it solved already". I agree - it's in the same boat and the only
>>>> safe thing it can really do today that is fully cross-platform
>>>> compatible is to call video_sync() after every character.
>>>>
>>>> I don't understand what you mean by "any amount of single-threaded code
>>>> can run after that, which may update the framebuffer". Any framebuffer
>>>> modification is U-Boot internal code which then again can apply
>>>> video_sync() to tell the system "I want what I wrote to screen actually
>>>> be on screen now". I don't think that's necessarily bad design. A bit
>>>> clunky, but we're in a pre-boot environment after all.
>>>>
>>>> Since we're aligned now: What exactly did you refer to with "but we have
>>>> managed to work out something"?
>>>
>>> So now we are on the same page about that point. The next step is my
>>> heuristic point:
>>>
>>> vidconsole_putc_xy() - does not call video_sync()
>>> vidconsole_newline() - does
>>>
>>> I am simply suggesting that UEFI should do the same thing.
>>
>> I think the better analogy is with
>>
>> vidconsole_puts() - does
>>
>> and that's exactly the function that the UEFI code calls. The UEFI
>> interface is "write this long string to screen". All the UEFI code does
>> is call vidconsole_puts() to do that which then (rightfully) calls
>> video_sync().
>>
>> The reason we flush after every character with grub is grub: Grub abuses
>> the "write long string to screen" function and instead only writes a
>> single character on each call, which then means we flush on every
>> character write.
> There's another reason I discovered empirically as I tried to implement
> cyclic video_sync()s instead of calling it whenever. The writes will go
> through eventually (as the cache is filled by other work?) even if *we*
> don't explicitly flush it. That means partial data gets pushed to the
> display in an uncontrolled way, resulting in bad graphical artifacts.
>
> And I mean very noticeable things like a blocky waterfall effect when
> filling a blue rectangle background in GRUB menu, or half-rendered
> letters in U-Boot console (very obvious if I get it to panic-and-hang).
>
> I think that locks it down, and there's two reasonable things we can do:
>
> - Write and immediately flush to fb (hardware) every time
> - Batch writes in fb, periodically write-and-flush to copy_fb (hardware)
>
> Both can utilize a damage tracking feature to minimize the amount of
> copy/flush done. And maybe we can implement the two simultaneously if we
> skip flushing when the damaged region is zero-sized (already flushed).
>
> There's a flaw with the second approach though, EFI can have direct
> access to the hardware copy_fb. When it has directly written to the
> framebuffer, we would need to at least stop overwriting that, and
> ideally copy backwards to the non-hardware fb. Is there some kind of a
> locking API that EFI apps use to get/release the framebuffer? We could
> hook that into something like that.


Edk2 also has a shadow frame buffer that it uses for VGA because 
otherwise the NC read accesses from VRAM would be too expensive. I don't 
believe there's any UEFI locking mechanism, but there is a clear understanding that 
if you want to access the frame buffer while anything else but you could 
potentially access it too, you better use the UEFI APIs instead of 
scribbling on it yourself.


> Note that I've been using "flush" and not "sync" to avoid talking about
> when a driver ops->video_sync() should be called. Is that fundamentally
> incompatible with EFI, can we even call that after e.g. ExitBootServices
> where the OS wants to use it as efifb? Maybe we should always call that
> periodically at 60Hz (16666us) or so?


If you actually need to do something actively frequently, then that 
won't work anymore after ExitBootServices. At that point, all "normal" 
U-Boot code is gone.


> (I'm testing on rk3399-gru-kevin with a 2400x1600 eDP screen by the way,
> and attaching some WIP patches if you want to test. Debian arm64 netinst
> iso uses text-mode GRUB by default, in case you need a payload to test.)
>
>>>>>>> Also I notice that EFI calls notify? all the time, so U-Boot probably
>>>>>>> does have the ability to sync the video every 10ms if we wanted to.
> BTW, with attached cyclic patch on chromebook_kevin, I immediately get a
> warning that it takes too long, at 2-8ms without/with VIDEO_COPY. It's
> about 11ms for both on sandbox on my PC.


I would expect explicit damage flushes like the patch set originally 
does to be a lot more performant than anything dumber, but timer based.


Alex
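
As a rough standalone model of how the two approaches discussed above could
coexist (explicit flushes plus a periodic one that skips work when nothing
is damaged), consider the sketch below. struct sync_state, note_damage()
and periodic_sync() are hypothetical names and the flush itself is reduced
to a counter. This is not the attached WIP patch.

```c
#include <stdbool.h>
#include <string.h>

struct damage_rect {
	int xstart, ystart;	/* inclusive */
	int xend, yend;		/* exclusive */
};

struct sync_state {
	struct damage_rect damage;
	unsigned long flushes;	/* stands in for the real flush/copy work */
};

/* A rectangle with no width or no height has nothing to flush */
static bool damage_is_empty(const struct damage_rect *d)
{
	return d->xend <= d->xstart || d->yend <= d->ystart;
}

/* Record a damaged area; grows the existing rectangle if there is one */
static void note_damage(struct sync_state *s, int x, int y, int w, int h)
{
	struct damage_rect *d = &s->damage;

	if (w <= 0 || h <= 0)
		return;
	if (damage_is_empty(d)) {
		d->xstart = x;
		d->ystart = y;
		d->xend = x + w;
		d->yend = y + h;
		return;
	}
	if (x < d->xstart)
		d->xstart = x;
	if (y < d->ystart)
		d->ystart = y;
	if (x + w > d->xend)
		d->xend = x + w;
	if (y + h > d->yend)
		d->yend = y + h;
}

/*
 * Called from both the explicit sync path and a timer. Returns true if
 * any work was done; an already-flushed (empty) region costs nothing.
 */
static bool periodic_sync(struct sync_state *s)
{
	if (damage_is_empty(&s->damage))
		return false;

	s->flushes++;
	memset(&s->damage, 0, sizeof(s->damage));
	return true;
}
```

In this model a timer tick that follows an explicit sync finds an empty
region and returns immediately, so the periodic path only pays when some
writer forgot to sync.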


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 00/13] Add video damage tracking
  2023-08-29  9:19                             ` Mark Kettenis
@ 2023-08-30 19:55                               ` Alexander Graf
  0 siblings, 0 replies; 56+ messages in thread
From: Alexander Graf @ 2023-08-30 19:55 UTC (permalink / raw)
  To: Mark Kettenis
  Cc: xypron.glpk, sjg, alpernebiyasak, u-boot, kever.yang, jagan,
	andre.przywara, clamor95, philipp.tomsich, afd, da,
	patrice.chotard, patrick.delaunay, woods.technical, agust,
	uboot-stm32, mbrugger, u-boot-amlogic, ilias.apalodimas,
	neil.armstrong


On 29.08.23 11:19, Mark Kettenis wrote:
>> Date: Tue, 29 Aug 2023 08:20:49 +0200
>> From: Alexander Graf <agraf@csgraf.de>
>>
>> On 28.08.23 23:54, Heinrich Schuchardt wrote:
>>> On 8/28/23 22:24, Alexander Graf wrote:
>>>> On 28.08.23 19:54, Simon Glass wrote:
>>>>> Hi Alex,
>>>>>
>>>>> On Wed, 23 Aug 2023 at 02:56, Alexander Graf <agraf@csgraf.de> wrote:
>>>>>> Hey Simon,
>>>>>>
>>>>>> On 22.08.23 20:56, Simon Glass wrote:
>>>>>>> Hi Alex,
>>>>>>>
>>>>>>> On Tue, 22 Aug 2023 at 01:47, Alexander Graf <agraf@csgraf.de> wrote:
>>>>>>>> On 22.08.23 01:03, Simon Glass wrote:
>>>>>>>>> Hi Alex,
>>>>>>>>>
>>>>>>>>> On Mon, 21 Aug 2023 at 16:40, Alexander Graf <agraf@csgraf.de>
>>>>>>>>> wrote:
>>>>>>>>>> On 22.08.23 00:10, Simon Glass wrote:
>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 21 Aug 2023 at 14:20, Alexander Graf <agraf@csgraf.de>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> On 21.08.23 21:57, Simon Glass wrote:
>>>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf <agraf@csgraf.de>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> On 21.08.23 21:11, Simon Glass wrote:
>>>>>>>>>>>>>>> Hi Alper,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak
>>>>>>>>>>>>>>> <alpernebiyasak@gmail.com> wrote:
>>>>>>>>>>>>>>>> This is a rebase of Alexander Graf's video damage tracking
>>>>>>>>>>>>>>>> series, with
>>>>>>>>>>>>>>>> some tests and other changes. The original cover letter is
>>>>>>>>>>>>>>>> as follows:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This patch set speeds up graphics output on ARM by a
>>>>>>>>>>>>>>>>> factor of 60x.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map
>>>>>>>>>>>>>>>>> it as cached,
>>>>>>>>>>>>>>>>> but need it accessible by the display controller which
>>>>>>>>>>>>>>>>> reads directly
>>>>>>>>>>>>>>>>> from a later point of consistency. Hence, we flush the
>>>>>>>>>>>>>>>>> frame buffer to
>>>>>>>>>>>>>>>>> DRAM on every change. The full frame buffer.
>>>>>>>>>>>>>>> It should not, see below.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Unfortunately, with the advent of 4k displays, we are
>>>>>>>>>>>>>>>>> seeing frame buffers
>>>>>>>>>>>>>>>>> that can take a while to flush out. This was reported by
>>>>>>>>>>>>>>>>> Da Xue with grub,
>>>>>>>>>>>>>>>>> which happily prints 1000s of spaces on the screen to draw
>>>>>>>>>>>>>>>>> a menu. Every
>>>>>>>>>>>>>>>>> printed space triggers a cache flush.
>>>>>>>>>>>>>>> That is a bug somewhere in EFI.
>>>>>>>>>>>>>> Unfortunately not :). You may call it a bug in grub: It
>>>>>>>>>>>>>> literally prints
>>>>>>>>>>>>>> over space characters for every character in its menu that it
>>>>>>>>>>>>>> wants
>>>>>>>>>>>>>> cleared. On every text screen draw.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This wouldn't be a big issue if we only flush the rectangle
>>>>>>>>>>>>>> that gets
>>>>>>>>>>>>>> modified. But without this patch set, we're flushing the full
>>>>>>>>>>>>>> DRAM
>>>>>>>>>>>>>> buffer on every u-boot text console character write, which
>>>>>>>>>>>>>> means for
>>>>>>>>>>>>>> every character (as that's the only API UEFI has).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As a nice side effect, we speed up the normal U-Boot text
>>>>>>>>>>>>>> console as
>>>>>>>>>>>>>> well with this patch set, because even "normal" text prints
>>>>>>>>>>>>>> that write
>>>>>>>>>>>>>> for example a single line of text on the screen today flush
>>>>>>>>>>>>>> the full
>>>>>>>>>>>>>> frame buffer to DRAM.
>>>>>>>>>>>>> No, I mean that it is a bug that U-Boot (apparently) flushes
>>>>>>>>>>>>> the cache
>>>>>>>>>>>>> after every character. It doesn't do that for normal character
>>>>>>>>>>>>> output
>>>>>>>>>>>>> and I don't think it makes sense to do it for EFI either.
>>>>>>>>>>>> I see. Let's trace the calls:
>>>>>>>>>>>>
>>>>>>>>>>>> efi_cout_output_string()
>>>>>>>>>>>> -> fputs()
>>>>>>>>>>>> -> vidconsole_puts()
>>>>>>>>>>>> -> video_sync()
>>>>>>>>>>>> -> flush_dcache_range()
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately grub abstracts character backends down to the
>>>>>>>>>>>> "print
>>>>>>>>>>>> character" level, so it calls UEFI's sopisticated
>>>>>>>>>>>> "output_string"
>>>>>>>>>>>> callback with single characters at a time, which means we do a
>>>>>>>>>>>> full
>>>>>>>>>>>> dcache flush for every character that we print:
>>>>>>>>>>>>
>>>>>>>>>>>> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This patch set implements the easiest mitigation against
>>>>>>>>>>>>>>>>> this problem:
>>>>>>>>>>>>>>>>> Damage tracking. We remember the lowest common denominator
>>>>>>>>>>>>>>>>> region that was
>>>>>>>>>>>>>>>>> touched since the last video_sync() call and only flush
>>>>>>>>>>>>>>>>> that. The most
>>>>>>>>>>>>>>>>> typical writer to the frame buffer is the video console,
>>>>>>>>>>>>>>>>> which always
>>>>>>>>>>>>>>>>> writes rectangles of characters on the screen and syncs
>>>>>>>>>>>>>>>>> afterwards.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> With this patch set applied, we reduce drawing a large
>>>>>>>>>>>>>>>>> grub menu (with
>>>>>>>>>>>>>>>>> serial console attached for size information) on an
>>>>>>>>>>>>>>>>> RK3399-ROC system
>>>>>>>>>>>>>>>>> at 1440p from 55 seconds to less than 1 second.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Version 2 also implements VIDEO_COPY using this mechanism,
>>>>>>>>>>>>>>>>> reducing its
>>>>>>>>>>>>>>>>> overhead compared to before as well. So even x86 systems
>>>>>>>>>>>>>>>>> should be faster
>>>>>>>>>>>>>>>>> with this now :).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Alternatives considered:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>          1) Lazy sync - Sandbox does this. It only calls
>>>>>>>>>>>>>>>>> video_sync(true) ever
>>>>>>>>>>>>>>>>>             so often. We are missing timers to do this
>>>>>>>>>>>>>>>>> generically.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>          2) Double buffering - We could try to identify
>>>>>>>>>>>>>>>>> whether anything changed
>>>>>>>>>>>>>>>>>             at all and only draw to the FB if it did. That
>>>>>>>>>>>>>>>>> would require
>>>>>>>>>>>>>>>>>             maintaining a second buffer that we need to
>>>>>>>>>>>>>>>>> scan.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>          3) Text buffer - Maintain a buffer of all text
>>>>>>>>>>>>>>>>> printed on the screen with
>>>>>>>>>>>>>>>>>             respective location. Don't write if the old and
>>>>>>>>>>>>>>>>> new character are
>>>>>>>>>>>>>>>>>             identical. This would limit applicability to
>>>>>>>>>>>>>>>>> text only and is an
>>>>>>>>>>>>>>>>>             optimization on top of this patch set.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>          4) Hash screen lines - Create a hash (sha256?)
>>>>>>>>>>>>>>>>> over every line when it
>>>>>>>>>>>>>>>>>             changes. Only flush when it does. I'm not sure
>>>>>>>>>>>>>>>>> if this would waste
>>>>>>>>>>>>>>>>>             more time, memory and cache than the current
>>>>>>>>>>>>>>>>> approach. It would make
>>>>>>>>>>>>>>>>>             full screen updates much more expensive.
>>>>>>>>>>>>>>> 5) Fix the bug mentioned above?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Changes in v5:
>>>>>>>>>>>>>>>> - Add patch "video: test: Split copy frame buffer check
>>>>>>>>>>>>>>>> into a function"
>>>>>>>>>>>>>>>> - Add patch "video: test: Support checking copy frame
>>>>>>>>>>>>>>>> buffer contents"
>>>>>>>>>>>>>>>> - Add patch "video: test: Test partial updates of hardware
>>>>>>>>>>>>>>>> frame buffer"
>>>>>>>>>>>>>>>> - Use xstart, ystart, xend, yend as names for damage region
>>>>>>>>>>>>>>>> - Document damage struct and fields in struct video_priv
>>>>>>>>>>>>>>>> comment
>>>>>>>>>>>>>>>> - Return void from video_damage()
>>>>>>>>>>>>>>>> - Fix undeclared priv error in video_sync()
>>>>>>>>>>>>>>>> - Drop unused headers from video-uclass.c
>>>>>>>>>>>>>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>>>>>>>>>>>>>>>> - Call video_damage() also in video_fill_part()
>>>>>>>>>>>>>>>> - Use met->baseline instead of priv->baseline
>>>>>>>>>>>>>>>> - Use fontdata->height/width instead of
>>>>>>>>>>>>>>>> VIDEO_FONT_HEIGHT/WIDTH
>>>>>>>>>>>>>>>> - Update console_rotate.c video_damage() calls to pass
>>>>>>>>>>>>>>>> video tests
>>>>>>>>>>>>>>>> - Remove mention about not having minimal damage for
>>>>>>>>>>>>>>>> console_rotate.c
>>>>>>>>>>>>>>>> - Add patch "video: test: Test video damage tracking via
>>>>>>>>>>>>>>>> vidconsole"
>>>>>>>>>>>>>>>> - Document new vdev field in struct efi_gop_obj comment
>>>>>>>>>>>>>>>> - Remove video_sync_copy() also from video_fill(),
>>>>>>>>>>>>>>>> video_fill_part()
>>>>>>>>>>>>>>>> - Fix memmove() calls by removing the extra dev argument
>>>>>>>>>>>>>>>> - Call video_sync() before checking copy_fb in video tests
>>>>>>>>>>>>>>>> - Imply VIDEO_DAMAGE for video drivers instead of
>>>>>>>>>>>>>>>> selecting it
>>>>>>>>>>>>>>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> v4:
>>>>>>>>>>>>>>>> https://lore.kernel.org/all/20230103215004.22646-1-agraf@csgraf.de/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Changes in v4:
>>>>>>>>>>>>>>>> - Move damage clear to patch "dm: video: Add damage
>>>>>>>>>>>>>>>> tracking API"
>>>>>>>>>>>>>>>> - Simplify first damage logic
>>>>>>>>>>>>>>>> - Remove VIDEO_DAMAGE default for ARM
>>>>>>>>>>>>>>>> - Skip damage on EfiBltVideoToBltBuffer
>>>>>>>>>>>>>>>> - Add patch "video: Always compile cache flushing code"
>>>>>>>>>>>>>>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that
>>>>>>>>>>>>>>>> need it"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> v3:
>>>>>>>>>>>>>>>> https://lore.kernel.org/all/20221230195828.88134-1-agraf@csgraf.de/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Changes in v3:
>>>>>>>>>>>>>>>> - Adapt to always assume DM is used
>>>>>>>>>>>>>>>> - Make VIDEO_COPY always select VIDEO_DAMAGE
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> v2:
>>>>>>>>>>>>>>>> https://lore.kernel.org/all/20220609225921.62462-1-agraf@csgraf.de/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Changes in v2:
>>>>>>>>>>>>>>>> - Remove ifdefs
>>>>>>>>>>>>>>>> - Fix ranges in truetype target
>>>>>>>>>>>>>>>> - Limit rotate to necessary damage
>>>>>>>>>>>>>>>> - Remove ifdefs from gop
>>>>>>>>>>>>>>>> - Fix dcache range; we were flushing too much before
>>>>>>>>>>>>>>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> v1:
>>>>>>>>>>>>>>>> https://lore.kernel.org/all/20220606234336.5021-1-agraf@csgraf.de/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Alexander Graf (9):
>>>>>>>>>>>>>>>>          dm: video: Add damage tracking API
>>>>>>>>>>>>>>>>          dm: video: Add damage notification on display fills
>>>>>>>>>>>>>>>>          vidconsole: Add damage notifications to all
>>>>>>>>>>>>>>>> vidconsole drivers
>>>>>>>>>>>>>>>>          video: Add damage notification on bmp display
>>>>>>>>>>>>>>>>          efi_loader: GOP: Add damage notification on BLT
>>>>>>>>>>>>>>>>          video: Only dcache flush damaged lines
>>>>>>>>>>>>>>>>          video: Use VIDEO_DAMAGE for VIDEO_COPY
>>>>>>>>>>>>>>>>          video: Always compile cache flushing code
>>>>>>>>>>>>>>>>          video: Enable VIDEO_DAMAGE for drivers that need it
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Alper Nebi Yasak (4):
>>>>>>>>>>>>>>>>          video: test: Split copy frame buffer check into a
>>>>>>>>>>>>>>>> function
>>>>>>>>>>>>>>>>          video: test: Support checking copy frame buffer
>>>>>>>>>>>>>>>> contents
>>>>>>>>>>>>>>>>          video: test: Test partial updates of hardware frame
>>>>>>>>>>>>>>>> buffer
>>>>>>>>>>>>>>>>          video: test: Test video damage tracking via
>>>>>>>>>>>>>>>> vidconsole
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         arch/arm/mach-omap2/omap3/Kconfig |   1 +
>>>>>>>>>>>>>>>>         arch/arm/mach-sunxi/Kconfig |   1 +
>>>>>>>>>>>>>>>>         drivers/video/Kconfig |  26 +++
>>>>>>>>>>>>>>>>         drivers/video/console_normal.c |  27 ++--
>>>>>>>>>>>>>>>>         drivers/video/console_rotate.c |  94 +++++++----
>>>>>>>>>>>>>>>>         drivers/video/console_truetype.c |  37 +++--
>>>>>>>>>>>>>>>>         drivers/video/exynos/Kconfig |   1 +
>>>>>>>>>>>>>>>>         drivers/video/imx/Kconfig |   1 +
>>>>>>>>>>>>>>>>         drivers/video/meson/Kconfig |   1 +
>>>>>>>>>>>>>>>>         drivers/video/rockchip/Kconfig |   1 +
>>>>>>>>>>>>>>>>         drivers/video/stm32/Kconfig |   1 +
>>>>>>>>>>>>>>>>         drivers/video/tegra20/Kconfig |   1 +
>>>>>>>>>>>>>>>>         drivers/video/tidss/Kconfig |   1 +
>>>>>>>>>>>>>>>>         drivers/video/vidconsole-uclass.c |  16 --
>>>>>>>>>>>>>>>>         drivers/video/video-uclass.c | 190
>>>>>>>>>>>>>>>> ++++++++++++----------
>>>>>>>>>>>>>>>>         drivers/video/video_bmp.c |   7 +-
>>>>>>>>>>>>>>>>         include/video.h |  59 +++----
>>>>>>>>>>>>>>>>         include/video_console.h |  52 ------
>>>>>>>>>>>>>>>>         lib/efi_loader/efi_gop.c |   7 +
>>>>>>>>>>>>>>>>         test/dm/video.c | 256
>>>>>>>>>>>>>>>> ++++++++++++++++++++++++------
>>>>>>>>>>>>>>>>         20 files changed, 483 insertions(+), 297 deletions(-)
>>>>>>>>>>>>>>> It is good to see this tidied up into something that can be applied!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am unsure what is going on with the EFI performance, though. It
>>>>>>>>>>>>>>> should not flush the cache after every character, only after a new
>>>>>>>>>>>>>>> line. Is there something wrong in here? If so, we should fix that bug
>>>>>>>>>>>>>>> first and it should be patch 1 of this series.
>>>>>>>>>>>>>> Before I came up with this series, I was trying to identify the UEFI bug
>>>>>>>>>>>>>> in question as well, because intuition told me surely this is a bug in
>>>>>>>>>>>>>> UEFI :). Turns out it really isn't this time around.
>>>>>>>>>>>>> I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI
>>>>>>>>>>>>> implementation. Where did you look for the bug?
>>>>>>>>>>>> The "real" bug is in grub. But given that it's reasonably simple to work
>>>>>>>>>>>> around in U-Boot and even with it "fixed" in grub we would still see
>>>>>>>>>>>> performance benefits from flushing only parts of the screen, I think
>>>>>>>>>>>> it's worth living with the grub deficiency.
>>>>>>>>>>> OK thanks for digging into it. I suggest we add a param to
>>>>>>>>>>> vidconsole_puts() to tell it whether to sync or not, then the EFI code
>>>>>>>>>>> can indicate this and try to be a bit smarter about it.
>>>>>>>>>> It doesn't know when to sync either. From its point of view, any
>>>>>>>>>> "console output" could be the last one. There is no API in UEFI that
>>>>>>>>>> says "please flush console output now".
>>>>>>>>> Yes, I understand. I was not suggesting we were missing an API. But
>>>>>>>>> some sort of heuristic would do, e.g. only flush on a newline, flush
>>>>>>>>> every 50 chars, etc.
>>>>>>>> I can't think of any heuristic that would reliably work. Relevant for
>>>>>>>> this conversation, UEFI provides 2 calls:
>>>>>>>>
>>>>>>>>       * Write string to screen (efi_cout_output_string)
>>>>>>>>       * Set text cursor position to X, Y (efi_cout_set_cursor_position)
>>>>>>>>
>>>>>>>> It's perfectly legal for a UEFI application to do something like
>>>>>>>>
>>>>>>>> efi_cout_set_cursor_position(10, 10);
>>>>>>>> efi_cout_output_string("f");
>>>>>>>> efi_cout_output_string("o");
>>>>>>>> efi_cout_output_string("o");
>>>>>>>>
>>>>>>>> to update contents of a virtual text box on the screen. Where in this
>>>>>>>> chain of events would we call video_sync(), if not on every call to
>>>>>>>> efi_cout_output_string()?
>>>>>>> Actually U-Boot has the same problem, but we have managed to work
>>>>>>> out something.
>>>>>> U-Boot as a code base has a much easier stance: It can add APIs when it
>>>>>> needs them in places that require them. With UEFI (as well as the U-Boot
>>>>>> native API), we're stuck with what's there.
>>>>>>
>>>>>> I also don't understand what you mean by "we have managed to work out
>>>>>> something". This patch set is not a UEFI fix - it fixes generic U-Boot
>>>>>> behavior and speeds up non-UEFI boots as well. The improvement there is
>>>>>> just not as impressive as with grub :).
>>>>> We are still not quite on the same page...
>>>>>
>>>>> U-Boot does have video_sync() but it doesn't know when to call it. If
>>>>> it does not call it, then any amount of single-threaded code can run
>>>>> after that, which may update the framebuffer. In other words, U-Boot
>>>>> is in exactly the same boat as UEFI. It has to decide whether to call
>>>>> video_sync() based on some sort of heuristic.
>>>>>
>>>>> That is the only point I am trying to make here. Does that make sense?
>>>>
>>>> Oh, I thought you mentioned above that U-Boot is in a better spot or
>>>> "has it solved already". I agree - it's in the same boat and the only
>>>> safe thing it can really do today that is fully cross-platform
>>>> compatible is to call video_sync() after every character.
>>>>
>>>> I don't understand what you mean by "any amount of single-threaded code
>>>> can run after that, which may update the framebuffer". Any framebuffer
>>>> modification is U-Boot internal code which then again can apply
>>>> video_sync() to tell the system "I want what I wrote to screen actually
>>>> be on screen now". I don't think that's necessarily bad design. A bit
>>>> clunky, but we're in a pre-boot environment after all.
>>>>
>>>> Since we're aligned now: What exactly did you refer to with "but we have
>>>> managed to work out something"?
>>> Should we set PixelBltOnly to indicate to UEFI applications that they
>>> are not allowed to directly write to the framebuffer but always have to
>>> use BitBlt? GRUB seems to be using a shadow buffer by default which it
>>> copies via BitBlt.
>>
>> If we do that, OSs will no longer be able to carry the frame buffer
>> address over and continue to use it to draw on the screen natively
>> (like Linux's efifb).
>>
>> So no, I don't think we should indicate PixelBltOnly. The frame buffer
>> is usually available to applications, you just need to adhere to the
>> architecture's caching constraints.
> Right.  That would probably kill any reasonable way we can have an
> early framebuffer console in most OSes after ExitBootServices() has
> been called.
>
> I'm late to the game but isn't the real solution to have U-Boot map
> the framebuffer in a cache-coherent way?  This is what typically
> happens on x86 where VRAM is mapped as "write-combining".  That is,
> uncached but going through the store buffer to speed up writes.
> Reading from the framebuffer will be slow in that case (which probably
> is the real reason why grub uses a shadow framebuffer).  So U-Boot
> still needs some cleverness to make sure it only ever writes to the
> framebuffer.


Yeah, those are the 2 options: WB plus flush or WC plus shadow buffer 
for reads. I don't really see much benefit in doing the latter over the 
former: It occupies more valuable RAM and adds additional complexity on 
FB reads. I believe the main reason x86 went that route was that it had 
no choice: It didn't have a cache line flush instruction for a long time.


Alex
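
For a rough sense of scale in the WB-plus-flush case, the arithmetic
below compares flushing a whole frame buffer with flushing only the
lines of one text row. The geometry used in the note after it
(3840x2160 at 4 bytes per pixel, a 16-pixel-high font) and the helper
names flush_bytes_lines() and flush_reduction() are assumptions for
illustration, not values or functions taken from the series.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes covered when flushing whole lines ystart..yend-1 (yend exclusive). */
static size_t flush_bytes_lines(size_t line_length, size_t ystart, size_t yend)
{
	assert(yend >= ystart);
	return (yend - ystart) * line_length;
}

/* How many times smaller the partial flush is than a full-frame flush. */
static size_t flush_reduction(size_t line_length, size_t ysize,
			      size_t ystart, size_t yend)
{
	size_t full = flush_bytes_lines(line_length, 0, ysize);
	size_t part = flush_bytes_lines(line_length, ystart, yend);

	assert(part > 0);
	return full / part;
}
```

With the assumed geometry, line_length is 3840 * 4 = 15360 bytes, a
full flush covers 33177600 bytes and one 16-line text row covers
245760 bytes, so the partial flush touches 135 times less memory.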



* Re: [PATCH v5 10/13] video: Only dcache flush damaged lines
  2023-08-30 19:12             ` Alper Nebi Yasak
@ 2023-08-30 19:57               ` Alexander Graf
  0 siblings, 0 replies; 56+ messages in thread
From: Alexander Graf @ 2023-08-30 19:57 UTC (permalink / raw)
  To: Alper Nebi Yasak, Simon Glass
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Philipp Tomsich, Andrew Davis, Da Xue, Heinrich Schuchardt,
	Patrice Chotard, Patrick Delaunay, Derald Woods,
	Anatolij Gustschin, uboot-stm32, Matthias Brugger, u-boot-amlogic,
	Ilias Apalodimas, Neil Armstrong


On 30.08.23 21:12, Alper Nebi Yasak wrote:
> On 2023-08-22 02:03 +03:00, Simon Glass wrote:
>> Hi Alex,
>>
>> On Mon, 21 Aug 2023 at 16:44, Alexander Graf <agraf@csgraf.de> wrote:
>>>
>>> On 22.08.23 00:10, Simon Glass wrote:
>>>> Hi Alex,
>>>>
>>>> On Mon, 21 Aug 2023 at 13:59, Alexander Graf <agraf@csgraf.de> wrote:
>>>>> On 21.08.23 21:11, Simon Glass wrote:
>>>>>> Hi Alper,
>>>>>>
>>>>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>>>>>>> From: Alexander Graf <agraf@csgraf.de>
>>>>>>>
>>>>>>> Now that we have a damage area that tells us which parts of the frame buffer
>>>>>>> actually need updating, let's only dcache flush those on video_sync()
>>>>>>> calls. With this optimization in place, frame buffer updates - especially
>>>>>>> on large screens such as 4k displays - speed up significantly.
>>>>>>>
>>>>>>> Signed-off-by: Alexander Graf <agraf@csgraf.de>
>>>>>>> Reported-by: Da Xue <da@libre.computer>
>>>>>>> [Alper: Use damage.xstart/yend, IS_ENABLED()]
>>>>>>> Co-developed-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
>>>>>>> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
>>>>>>> ---
>>>>>>>
>>>>>>> Changes in v5:
>>>>>>> - Use xstart, ystart, xend, yend as names for damage region
>>>>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
>>>>>>>
>>>>>>> Changes in v2:
>>>>>>> - Fix dcache range; we were flushing too much before
>>>>>>> - Remove ifdefs
>>>>>>>
>>>>>>>     drivers/video/video-uclass.c | 41 +++++++++++++++++++++++++++++++-----
>>>>>>>     1 file changed, 36 insertions(+), 5 deletions(-)
>>>>>> This is a little strange, since flushing the whole cache will only
>>>>>> actually write out data that was actually written (to the display). Is
>>>>>> there a benefit to this patch, in terms of performance?
>>>>> I'm happy to see you go through the same thought process I went through
>>>>> when writing these: "This surely can't be the problem, can it?". The
>>>>> answer is "simple" in hindsight:
>>>>>
>>>>> Have a look at the ARMv8 cache flush function. It does the only "safe"
>>>>> thing you can expect it to do: Clean+Invalidate to POC because we use it
>>>>> for multiple things, clearing modified code among others:
>>>>>
>>>>> ENTRY(__asm_flush_dcache_range)
>>>>>            mrs     x3, ctr_el0
>>>>>            ubfx    x3, x3, #16, #4
>>>>>            mov     x2, #4
>>>>>            lsl     x2, x2, x3              /* cache line size */
>>>>>
>>>>>            /* x2 <- minimal cache line size in cache system */
>>>>>            sub     x3, x2, #1
>>>>>            bic     x0, x0, x3
>>>>> 1:      dc      civac, x0       /* clean & invalidate data or unified cache */
>>>>>            add     x0, x0, x2
>>>>>            cmp     x0, x1
>>>>>            b.lo    1b
>>>>>            dsb     sy
>>>>>            ret
>>>>> ENDPROC(__asm_flush_dcache_range)
>>>>>
>>>>>
>>>>> Looking at the "dc civac" call, we find this documentation page from
>>>>> ARM:
>>>>> https://developer.arm.com/documentation/ddi0601/2022-03/AArch64-Instructions/DC-CIVAC--Data-or-unified-Cache-line-Clean-and-Invalidate-by-VA-to-PoC
>>>>>
>>>>> This says we're writing any dirtiness of this cache line up to the POC
>>>>> and then invalidating (removing the cache line) also up to POC. That means
>>>>> when you look at a typical SBC, this will either be L2 or system level
>>>>> cache. Every read afterwards needs to go and pull it all the way back to
>>>>> L1 to modify it (or not) on the next character write and then flush it
>>>>> again.
>>>>>
>>>>> Even worse: Because of the invalidate, we may even evict it from caches
>>>>> that the display controller uses to read the frame buffer. So depending
>>>>> on the SoC's cache topology and implementation, we may force the display
>>>>> controller to refetch the full FB content on its next screen refresh cycle.
>>>>>
>>>>> I faintly remember that I tried to experiment with CVAC instead to only
>>>>> flush without invalidating. I don't fully remember the results anymore
>>>>> though. I believe CVAC just behaved identically to CIVAC on the A53
>>>>> platform I was working on. And then I looked at Cortex-A53 errata like
>>>>> [1] and just accepted that doing anything but restricting the flushing
>>>>> range is a waste of time :)
>>>> Yuck I didn't know it was invalidating too. That is horrible. Is there
>>>> no way to fix it?
>>>
>>> Before building all of this damage logic, I tried, but failed. I'd
>>> welcome anyone else to try again :). I'm not even convinced yet that it
>>> is actually fixable: Depending on the SoC's internal cache logic, it may
>>> opt to always invalidate I think.
>> Wow, that is crazy! How is anyone supposed to make the system run well
>> with logic like that??!
>>
>>> That said, this patch set really also makes sense outside of the
>>> particular invalidate problem. It creates a generic abstraction between
>>> the copy and non-copy code path and allows us to reduce the amount of
>>> work spent for both, generically for any video sync operation.
>> Sure...my question was really why it helps so much, given what I
>> understood the caches to be doing. If they are invalidating, then it
>> is amazing anything gets done...
> I don't really know cache mechanisms and terminology, but AFAIU there's
> nothing actionable for this patch regarding this discussion, right?
>
> Meanwhile I noticed this patch only flushes priv->fb, and think it also
> needs to flush priv->copy_fb if VIDEO_COPY.


The reason was mostly that copy_fb is really only used on x86 where we 
don't have the cache flush problem/code :). So nobody bothered to add 
flushing to that code path.


Alex
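
As a minimal sketch of what "only dcache flush damaged lines" can look
like: the range covering the damaged lines is widened to cache-line
boundaries and flushed once. This is not the code from the patch. The
64-byte CACHELINE, the struct damage layout, flush_damaged_lines() and
the recording stand-in fake_flush_dcache_range() are all assumptions
made so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64u	/* assumed cache line size for this sketch */

/* Hypothetical damage rectangle; xend/yend are exclusive. */
struct damage {
	int xstart, ystart, xend, yend;
};

/* Stand-in for a platform cache flush: only counts the bytes covered. */
static size_t flushed_bytes;

static void fake_flush_dcache_range(uintptr_t start, uintptr_t end)
{
	assert(start % CACHELINE == 0 && end % CACHELINE == 0);
	flushed_bytes += end - start;
}

/* Flush the whole lines ystart..yend-1, rounded out to cache lines. */
static void flush_damaged_lines(uintptr_t fb, size_t line_length,
				const struct damage *d)
{
	const uintptr_t mask = CACHELINE - 1;
	uintptr_t start, end;

	if (!d->xend || !d->yend)
		return;	/* nothing was damaged since the last sync */

	start = fb + (size_t)d->ystart * line_length;
	end = fb + (size_t)d->yend * line_length;
	start &= ~mask;			/* round down */
	end = (end + mask) & ~mask;	/* round up */
	fake_flush_dcache_range(start, end);
}
```

The sketch flushes full lines rather than the exact rectangle, which
keeps it to one contiguous range per sync at the cost of some extra
bytes on each line.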



* Re: [PATCH v5 04/13] dm: video: Add damage tracking API
  2023-08-30 19:15     ` Alper Nebi Yasak
@ 2023-08-31  2:49       ` Simon Glass
  0 siblings, 0 replies; 56+ messages in thread
From: Simon Glass @ 2023-08-31  2:49 UTC (permalink / raw)
  To: Alper Nebi Yasak
  Cc: u-boot, Kever Yang, Jagan Teki, Andre Przywara, Svyatoslav Ryhel,
	Alexander Graf, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

Hi Alper,

On Wed, 30 Aug 2023 at 13:15, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>
>
>
> On 2023-08-21 22:11 +03:00, Simon Glass wrote:
> > On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
> >>
> >> From: Alexander Graf <agraf@csgraf.de>
> >>
> >> We are going to introduce image damage tracking to speed up screen
> >> refresh on large displays. This patch adds damage tracking for up to
> >> one rectangle of the screen which is typically enough to hold blt or
> >> text print updates. Callers into this API and a reduced dcache flush
> >> code path will follow in later patches.
> >>
> >> Signed-off-by: Alexander Graf <agraf@csgraf.de>
> >> Reported-by: Da Xue <da@libre.computer>
> >> [Alper: Use xstart/yend, document new fields, return void from
> >>         video_damage(), declare priv, drop headers, use IS_ENABLED()]
> >> Co-developed-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> >> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> >> ---
> >>
> >> Changes in v5:
> >> - Use xstart, ystart, xend, yend as names for damage region
> >> - Document damage struct and fields in struct video_priv comment
> >> - Return void from video_damage()
> >> - Fix undeclared priv error in video_sync()
> >> - Drop unused headers from video-uclass.c
> >> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
> >>
> >> Changes in v4:
> >> - Move damage clear to patch "dm: video: Add damage tracking API"
> >> - Simplify first damage logic
> >> - Remove VIDEO_DAMAGE default for ARM
> >>
> >> Changes in v3:
> >> - Adapt to always assume DM is used
> >>
> >> Changes in v2:
> >> - Remove ifdefs
> >>
> >>  drivers/video/Kconfig        | 13 ++++++++++++
> >>  drivers/video/video-uclass.c | 41 +++++++++++++++++++++++++++++++++---
> >>  include/video.h              | 32 ++++++++++++++++++++++++++--
> >>  3 files changed, 81 insertions(+), 5 deletions(-)
> >>
> >
> > Reviewed-by: Simon Glass <sjg@chromium.org>
> >
> > But I suggest an empty static inline in the case where the feature is disabled?

>
> You mean with something like #ifdef CONFIG_VIDEO_DAMAGE, right?

Yes

Regards,
Simon
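
The suggestion above could look roughly like the following sketch. The
video_damage() parameter list (x, y, width, height) is assumed from the
call sites quoted elsewhere in this thread, and draw_glyph() is a
hypothetical caller added only to show that call sites need no #ifdef.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque here; the real definition lives in the driver model headers. */
struct udevice;

#ifdef CONFIG_VIDEO_DAMAGE
/* Feature enabled: the real implementation is provided elsewhere. */
void video_damage(struct udevice *vid, int x, int y, int width, int height);
#else
/* Feature disabled: an empty inline stub the compiler can drop. */
static inline void video_damage(struct udevice *vid, int x, int y,
				int width, int height)
{
	(void)vid;
	(void)x;
	(void)y;
	(void)width;
	(void)height;
}
#endif

/* Hypothetical caller: identical source whether or not damage is built. */
static int draw_glyph(struct udevice *vid, int x, int y)
{
	/* ... pixel writes would go here ... */
	video_damage(vid, x, y, 8, 16);
	return 0;
}
```

When CONFIG_VIDEO_DAMAGE is not defined, draw_glyph() still compiles
and the video_damage() call has no effect.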


* Re: [PATCH v5 11/13] video: Use VIDEO_DAMAGE for VIDEO_COPY
  2023-08-30 19:07       ` Alper Nebi Yasak
@ 2023-08-31  2:49         ` Simon Glass
  0 siblings, 0 replies; 56+ messages in thread
From: Simon Glass @ 2023-08-31  2:49 UTC (permalink / raw)
  To: Alper Nebi Yasak
  Cc: Alexander Graf, u-boot, Kever Yang, Jagan Teki, Andre Przywara,
	Svyatoslav Ryhel, Philipp Tomsich, Andrew Davis, Da Xue,
	Heinrich Schuchardt, Patrice Chotard, Patrick Delaunay,
	Derald Woods, Anatolij Gustschin, uboot-stm32, Matthias Brugger,
	u-boot-amlogic, Ilias Apalodimas, Neil Armstrong

Hi Alper,

On Wed, 30 Aug 2023 at 13:07, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
>
> On 2023-08-21 23:06 +03:00, Alexander Graf wrote:
> >
> > On 21.08.23 21:11, Simon Glass wrote:
> >> Hi Alper,
> >>
> >> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <alpernebiyasak@gmail.com> wrote:
> >>> From: Alexander Graf <agraf@csgraf.de>
> >>>
> >>> CONFIG_VIDEO_COPY implemented a range-based copying mechanism: If we
> >>> print a single character, it will always copy the full range of bytes
> >>> from the top left corner of the character to the lower right onto the
> >>> uncached frame buffer. This includes pretty much the full line contents
> >>> of the printed character.
> >>>
> >>> Since we now have proper damage tracking, let's make use of that to reduce
> >>> the amount of data we need to copy. With this patch applied, we will only
> >>> copy the tiny rectangle surrounding characters when we print them,
> >>> speeding up the video console.
> >> I suppose for rotated consoles it copies whole lines, but otherwise it
> >> does a lot of small copies?
> >
> >
> > I tried to keep the code as simple as possible and only track an "upper
> > left" and "lower right" corner of modifications. So sync will always
> > copy/flush a single rectangle.
>
> Yep, see patch 06/13 for size of the regions. E.g. for putc_xy()s it's
> fontdata->height * fontdata->width, for rows it's like fontdata->height
> * vid_priv->xsize * count...
>
> >>
> >>> After this, changes to the main frame buffer are not immediately copied
> >>> to the copy frame buffer, but postponed until the next video device
> >>> sync. So issue an explicit sync before inspecting the copy frame buffer
> >>> contents for the video tests.
> >> So how does the sync get done in this case?
> >
> > It gets called as part of video_sync():
> >
> > +static void video_flush_copy(struct udevice *vid)
> > +{
> > +     struct video_priv *priv = dev_get_uclass_priv(vid);
> > +
> > +     if (!priv->copy_fb)
> > +             return;
> > +
> > +     if (priv->damage.xend && priv->damage.yend) {
> > +             int lstart = priv->damage.xstart * VNBYTES(priv->bpix);
> > +             int lend = priv->damage.xend * VNBYTES(priv->bpix);
> > +             int y;
> > +
> > +             for (y = priv->damage.ystart; y < priv->damage.yend; y++) {
> > +                     ulong offset = (y * priv->line_length) + lstart;
> > +                     ulong len = lend - lstart;
> > +
> > +                     memcpy(priv->copy_fb + offset, priv->fb + offset, len);
> > +             }
> > +     }
> > +}
>
> I think Simon was asking how and when video_sync() is called outside the
> tests. The tests use lower-level functions, the ops->putc_xy() of each
> console, whereas normally the vidconsole functions higher up the call
> chain may also do a video_sync() when they think it's worth updating the
> display.
>
> >>
> >>> Signed-off-by: Alexander Graf <agraf@csgraf.de>
> >>> [Alper: Rebase for fontdata->height/w, fill_part(), fix memmove(dev),
> >>>          drop from defconfig, use damage.xstart/yend, use IS_ENABLED(),
> >>>          call video_sync() before copy_fb check, update video_copy test]
> >>> Co-developed-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> >>> Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
> >>> ---
> >>>
> >>> Changes in v5:
> >>> - Remove video_sync_copy() also from video_fill(), video_fill_part()
> >>> - Fix memmove() calls by removing the extra dev argument
> >>> - Call video_sync() before checking copy_fb in video tests
> >>> - Use xstart, ystart, xend, yend as names for damage region
> >>> - Use met->baseline instead of priv->baseline
> >>> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH
> >>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED()
> >>> - Drop VIDEO_DAMAGE from sandbox defconfig added in a new patch
> >>> - Update dm_test_video_copy test added in a new patch
> >>>
> >>> Changes in v3:
> >>> - Make VIDEO_COPY always select VIDEO_DAMAGE
> >>>
> >>> Changes in v2:
> >>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY"
> >>>
> >>>   configs/sandbox_defconfig         |  1 -
> >>>   drivers/video/Kconfig             |  5 ++
> >>>   drivers/video/console_normal.c    | 13 +----
> >>>   drivers/video/console_rotate.c    | 44 +++-----------
> >>>   drivers/video/console_truetype.c  | 16 +----
> >>>   drivers/video/vidconsole-uclass.c | 16 -----
> >>>   drivers/video/video-uclass.c      | 97 ++++++++-----------------------
> >>>   drivers/video/video_bmp.c         |  7 ---
> >>>   include/video.h                   | 37 ------------
> >>>   include/video_console.h           | 52 -----------------
> >>>   test/dm/video.c                   |  3 +-
> >>>   11 files changed, 43 insertions(+), 248 deletions(-)
> >>>
> >>> diff --git a/configs/sandbox_defconfig b/configs/sandbox_defconfig
> >>> index 51b820f13121..259f31f26cee 100644
> >>> --- a/configs/sandbox_defconfig
> >>> +++ b/configs/sandbox_defconfig
> >>> @@ -307,7 +307,6 @@ CONFIG_USB_ETH_CDC=y
> >>>   CONFIG_VIDEO=y
> >>>   CONFIG_VIDEO_FONT_SUN12X22=y
> >>>   CONFIG_VIDEO_COPY=y
> >>> -CONFIG_VIDEO_DAMAGE=y
> >>>   CONFIG_CONSOLE_ROTATION=y
> >>>   CONFIG_CONSOLE_TRUETYPE=y
> >>>   CONFIG_CONSOLE_TRUETYPE_CANTORAONE=y
> >>> diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
> >>> index 97f494a1340b..b3fbd9d7d9ca 100644
> >>> --- a/drivers/video/Kconfig
> >>> +++ b/drivers/video/Kconfig
> >>> @@ -83,11 +83,14 @@ config VIDEO_PCI_DEFAULT_FB_SIZE
> >>>
> >>>   config VIDEO_COPY
> >>>          bool "Enable copying the frame buffer to a hardware copy"
> >>> +       select VIDEO_DAMAGE
> >>>          help
> >>>            On some machines (e.g. x86), reading from the frame buffer is very
> >>>            slow because it is uncached. To improve performance, this feature
> >>>            allows the frame buffer to be kept in cached memory (allocated by
> >>>            U-Boot) and then copied to the hardware frame-buffer as needed.
> >>> +         It uses the VIDEO_DAMAGE feature to keep track of regions to copy
> >>> +         and will only copy actually touched regions.
> >>>
> >>>            To use this, your video driver must set @copy_base in
> >>>            struct video_uc_plat.
> >>> @@ -105,6 +108,8 @@ config VIDEO_DAMAGE
> >>>            regions of the frame buffer that were modified before, speeding up
> >>>            screen refreshes significantly.
> >>>
> >>> +         It is also used by VIDEO_COPY to identify which regions changed.
> >>> +
> >>>   config BACKLIGHT_PWM
> >>>          bool "Generic PWM based Backlight Driver"
> >>>          depends on BACKLIGHT && DM_PWM
> >>> diff --git a/drivers/video/console_normal.c b/drivers/video/console_normal.c
> >>> index a19ce6a2bc11..c44aa09473a3 100644
> >>> --- a/drivers/video/console_normal.c
> >>> +++ b/drivers/video/console_normal.c
> >>> @@ -35,10 +35,6 @@ static int console_set_row(struct udevice *dev, uint row, int clr)
> >>>                  fill_pixel_and_goto_next(&dst, clr, pbytes, pbytes);
> >>>          end = dst;
> >>>
> >>> -       ret = vidconsole_sync_copy(dev, line, end);
> >>> -       if (ret)
> >>> -               return ret;
> >>> -
> >>>          video_damage(dev->parent,
> >>>                       0,
> >>>                       fontdata->height * row,
> >>> @@ -57,14 +53,11 @@ static int console_move_rows(struct udevice *dev, uint rowdst,
> >>>          void *dst;
> >>>          void *src;
> >>>          int size;
> >>> -       int ret;
> >>>
> >>>          dst = vid_priv->fb + rowdst * fontdata->height * vid_priv->line_length;
> >>>          src = vid_priv->fb + rowsrc * fontdata->height * vid_priv->line_length;
> >>>          size = fontdata->height * vid_priv->line_length * count;
> >>> -       ret = vidconsole_memmove(dev, dst, src, size);
> >>> -       if (ret)
> >>> -               return ret;
> >>> +       memmove(dst, src, size);
> >> Why are you making that change?
> >
> > There is no point in keeping a special vidconsole_memmove() around
> > anymore, since we don't actually need to call vidconsole_sync_copy()
> > after the move. The damage call that we introduced to all call sites in
> > combination with a video_sync() call takes over the job of the sync copy.
>
> More specifically, this batches the copying work video_sync_copy() does
> per console-op into video_flush_copy() called once per video_sync().
> Then, since vidconsole_memmove() is only used to memmove() and invoke
> that copy mechanism, we can also reduce it to just memmove().

OK, thank you.

Regards,
Simon
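
The batching described above can be sketched as two steps: each console
operation only widens one tracked rectangle, and a later sync copies
that rectangle row by row and clears it. The names damage_add() and
flush_copy() and the "all zero means empty" convention are assumptions
for this sketch; only the row-wise memcpy() loop mirrors the quoted
video_flush_copy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical damage rectangle; xend/yend are exclusive. */
struct damage {
	int xstart, ystart, xend, yend;
};

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Widen the tracked rectangle so it also covers the new region. */
static void damage_add(struct damage *d, int x, int y, int width, int height)
{
	if (width <= 0 || height <= 0)
		return;

	if (!d->xend || !d->yend) {	/* first damage since the last sync */
		d->xstart = x;
		d->ystart = y;
		d->xend = x + width;
		d->yend = y + height;
		return;
	}

	d->xstart = min_int(d->xstart, x);
	d->ystart = min_int(d->ystart, y);
	d->xend = max_int(d->xend, x + width);
	d->yend = max_int(d->yend, y + height);
}

/* Copy only the damaged rectangle to the copy buffer, then clear it. */
static void flush_copy(unsigned char *copy_fb, const unsigned char *fb,
		       size_t line_length, size_t bytes_per_pixel,
		       struct damage *d)
{
	if (d->xend && d->yend) {
		size_t lstart = (size_t)d->xstart * bytes_per_pixel;
		size_t len = (size_t)(d->xend - d->xstart) * bytes_per_pixel;
		int y;

		for (y = d->ystart; y < d->yend; y++) {
			size_t offset = (size_t)y * line_length + lstart;

			memcpy(copy_fb + offset, fb + offset, len);
		}
	}
	memset(d, 0, sizeof(*d));
}
```

Two small updates at opposite corners of the screen would grow the
single rectangle to span both, so this trades some extra copying for
very simple bookkeeping.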


Thread overview: 56+ messages
2023-08-21 13:50 [PATCH v5 00/13] Add video damage tracking Alper Nebi Yasak
2023-08-21 13:50 ` [PATCH v5 01/13] video: test: Split copy frame buffer check into a function Alper Nebi Yasak
2023-08-21 19:11   ` Simon Glass
2023-08-21 13:50 ` [PATCH v5 02/13] video: test: Support checking copy frame buffer contents Alper Nebi Yasak
2023-08-21 19:11   ` Simon Glass
2023-08-21 13:51 ` [PATCH v5 03/13] video: test: Test partial updates of hardware frame buffer Alper Nebi Yasak
2023-08-21 19:11   ` Simon Glass
2023-08-21 13:51 ` [PATCH v5 04/13] dm: video: Add damage tracking API Alper Nebi Yasak
2023-08-21 19:11   ` Simon Glass
2023-08-30 19:15     ` Alper Nebi Yasak
2023-08-31  2:49       ` Simon Glass
2023-08-21 13:51 ` [PATCH v5 05/13] dm: video: Add damage notification on display fills Alper Nebi Yasak
2023-08-21 19:11   ` Simon Glass
2023-08-21 13:51 ` [PATCH v5 06/13] vidconsole: Add damage notifications to all vidconsole drivers Alper Nebi Yasak
2023-08-21 19:11   ` Simon Glass
2023-08-21 13:51 ` [PATCH v5 07/13] video: test: Test video damage tracking via vidconsole Alper Nebi Yasak
2023-08-21 19:11   ` Simon Glass
2023-08-21 13:51 ` [PATCH v5 08/13] video: Add damage notification on bmp display Alper Nebi Yasak
2023-08-21 13:51 ` [PATCH v5 09/13] efi_loader: GOP: Add damage notification on BLT Alper Nebi Yasak
2023-08-21 19:11   ` Simon Glass
2023-08-21 13:51 ` [PATCH v5 10/13] video: Only dcache flush damaged lines Alper Nebi Yasak
2023-08-21 19:11   ` Simon Glass
2023-08-21 19:59     ` Alexander Graf
2023-08-21 22:10       ` Simon Glass
2023-08-21 22:44         ` Alexander Graf
2023-08-21 23:03           ` Simon Glass
2023-08-30 19:12             ` Alper Nebi Yasak
2023-08-30 19:57               ` Alexander Graf
2023-08-21 13:51 ` [PATCH v5 11/13] video: Use VIDEO_DAMAGE for VIDEO_COPY Alper Nebi Yasak
2023-08-21 19:11   ` Simon Glass
2023-08-21 20:06     ` Alexander Graf
2023-08-30 19:07       ` Alper Nebi Yasak
2023-08-31  2:49         ` Simon Glass
2023-08-21 13:51 ` [PATCH v5 12/13] video: Always compile cache flushing code Alper Nebi Yasak
2023-08-21 13:51 ` [PATCH v5 13/13] video: Enable VIDEO_DAMAGE for drivers that need it Alper Nebi Yasak
2023-08-21 19:11   ` Simon Glass
2023-08-21 19:11 ` [PATCH v5 00/13] Add video damage tracking Simon Glass
2023-08-21 19:33   ` Alexander Graf
2023-08-21 19:57     ` Simon Glass
2023-08-21 20:20       ` Alexander Graf
2023-08-21 22:10         ` Simon Glass
2023-08-21 22:40           ` Alexander Graf
2023-08-21 23:03             ` Simon Glass
2023-08-22  7:47               ` Alexander Graf
2023-08-22 18:56                 ` Simon Glass
2023-08-23  8:56                   ` Alexander Graf
2023-08-28 17:54                     ` Simon Glass
2023-08-28 20:24                       ` Alexander Graf
2023-08-28 21:54                         ` Heinrich Schuchardt
2023-08-29  6:20                           ` Alexander Graf
2023-08-29  9:19                             ` Mark Kettenis
2023-08-30 19:55                               ` Alexander Graf
2023-08-28 22:08                         ` Simon Glass
2023-08-29  6:27                           ` Alexander Graf
2023-08-30 18:27                             ` Alper Nebi Yasak
2023-08-30 19:52                               ` Alexander Graf
