All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] drm/format-helper: Add helpers for line conversion
@ 2025-03-25 10:31 Thomas Zimmermann
  2025-03-25 10:31 ` [PATCH 1/8] drm/format-helper: Move helpers for pixel conversion to header file Thomas Zimmermann
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Thomas Zimmermann @ 2025-03-25 10:31 UTC (permalink / raw)
  To: jfalempe, simona, airlied, mripard, maarten.lankhorst
  Cc: dri-devel, Thomas Zimmermann

Add helpers for converting the pixel format of scanline. This used
to be implemented for each format individually, but only the per-pixel
code is really different among formats.

Patch 1 moves the per-pixel format helpers from drm_draw.c to an
internal header. These functions have equivalents in the format-helper
code. Just update the interface to make them exportable and add some
comments.

Patches 2 to 5 add generic line-conversion helpers. Each takes a 
scanline with pixels with a specific number of bits per pixel and
converts it to a scanline with a specific number of bits per pixel.
Conversion of each individual pixel is done by the provided per-pixel
helper that was extracted from drm_draw.c.

Patches 6 to 8 slightly optimize the line-conversion helpers by
storing multiple pixels at once. With the current code, there are
too many functions to make opimization feasible. But with the new
helpers, there are only a few places to optimize to benefit all use
cases.

Here's a little benchmark for the common use case of watching a
full-screen video under X. The tests measures the average time
to display a single frame. It uses an XRGB8888 framebuffer for each
frame, but displays it to an RGB565 scanout buffer, thus doing format
conversion with the _32to16 helper. The test system is an Intel Core
i5-3470 with an Intel HD2500 graphics card. The graphics driver is
simpledrm at 1024x768 pixels.

- 32-bit memcpy:	 510 µs/frame
- Current code:		1640 µs/frame
- New (unoptimized):	1600 µs/frame
- New (no 64-bit):	1580 µs/frame
- New (optimized):	1470 µs/frame

The first line shows the time to memcpy a single frame to video
memory without format conversion. With conversion, the old and new
code takes roughly 3 times as long. Both variants are comparable
in performance. With 64-bit stores added, the new code is even 10%
faster then the current one. (This is not to be considered a full
performance test. The main objective is to see whether the new code
is on par with the old code, which is the case.)

Thomas Zimmermann (8):
  drm/format-helper: Move helpers for pixel conversion to header file
  drm/format-helper: Add generic conversion to 32-bit formats
  drm/format-helper: Add generic conversion to 24-bit formats
  drm/format-helper: Add generic conversion to 16-bit formats
  drm/format-helper: Add generic conversion to 8-bit formats
  drm/format-helper: Optimize 32-to-24-bpp conversion
  drm/format-helper: Optimize 32-to-16-bpp conversion
  drm/format-helper: Optimize 32-to-8-bpp conversion

 drivers/gpu/drm/drm_draw.c            | 100 +------
 drivers/gpu/drm/drm_format_helper.c   | 378 ++++++++++++--------------
 drivers/gpu/drm/drm_format_internal.h | 160 +++++++++++
 3 files changed, 339 insertions(+), 299 deletions(-)
 create mode 100644 drivers/gpu/drm/drm_format_internal.h

-- 
2.48.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/8] drm/format-helper: Move helpers for pixel conversion to header file
  2025-03-25 10:31 [PATCH 0/8] drm/format-helper: Add helpers for line conversion Thomas Zimmermann
@ 2025-03-25 10:31 ` Thomas Zimmermann
  2025-03-26  8:51   ` Jocelyn Falempe
  2025-03-25 10:31 ` [PATCH 2/8] drm/format-helper: Add generic conversion to 32-bit formats Thomas Zimmermann
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Thomas Zimmermann @ 2025-03-25 10:31 UTC (permalink / raw)
  To: jfalempe, simona, airlied, mripard, maarten.lankhorst
  Cc: dri-devel, Thomas Zimmermann

The DRM draw helpers contain format-conversion helpers that operate
on individual pixels. Move them into an internal header file and adopt
them as individual API. Update the draw code accordingly. The pixel
helpers will also be useful for other format conversion helpers.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
---
 drivers/gpu/drm/drm_draw.c            | 100 +++-------------------
 drivers/gpu/drm/drm_format_internal.h | 119 ++++++++++++++++++++++++++
 2 files changed, 130 insertions(+), 89 deletions(-)
 create mode 100644 drivers/gpu/drm/drm_format_internal.h

diff --git a/drivers/gpu/drm/drm_draw.c b/drivers/gpu/drm/drm_draw.c
index cb2ad12bce57..d41f8ae1c148 100644
--- a/drivers/gpu/drm/drm_draw.c
+++ b/drivers/gpu/drm/drm_draw.c
@@ -11,85 +11,7 @@
 #include <drm/drm_fourcc.h>
 
 #include "drm_draw_internal.h"
-
-/*
- * Conversions from xrgb8888
- */
-
-static u16 convert_xrgb8888_to_rgb565(u32 pix)
-{
-	return ((pix & 0x00F80000) >> 8) |
-	       ((pix & 0x0000FC00) >> 5) |
-	       ((pix & 0x000000F8) >> 3);
-}
-
-static u16 convert_xrgb8888_to_rgba5551(u32 pix)
-{
-	return ((pix & 0x00f80000) >> 8) |
-	       ((pix & 0x0000f800) >> 5) |
-	       ((pix & 0x000000f8) >> 2) |
-	       BIT(0); /* set alpha bit */
-}
-
-static u16 convert_xrgb8888_to_xrgb1555(u32 pix)
-{
-	return ((pix & 0x00f80000) >> 9) |
-	       ((pix & 0x0000f800) >> 6) |
-	       ((pix & 0x000000f8) >> 3);
-}
-
-static u16 convert_xrgb8888_to_argb1555(u32 pix)
-{
-	return BIT(15) | /* set alpha bit */
-	       ((pix & 0x00f80000) >> 9) |
-	       ((pix & 0x0000f800) >> 6) |
-	       ((pix & 0x000000f8) >> 3);
-}
-
-static u32 convert_xrgb8888_to_argb8888(u32 pix)
-{
-	return pix | GENMASK(31, 24); /* fill alpha bits */
-}
-
-static u32 convert_xrgb8888_to_xbgr8888(u32 pix)
-{
-	return ((pix & 0x00ff0000) >> 16) <<  0 |
-	       ((pix & 0x0000ff00) >>  8) <<  8 |
-	       ((pix & 0x000000ff) >>  0) << 16 |
-	       ((pix & 0xff000000) >> 24) << 24;
-}
-
-static u32 convert_xrgb8888_to_abgr8888(u32 pix)
-{
-	return ((pix & 0x00ff0000) >> 16) <<  0 |
-	       ((pix & 0x0000ff00) >>  8) <<  8 |
-	       ((pix & 0x000000ff) >>  0) << 16 |
-	       GENMASK(31, 24); /* fill alpha bits */
-}
-
-static u32 convert_xrgb8888_to_xrgb2101010(u32 pix)
-{
-	pix = ((pix & 0x000000FF) << 2) |
-	      ((pix & 0x0000FF00) << 4) |
-	      ((pix & 0x00FF0000) << 6);
-	return pix | ((pix >> 8) & 0x00300C03);
-}
-
-static u32 convert_xrgb8888_to_argb2101010(u32 pix)
-{
-	pix = ((pix & 0x000000FF) << 2) |
-	      ((pix & 0x0000FF00) << 4) |
-	      ((pix & 0x00FF0000) << 6);
-	return GENMASK(31, 30) /* set alpha bits */ | pix | ((pix >> 8) & 0x00300C03);
-}
-
-static u32 convert_xrgb8888_to_abgr2101010(u32 pix)
-{
-	pix = ((pix & 0x00FF0000) >> 14) |
-	      ((pix & 0x0000FF00) << 4) |
-	      ((pix & 0x000000FF) << 22);
-	return GENMASK(31, 30) /* set alpha bits */ | pix | ((pix >> 8) & 0x00300C03);
-}
+#include "drm_format_internal.h"
 
 /**
  * drm_draw_color_from_xrgb8888 - convert one pixel from xrgb8888 to the desired format
@@ -104,28 +26,28 @@ u32 drm_draw_color_from_xrgb8888(u32 color, u32 format)
 {
 	switch (format) {
 	case DRM_FORMAT_RGB565:
-		return convert_xrgb8888_to_rgb565(color);
+		return drm_pixel_xrgb8888_to_rgb565(color);
 	case DRM_FORMAT_RGBA5551:
-		return convert_xrgb8888_to_rgba5551(color);
+		return drm_pixel_xrgb8888_to_rgba5551(color);
 	case DRM_FORMAT_XRGB1555:
-		return convert_xrgb8888_to_xrgb1555(color);
+		return drm_pixel_xrgb8888_to_xrgb1555(color);
 	case DRM_FORMAT_ARGB1555:
-		return convert_xrgb8888_to_argb1555(color);
+		return drm_pixel_xrgb8888_to_argb1555(color);
 	case DRM_FORMAT_RGB888:
 	case DRM_FORMAT_XRGB8888:
 		return color;
 	case DRM_FORMAT_ARGB8888:
-		return convert_xrgb8888_to_argb8888(color);
+		return drm_pixel_xrgb8888_to_argb8888(color);
 	case DRM_FORMAT_XBGR8888:
-		return convert_xrgb8888_to_xbgr8888(color);
+		return drm_pixel_xrgb8888_to_xbgr8888(color);
 	case DRM_FORMAT_ABGR8888:
-		return convert_xrgb8888_to_abgr8888(color);
+		return drm_pixel_xrgb8888_to_abgr8888(color);
 	case DRM_FORMAT_XRGB2101010:
-		return convert_xrgb8888_to_xrgb2101010(color);
+		return drm_pixel_xrgb8888_to_xrgb2101010(color);
 	case DRM_FORMAT_ARGB2101010:
-		return convert_xrgb8888_to_argb2101010(color);
+		return drm_pixel_xrgb8888_to_argb2101010(color);
 	case DRM_FORMAT_ABGR2101010:
-		return convert_xrgb8888_to_abgr2101010(color);
+		return drm_pixel_xrgb8888_to_abgr2101010(color);
 	default:
 		WARN_ONCE(1, "Can't convert to %p4cc\n", &format);
 		return 0;
diff --git a/drivers/gpu/drm/drm_format_internal.h b/drivers/gpu/drm/drm_format_internal.h
new file mode 100644
index 000000000000..5f82f0b9c8e8
--- /dev/null
+++ b/drivers/gpu/drm/drm_format_internal.h
@@ -0,0 +1,119 @@
+/* SPDX-License-Identifier: GPL-2.0 or MIT */
+
+#ifndef DRM_FORMAT_INTERNAL_H
+#define DRM_FORMAT_INTERNAL_H
+
+#include <linux/bits.h>
+#include <linux/types.h>
+
+/*
+ * Each pixel-format conversion helper takes a raw pixel in a
+ * specific input format and returns a raw pixel in a specific
+ * output format. All pixels are in little-endian byte order.
+ *
+ * Function names are
+ *
+ *   drm_pixel_<input>_to_<output>_<algorithm>()
+ *
+ * where <input> and <output> refer to pixel formats. The
+ * <algorithm> is optional and hints to the method used for the
+ * conversion. Helpers with no algorithm given apply pixel-bit
+ * shifting.
+ *
+ * The argument type is u32. We expect this to be wide enough to
+ * hold all conversion input from 32-bit RGB to any output format.
+ * The Linux kernel should avoid format conversion for anything
+ * but XRGB8888 input data. Converting from other format can still
+ * be acceptable in some cases.
+ *
+ * The return type is u32. It is wide enough to hold all conversion
+ * output from XRGB8888. For output formats wider than 32 bit, a
+ * return type of u64 would be acceptable.
+ */
+
+/*
+ * Conversions from XRGB8888
+ */
+
+static inline u32 drm_pixel_xrgb8888_to_rgb565(u32 pix)
+{
+	return ((pix & 0x00f80000) >> 8) |
+	       ((pix & 0x0000fc00) >> 5) |
+	       ((pix & 0x000000f8) >> 3);
+}
+
+static inline u32 drm_pixel_xrgb8888_to_rgbx5551(u32 pix)
+{
+	return ((pix & 0x00f80000) >> 8) |
+	       ((pix & 0x0000f800) >> 5) |
+	       ((pix & 0x000000f8) >> 2);
+}
+
+static inline u32 drm_pixel_xrgb8888_to_rgba5551(u32 pix)
+{
+	return drm_pixel_xrgb8888_to_rgbx5551(pix) |
+	       BIT(0); /* set alpha bit */
+}
+
+static inline u32 drm_pixel_xrgb8888_to_xrgb1555(u32 pix)
+{
+	return ((pix & 0x00f80000) >> 9) |
+	       ((pix & 0x0000f800) >> 6) |
+	       ((pix & 0x000000f8) >> 3);
+}
+
+static inline u32 drm_pixel_xrgb8888_to_argb1555(u32 pix)
+{
+	return BIT(15) | /* set alpha bit */
+	       drm_pixel_xrgb8888_to_xrgb1555(pix);
+}
+
+static inline u32 drm_pixel_xrgb8888_to_argb8888(u32 pix)
+{
+	return GENMASK(31, 24) | /* fill alpha bits */
+	       pix;
+}
+
+static inline u32 drm_pixel_xrgb8888_to_xbgr8888(u32 pix)
+{
+	return ((pix & 0xff000000)) | /* also copy filler bits */
+	       ((pix & 0x00ff0000) >> 16) |
+	       ((pix & 0x0000ff00)) |
+	       ((pix & 0x000000ff) << 16);
+}
+
+static inline u32 drm_pixel_xrgb8888_to_abgr8888(u32 pix)
+{
+	return GENMASK(31, 24) | /* fill alpha bits */
+	       drm_pixel_xrgb8888_to_xbgr8888(pix);
+}
+
+static inline u32 drm_pixel_xrgb8888_to_xrgb2101010(u32 pix)
+{
+	pix = ((pix & 0x000000ff) << 2) |
+	      ((pix & 0x0000ff00) << 4) |
+	      ((pix & 0x00ff0000) << 6);
+	return pix | ((pix >> 8) & 0x00300c03);
+}
+
+static inline u32 drm_pixel_xrgb8888_to_argb2101010(u32 pix)
+{
+	return GENMASK(31, 30) | /* set alpha bits */
+	       drm_pixel_xrgb8888_to_xrgb2101010(pix);
+}
+
+static inline u32 drm_pixel_xrgb8888_to_xbgr2101010(u32 pix)
+{
+	pix = ((pix & 0x00ff0000) >> 14) |
+	      ((pix & 0x0000ff00) << 4) |
+	      ((pix & 0x000000ff) << 22);
+	return pix | ((pix >> 8) & 0x00300c03);
+}
+
+static inline u32 drm_pixel_xrgb8888_to_abgr2101010(u32 pix)
+{
+	return GENMASK(31, 30) | /* set alpha bits */
+	       drm_pixel_xrgb8888_to_xbgr2101010(pix);
+}
+
+#endif
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/8] drm/format-helper: Add generic conversion to 32-bit formats
  2025-03-25 10:31 [PATCH 0/8] drm/format-helper: Add helpers for line conversion Thomas Zimmermann
  2025-03-25 10:31 ` [PATCH 1/8] drm/format-helper: Move helpers for pixel conversion to header file Thomas Zimmermann
@ 2025-03-25 10:31 ` Thomas Zimmermann
  2025-03-26  8:52   ` Jocelyn Falempe
  2025-03-25 10:31 ` [PATCH 3/8] drm/format-helper: Add generic conversion to 24-bit formats Thomas Zimmermann
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Thomas Zimmermann @ 2025-03-25 10:31 UTC (permalink / raw)
  To: jfalempe, simona, airlied, mripard, maarten.lankhorst
  Cc: dri-devel, Thomas Zimmermann

Add drm_fb_xfrm_line_32to32() to implement conversion from 32-bit
pixels to 32-bit pixels. The pixel-conversion is specified by the
given callback parameter. Mark the helper as always_inline to avoid
overhead from function calls.

Then implement all existing line-conversion functions with the new
generic call and the respective pixel-conversion helper.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
---
 drivers/gpu/drm/drm_format_helper.c | 84 +++++++----------------------
 1 file changed, 19 insertions(+), 65 deletions(-)

diff --git a/drivers/gpu/drm/drm_format_helper.c b/drivers/gpu/drm/drm_format_helper.c
index 01d3ab307ac3..abd18c23cfbb 100644
--- a/drivers/gpu/drm/drm_format_helper.c
+++ b/drivers/gpu/drm/drm_format_helper.c
@@ -20,6 +20,8 @@
 #include <drm/drm_print.h>
 #include <drm/drm_rect.h>
 
+#include "drm_format_internal.h"
+
 /**
  * drm_format_conv_state_init - Initialize format-conversion state
  * @state: The state to initialize
@@ -244,6 +246,18 @@ static int drm_fb_xfrm(struct iosys_map *dst,
 				     xfrm_line);
 }
 
+static __always_inline void drm_fb_xfrm_line_32to32(void *dbuf, const void *sbuf,
+						    unsigned int pixels,
+						    u32 (*xfrm_pixel)(u32))
+{
+	__le32 *dbuf32 = dbuf;
+	const __le32 *sbuf32 = sbuf;
+	const __le32 *send32 = sbuf32 + pixels;
+
+	while (sbuf32 < send32)
+		*dbuf32++ = cpu_to_le32(xfrm_pixel(le32_to_cpup(sbuf32++)));
+}
+
 /**
  * drm_fb_memcpy - Copy clip buffer
  * @dst: Array of destination buffers
@@ -755,16 +769,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_bgr888);
 
 static void drm_fb_xrgb8888_to_argb8888_line(void *dbuf, const void *sbuf, unsigned int pixels)
 {
-	__le32 *dbuf32 = dbuf;
-	const __le32 *sbuf32 = sbuf;
-	unsigned int x;
-	u32 pix;
-
-	for (x = 0; x < pixels; x++) {
-		pix = le32_to_cpu(sbuf32[x]);
-		pix |= GENMASK(31, 24); /* fill alpha bits */
-		dbuf32[x] = cpu_to_le32(pix);
-	}
+	drm_fb_xfrm_line_32to32(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_argb8888);
 }
 
 /**
@@ -804,19 +809,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_argb8888);
 
 static void drm_fb_xrgb8888_to_abgr8888_line(void *dbuf, const void *sbuf, unsigned int pixels)
 {
-	__le32 *dbuf32 = dbuf;
-	const __le32 *sbuf32 = sbuf;
-	unsigned int x;
-	u32 pix;
-
-	for (x = 0; x < pixels; x++) {
-		pix = le32_to_cpu(sbuf32[x]);
-		pix = ((pix & 0x00ff0000) >> 16) <<  0 |
-		      ((pix & 0x0000ff00) >>  8) <<  8 |
-		      ((pix & 0x000000ff) >>  0) << 16 |
-		      GENMASK(31, 24); /* fill alpha bits */
-		*dbuf32++ = cpu_to_le32(pix);
-	}
+	drm_fb_xfrm_line_32to32(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_abgr8888);
 }
 
 static void drm_fb_xrgb8888_to_abgr8888(struct iosys_map *dst, const unsigned int *dst_pitch,
@@ -835,19 +828,7 @@ static void drm_fb_xrgb8888_to_abgr8888(struct iosys_map *dst, const unsigned in
 
 static void drm_fb_xrgb8888_to_xbgr8888_line(void *dbuf, const void *sbuf, unsigned int pixels)
 {
-	__le32 *dbuf32 = dbuf;
-	const __le32 *sbuf32 = sbuf;
-	unsigned int x;
-	u32 pix;
-
-	for (x = 0; x < pixels; x++) {
-		pix = le32_to_cpu(sbuf32[x]);
-		pix = ((pix & 0x00ff0000) >> 16) <<  0 |
-		      ((pix & 0x0000ff00) >>  8) <<  8 |
-		      ((pix & 0x000000ff) >>  0) << 16 |
-		      ((pix & 0xff000000) >> 24) << 24;
-		*dbuf32++ = cpu_to_le32(pix);
-	}
+	drm_fb_xfrm_line_32to32(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_xbgr8888);
 }
 
 static void drm_fb_xrgb8888_to_xbgr8888(struct iosys_map *dst, const unsigned int *dst_pitch,
@@ -866,20 +847,7 @@ static void drm_fb_xrgb8888_to_xbgr8888(struct iosys_map *dst, const unsigned in
 
 static void drm_fb_xrgb8888_to_xrgb2101010_line(void *dbuf, const void *sbuf, unsigned int pixels)
 {
-	__le32 *dbuf32 = dbuf;
-	const __le32 *sbuf32 = sbuf;
-	unsigned int x;
-	u32 val32;
-	u32 pix;
-
-	for (x = 0; x < pixels; x++) {
-		pix = le32_to_cpu(sbuf32[x]);
-		val32 = ((pix & 0x000000FF) << 2) |
-			((pix & 0x0000FF00) << 4) |
-			((pix & 0x00FF0000) << 6);
-		pix = val32 | ((val32 >> 8) & 0x00300C03);
-		*dbuf32++ = cpu_to_le32(pix);
-	}
+	drm_fb_xfrm_line_32to32(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_xrgb2101010);
 }
 
 /**
@@ -920,21 +888,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_xrgb2101010);
 
 static void drm_fb_xrgb8888_to_argb2101010_line(void *dbuf, const void *sbuf, unsigned int pixels)
 {
-	__le32 *dbuf32 = dbuf;
-	const __le32 *sbuf32 = sbuf;
-	unsigned int x;
-	u32 val32;
-	u32 pix;
-
-	for (x = 0; x < pixels; x++) {
-		pix = le32_to_cpu(sbuf32[x]);
-		val32 = ((pix & 0x000000ff) << 2) |
-			((pix & 0x0000ff00) << 4) |
-			((pix & 0x00ff0000) << 6);
-		pix = GENMASK(31, 30) | /* set alpha bits */
-		      val32 | ((val32 >> 8) & 0x00300c03);
-		*dbuf32++ = cpu_to_le32(pix);
-	}
+	drm_fb_xfrm_line_32to32(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_argb2101010);
 }
 
 /**
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/8] drm/format-helper: Add generic conversion to 24-bit formats
  2025-03-25 10:31 [PATCH 0/8] drm/format-helper: Add helpers for line conversion Thomas Zimmermann
  2025-03-25 10:31 ` [PATCH 1/8] drm/format-helper: Move helpers for pixel conversion to header file Thomas Zimmermann
  2025-03-25 10:31 ` [PATCH 2/8] drm/format-helper: Add generic conversion to 32-bit formats Thomas Zimmermann
@ 2025-03-25 10:31 ` Thomas Zimmermann
  2025-03-26  8:52   ` Jocelyn Falempe
  2025-03-25 10:31 ` [PATCH 4/8] drm/format-helper: Add generic conversion to 16-bit formats Thomas Zimmermann
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Thomas Zimmermann @ 2025-03-25 10:31 UTC (permalink / raw)
  To: jfalempe, simona, airlied, mripard, maarten.lankhorst
  Cc: dri-devel, Thomas Zimmermann

Add drm_fb_xfrm_line_32to24() to implement conversion from 32-bit
pixels to 24-bit pixels. The pixel-conversion is specified by the
given callback parameter. Mark the helper as always_inline to avoid
overhead from function calls.

Then implement all existing line-conversion functions with the new
generic call and the respective pixel-conversion helper.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
---
 drivers/gpu/drm/drm_format_helper.c   | 43 ++++++++++++---------------
 drivers/gpu/drm/drm_format_internal.h | 12 ++++++++
 2 files changed, 31 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/drm_format_helper.c b/drivers/gpu/drm/drm_format_helper.c
index abd18c23cfbb..5a2fe3d685a2 100644
--- a/drivers/gpu/drm/drm_format_helper.c
+++ b/drivers/gpu/drm/drm_format_helper.c
@@ -246,6 +246,23 @@ static int drm_fb_xfrm(struct iosys_map *dst,
 				     xfrm_line);
 }
 
+static __always_inline void drm_fb_xfrm_line_32to24(void *dbuf, const void *sbuf,
+						    unsigned int pixels,
+						    u32 (*xfrm_pixel)(u32))
+{
+	u8 *dbuf8 = dbuf;
+	const __le32 *sbuf32 = sbuf;
+	const __le32 *send32 = sbuf32 + pixels;
+
+	while (sbuf32 < send32) {
+		u32 val24 = xfrm_pixel(le32_to_cpup(sbuf32++));
+		/* write output in reverse order for little endianness */
+		*dbuf8++ = (val24 & 0x000000ff);
+		*dbuf8++ = (val24 & 0x0000ff00) >>  8;
+		*dbuf8++ = (val24 & 0x00ff0000) >> 16;
+	}
+}
+
 static __always_inline void drm_fb_xfrm_line_32to32(void *dbuf, const void *sbuf,
 						    unsigned int pixels,
 						    u32 (*xfrm_pixel)(u32))
@@ -667,18 +684,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_rgba5551);
 
 static void drm_fb_xrgb8888_to_rgb888_line(void *dbuf, const void *sbuf, unsigned int pixels)
 {
-	u8 *dbuf8 = dbuf;
-	const __le32 *sbuf32 = sbuf;
-	unsigned int x;
-	u32 pix;
-
-	for (x = 0; x < pixels; x++) {
-		pix = le32_to_cpu(sbuf32[x]);
-		/* write blue-green-red to output in little endianness */
-		*dbuf8++ = (pix & 0x000000FF) >>  0;
-		*dbuf8++ = (pix & 0x0000FF00) >>  8;
-		*dbuf8++ = (pix & 0x00FF0000) >> 16;
-	}
+	drm_fb_xfrm_line_32to24(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_rgb888);
 }
 
 /**
@@ -718,18 +724,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_rgb888);
 
 static void drm_fb_xrgb8888_to_bgr888_line(void *dbuf, const void *sbuf, unsigned int pixels)
 {
-	u8 *dbuf8 = dbuf;
-	const __le32 *sbuf32 = sbuf;
-	unsigned int x;
-	u32 pix;
-
-	for (x = 0; x < pixels; x++) {
-		pix = le32_to_cpu(sbuf32[x]);
-		/* write red-green-blue to output in little endianness */
-		*dbuf8++ = (pix & 0x00ff0000) >> 16;
-		*dbuf8++ = (pix & 0x0000ff00) >> 8;
-		*dbuf8++ = (pix & 0x000000ff) >> 0;
-	}
+	drm_fb_xfrm_line_32to24(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_bgr888);
 }
 
 /**
diff --git a/drivers/gpu/drm/drm_format_internal.h b/drivers/gpu/drm/drm_format_internal.h
index 5f82f0b9c8e8..e7fcf260a914 100644
--- a/drivers/gpu/drm/drm_format_internal.h
+++ b/drivers/gpu/drm/drm_format_internal.h
@@ -68,6 +68,18 @@ static inline u32 drm_pixel_xrgb8888_to_argb1555(u32 pix)
 	       drm_pixel_xrgb8888_to_xrgb1555(pix);
 }
 
+static inline u32 drm_pixel_xrgb8888_to_rgb888(u32 pix)
+{
+	return pix & GENMASK(23, 0);
+}
+
+static inline u32 drm_pixel_xrgb8888_to_bgr888(u32 pix)
+{
+	return ((pix & 0x00ff0000) >> 16) |
+	       ((pix & 0x0000ff00)) |
+	       ((pix & 0x000000ff) << 16);
+}
+
 static inline u32 drm_pixel_xrgb8888_to_argb8888(u32 pix)
 {
 	return GENMASK(31, 24) | /* fill alpha bits */
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/8] drm/format-helper: Add generic conversion to 16-bit formats
  2025-03-25 10:31 [PATCH 0/8] drm/format-helper: Add helpers for line conversion Thomas Zimmermann
                   ` (2 preceding siblings ...)
  2025-03-25 10:31 ` [PATCH 3/8] drm/format-helper: Add generic conversion to 24-bit formats Thomas Zimmermann
@ 2025-03-25 10:31 ` Thomas Zimmermann
  2025-03-26  8:53   ` Jocelyn Falempe
  2025-03-25 10:31 ` [PATCH 5/8] drm/format-helper: Add generic conversion to 8-bit formats Thomas Zimmermann
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Thomas Zimmermann @ 2025-03-25 10:31 UTC (permalink / raw)
  To: jfalempe, simona, airlied, mripard, maarten.lankhorst
  Cc: dri-devel, Thomas Zimmermann

Add drm_fb_xfrm_line_32to16() to implement conversion from 32-bit
pixels to 16-bit pixels. The pixel-conversion is specified by the
given callback parameter. Mark the helper as always_inline to avoid
overhead from function calls.

Then implement all existing line-conversion functions with the new
generic call and the respective pixel-conversion helper. There's one
pixel-conversion helper that swaps output bytes. It is for gud and
requires refactoring, so don't move it into the header file.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
---
 drivers/gpu/drm/drm_format_helper.c   | 118 +++++---------------------
 drivers/gpu/drm/drm_format_internal.h |  12 +++
 2 files changed, 34 insertions(+), 96 deletions(-)

diff --git a/drivers/gpu/drm/drm_format_helper.c b/drivers/gpu/drm/drm_format_helper.c
index 5a2fe3d685a2..aecf55c1fc6b 100644
--- a/drivers/gpu/drm/drm_format_helper.c
+++ b/drivers/gpu/drm/drm_format_helper.c
@@ -246,6 +246,18 @@ static int drm_fb_xfrm(struct iosys_map *dst,
 				     xfrm_line);
 }
 
+static __always_inline void drm_fb_xfrm_line_32to16(void *dbuf, const void *sbuf,
+						    unsigned int pixels,
+						    u32 (*xfrm_pixel)(u32))
+{
+	__le16 *dbuf16 = dbuf;
+	const __le32 *sbuf32 = sbuf;
+	const __le32 *send32 = sbuf32 + pixels;
+
+	while (sbuf32 < send32)
+		*dbuf16++ = cpu_to_le16(xfrm_pixel(le32_to_cpup(sbuf32++)));
+}
+
 static __always_inline void drm_fb_xfrm_line_32to24(void *dbuf, const void *sbuf,
 						    unsigned int pixels,
 						    u32 (*xfrm_pixel)(u32))
@@ -448,38 +460,19 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_rgb332);
 
 static void drm_fb_xrgb8888_to_rgb565_line(void *dbuf, const void *sbuf, unsigned int pixels)
 {
-	__le16 *dbuf16 = dbuf;
-	const __le32 *sbuf32 = sbuf;
-	unsigned int x;
-	u16 val16;
-	u32 pix;
+	drm_fb_xfrm_line_32to16(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_rgb565);
+}
 
-	for (x = 0; x < pixels; x++) {
-		pix = le32_to_cpu(sbuf32[x]);
-		val16 = ((pix & 0x00F80000) >> 8) |
-			((pix & 0x0000FC00) >> 5) |
-			((pix & 0x000000F8) >> 3);
-		dbuf16[x] = cpu_to_le16(val16);
-	}
+static __always_inline u32 drm_xrgb8888_to_rgb565_swab(u32 pix)
+{
+	return swab16(drm_pixel_xrgb8888_to_rgb565(pix));
 }
 
 /* TODO: implement this helper as conversion to RGB565|BIG_ENDIAN */
 static void drm_fb_xrgb8888_to_rgb565_swab_line(void *dbuf, const void *sbuf,
 						unsigned int pixels)
 {
-	__le16 *dbuf16 = dbuf;
-	const __le32 *sbuf32 = sbuf;
-	unsigned int x;
-	u16 val16;
-	u32 pix;
-
-	for (x = 0; x < pixels; x++) {
-		pix = le32_to_cpu(sbuf32[x]);
-		val16 = ((pix & 0x00F80000) >> 8) |
-			((pix & 0x0000FC00) >> 5) |
-			((pix & 0x000000F8) >> 3);
-		dbuf16[x] = cpu_to_le16(swab16(val16));
-	}
+	drm_fb_xfrm_line_32to16(dbuf, sbuf, pixels, drm_xrgb8888_to_rgb565_swab);
 }
 
 /**
@@ -526,19 +519,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_rgb565);
 
 static void drm_fb_xrgb8888_to_xrgb1555_line(void *dbuf, const void *sbuf, unsigned int pixels)
 {
-	__le16 *dbuf16 = dbuf;
-	const __le32 *sbuf32 = sbuf;
-	unsigned int x;
-	u16 val16;
-	u32 pix;
-
-	for (x = 0; x < pixels; x++) {
-		pix = le32_to_cpu(sbuf32[x]);
-		val16 = ((pix & 0x00f80000) >> 9) |
-			((pix & 0x0000f800) >> 6) |
-			((pix & 0x000000f8) >> 3);
-		dbuf16[x] = cpu_to_le16(val16);
-	}
+	drm_fb_xfrm_line_32to16(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_xrgb1555);
 }
 
 /**
@@ -578,20 +559,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_xrgb1555);
 
 static void drm_fb_xrgb8888_to_argb1555_line(void *dbuf, const void *sbuf, unsigned int pixels)
 {
-	__le16 *dbuf16 = dbuf;
-	const __le32 *sbuf32 = sbuf;
-	unsigned int x;
-	u16 val16;
-	u32 pix;
-
-	for (x = 0; x < pixels; x++) {
-		pix = le32_to_cpu(sbuf32[x]);
-		val16 = BIT(15) | /* set alpha bit */
-			((pix & 0x00f80000) >> 9) |
-			((pix & 0x0000f800) >> 6) |
-			((pix & 0x000000f8) >> 3);
-		dbuf16[x] = cpu_to_le16(val16);
-	}
+	drm_fb_xfrm_line_32to16(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_argb1555);
 }
 
 /**
@@ -631,20 +599,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_argb1555);
 
 static void drm_fb_xrgb8888_to_rgba5551_line(void *dbuf, const void *sbuf, unsigned int pixels)
 {
-	__le16 *dbuf16 = dbuf;
-	const __le32 *sbuf32 = sbuf;
-	unsigned int x;
-	u16 val16;
-	u32 pix;
-
-	for (x = 0; x < pixels; x++) {
-		pix = le32_to_cpu(sbuf32[x]);
-		val16 = ((pix & 0x00f80000) >> 8) |
-			((pix & 0x0000f800) >> 5) |
-			((pix & 0x000000f8) >> 2) |
-			BIT(0); /* set alpha bit */
-		dbuf16[x] = cpu_to_le16(val16);
-	}
+	drm_fb_xfrm_line_32to16(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_rgba5551);
 }
 
 /**
@@ -980,36 +935,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_gray8);
 
 static void drm_fb_argb8888_to_argb4444_line(void *dbuf, const void *sbuf, unsigned int pixels)
 {
-	unsigned int pixels2 = pixels & ~GENMASK_ULL(0, 0);
-	__le32 *dbuf32 = dbuf;
-	__le16 *dbuf16 = dbuf + pixels2 * sizeof(*dbuf16);
-	const __le32 *sbuf32 = sbuf;
-	unsigned int x;
-	u32 val32;
-	u16 val16;
-	u32 pix[2];
-
-	for (x = 0; x < pixels2; x += 2, ++dbuf32) {
-		pix[0] = le32_to_cpu(sbuf32[x]);
-		pix[1] = le32_to_cpu(sbuf32[x + 1]);
-		val32 = ((pix[0] & 0xf0000000) >> 16) |
-			((pix[0] & 0x00f00000) >> 12) |
-			((pix[0] & 0x0000f000) >> 8) |
-			((pix[0] & 0x000000f0) >> 4) |
-			((pix[1] & 0xf0000000) >> 0) |
-			((pix[1] & 0x00f00000) << 4) |
-			((pix[1] & 0x0000f000) << 8) |
-			((pix[1] & 0x000000f0) << 12);
-		*dbuf32 = cpu_to_le32(val32);
-	}
-	for (; x < pixels; x++) {
-		pix[0] = le32_to_cpu(sbuf32[x]);
-		val16 = ((pix[0] & 0xf0000000) >> 16) |
-			((pix[0] & 0x00f00000) >> 12) |
-			((pix[0] & 0x0000f000) >> 8) |
-			((pix[0] & 0x000000f0) >> 4);
-		dbuf16[x] = cpu_to_le16(val16);
-	}
+	drm_fb_xfrm_line_32to16(dbuf, sbuf, pixels, drm_pixel_argb8888_to_argb4444);
 }
 
 /**
diff --git a/drivers/gpu/drm/drm_format_internal.h b/drivers/gpu/drm/drm_format_internal.h
index e7fcf260a914..4f20b63cb6f6 100644
--- a/drivers/gpu/drm/drm_format_internal.h
+++ b/drivers/gpu/drm/drm_format_internal.h
@@ -128,4 +128,16 @@ static inline u32 drm_pixel_xrgb8888_to_abgr2101010(u32 pix)
 	       drm_pixel_xrgb8888_to_xbgr2101010(pix);
 }
 
+/*
+ * Conversion from ARGB8888
+ */
+
+static inline u32 drm_pixel_argb8888_to_argb4444(u32 pix)
+{
+	return ((pix & 0xf0000000) >> 16) |
+	       ((pix & 0x00f00000) >> 12) |
+	       ((pix & 0x0000f000) >> 8) |
+	       ((pix & 0x000000f0) >> 4);
+}
+
 #endif
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/8] drm/format-helper: Add generic conversion to 8-bit formats
  2025-03-25 10:31 [PATCH 0/8] drm/format-helper: Add helpers for line conversion Thomas Zimmermann
                   ` (3 preceding siblings ...)
  2025-03-25 10:31 ` [PATCH 4/8] drm/format-helper: Add generic conversion to 16-bit formats Thomas Zimmermann
@ 2025-03-25 10:31 ` Thomas Zimmermann
  2025-03-26  8:53   ` Jocelyn Falempe
  2025-03-25 10:31 ` [PATCH 6/8] drm/format-helper: Optimize 32-to-24-bpp conversion Thomas Zimmermann
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Thomas Zimmermann @ 2025-03-25 10:31 UTC (permalink / raw)
  To: jfalempe, simona, airlied, mripard, maarten.lankhorst
  Cc: dri-devel, Thomas Zimmermann

Add drm_fb_xfrm_line_32to8() to implement conversion from 32-bit
pixels to 8-bit pixels. The pixel-conversion is specified by the
given callback parameter. Mark the helper as always_inline to avoid
overhead from function calls.

Then implement all existing line-conversion functions with the new
generic call and the respective pixel-conversion helper.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
---
 drivers/gpu/drm/drm_format_helper.c   | 38 ++++++++++-----------------
 drivers/gpu/drm/drm_format_internal.h | 17 ++++++++++++
 2 files changed, 31 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/drm_format_helper.c b/drivers/gpu/drm/drm_format_helper.c
index aecf55c1fc6b..a926aa6671fc 100644
--- a/drivers/gpu/drm/drm_format_helper.c
+++ b/drivers/gpu/drm/drm_format_helper.c
@@ -246,6 +246,18 @@ static int drm_fb_xfrm(struct iosys_map *dst,
 				     xfrm_line);
 }
 
+static __always_inline void drm_fb_xfrm_line_32to8(void *dbuf, const void *sbuf,
+						   unsigned int pixels,
+						   u32 (*xfrm_pixel)(u32))
+{
+	u8 *dbuf8 = dbuf;
+	const __le32 *sbuf32 = sbuf;
+	const __le32 *send32 = sbuf32 + pixels;
+
+	while (sbuf32 < send32)
+		*dbuf8++ = xfrm_pixel(le32_to_cpup(sbuf32++));
+}
+
 static __always_inline void drm_fb_xfrm_line_32to16(void *dbuf, const void *sbuf,
 						    unsigned int pixels,
 						    u32 (*xfrm_pixel)(u32))
@@ -411,17 +423,7 @@ EXPORT_SYMBOL(drm_fb_swab);
 
 static void drm_fb_xrgb8888_to_rgb332_line(void *dbuf, const void *sbuf, unsigned int pixels)
 {
-	u8 *dbuf8 = dbuf;
-	const __le32 *sbuf32 = sbuf;
-	unsigned int x;
-	u32 pix;
-
-	for (x = 0; x < pixels; x++) {
-		pix = le32_to_cpu(sbuf32[x]);
-		dbuf8[x] = ((pix & 0x00e00000) >> 16) |
-			   ((pix & 0x0000e000) >> 11) |
-			   ((pix & 0x000000c0) >> 6);
-	}
+	drm_fb_xfrm_line_32to8(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_rgb332);
 }
 
 /**
@@ -879,19 +881,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_argb2101010);
 
 static void drm_fb_xrgb8888_to_gray8_line(void *dbuf, const void *sbuf, unsigned int pixels)
 {
-	u8 *dbuf8 = dbuf;
-	const __le32 *sbuf32 = sbuf;
-	unsigned int x;
-
-	for (x = 0; x < pixels; x++) {
-		u32 pix = le32_to_cpu(sbuf32[x]);
-		u8 r = (pix & 0x00ff0000) >> 16;
-		u8 g = (pix & 0x0000ff00) >> 8;
-		u8 b =  pix & 0x000000ff;
-
-		/* ITU BT.601: Y = 0.299 R + 0.587 G + 0.114 B */
-		*dbuf8++ = (3 * r + 6 * g + b) / 10;
-	}
+	drm_fb_xfrm_line_32to8(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_r8_bt601);
 }
 
 /**
diff --git a/drivers/gpu/drm/drm_format_internal.h b/drivers/gpu/drm/drm_format_internal.h
index 4f20b63cb6f6..9f857bfa368d 100644
--- a/drivers/gpu/drm/drm_format_internal.h
+++ b/drivers/gpu/drm/drm_format_internal.h
@@ -35,6 +35,23 @@
  * Conversions from XRGB8888
  */
 
+static inline u32 drm_pixel_xrgb8888_to_r8_bt601(u32 pix)
+{
+	u32 r = (pix & 0x00ff0000) >> 16;
+	u32 g = (pix & 0x0000ff00) >> 8;
+	u32 b =  pix & 0x000000ff;
+
+	/* ITU-R BT.601: Y = 0.299 R + 0.587 G + 0.114 B */
+	return (3 * r + 6 * g + b) / 10;
+}
+
+static inline u32 drm_pixel_xrgb8888_to_rgb332(u32 pix)
+{
+	return ((pix & 0x00e00000) >> 16) |
+	       ((pix & 0x0000e000) >> 11) |
+	       ((pix & 0x000000c0) >> 6);
+}
+
 static inline u32 drm_pixel_xrgb8888_to_rgb565(u32 pix)
 {
 	return ((pix & 0x00f80000) >> 8) |
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/8] drm/format-helper: Optimize 32-to-24-bpp conversion
  2025-03-25 10:31 [PATCH 0/8] drm/format-helper: Add helpers for line conversion Thomas Zimmermann
                   ` (4 preceding siblings ...)
  2025-03-25 10:31 ` [PATCH 5/8] drm/format-helper: Add generic conversion to 8-bit formats Thomas Zimmermann
@ 2025-03-25 10:31 ` Thomas Zimmermann
  2025-03-26  8:55   ` Jocelyn Falempe
  2025-03-25 10:31 ` [PATCH 7/8] drm/format-helper: Optimize 32-to-16-bpp conversion Thomas Zimmermann
  2025-03-25 10:31 ` [PATCH 8/8] drm/format-helper: Optimize 32-to-8-bpp conversion Thomas Zimmermann
  7 siblings, 1 reply; 19+ messages in thread
From: Thomas Zimmermann @ 2025-03-25 10:31 UTC (permalink / raw)
  To: jfalempe, simona, airlied, mripard, maarten.lankhorst
  Cc: dri-devel, Thomas Zimmermann

For ease of implementation, existing line-conversion functions
for 24-bit formats write each byte individually. Optimize the
performance by writing 4 pixels in 3 32-bit stores.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
---
 drivers/gpu/drm/drm_format_helper.c | 36 ++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_format_helper.c b/drivers/gpu/drm/drm_format_helper.c
index a926aa6671fc..b9c9c712aa9c 100644
--- a/drivers/gpu/drm/drm_format_helper.c
+++ b/drivers/gpu/drm/drm_format_helper.c
@@ -274,10 +274,44 @@ static __always_inline void drm_fb_xfrm_line_32to24(void *dbuf, const void *sbuf
 						    unsigned int pixels,
 						    u32 (*xfrm_pixel)(u32))
 {
-	u8 *dbuf8 = dbuf;
+	__le32 *dbuf32 = dbuf;
+	u8 *dbuf8;
 	const __le32 *sbuf32 = sbuf;
 	const __le32 *send32 = sbuf32 + pixels;
 
+	/* write pixels in chunks of 4 */
+	send32 -= pixels & GENMASK(1, 0);
+	while (sbuf32 < send32) {
+		u32 val24[4] = {
+			xfrm_pixel(le32_to_cpup(sbuf32++)),
+			xfrm_pixel(le32_to_cpup(sbuf32++)),
+			xfrm_pixel(le32_to_cpup(sbuf32++)),
+			xfrm_pixel(le32_to_cpup(sbuf32++)),
+		};
+		u32 out32[3] = {
+			/* write output bytes in reverse order for little endianness */
+			((val24[0] & 0x000000ff)) |
+			((val24[0] & 0x0000ff00)) |
+			((val24[0] & 0x00ff0000)) |
+			((val24[1] & 0x000000ff) << 24),
+			((val24[1] & 0x0000ff00) >> 8) |
+			((val24[1] & 0x00ff0000) >> 8) |
+			((val24[2] & 0x000000ff) << 16) |
+			((val24[2] & 0x0000ff00) << 16),
+			((val24[2] & 0x00ff0000) >> 16) |
+			((val24[3] & 0x000000ff) << 8) |
+			((val24[3] & 0x0000ff00) << 8) |
+			((val24[3] & 0x00ff0000) << 8),
+		};
+
+		*dbuf32++ = cpu_to_le32(out32[0]);
+		*dbuf32++ = cpu_to_le32(out32[1]);
+		*dbuf32++ = cpu_to_le32(out32[2]);
+	}
+	send32 += pixels & GENMASK(1, 0);
+
+	/* write trailing pixel */
+	dbuf8 = (u8 __force *)dbuf32;
 	while (sbuf32 < send32) {
 		u32 val24 = xfrm_pixel(le32_to_cpup(sbuf32++));
 		/* write output in reverse order for little endianness */
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 7/8] drm/format-helper: Optimize 32-to-16-bpp conversion
  2025-03-25 10:31 [PATCH 0/8] drm/format-helper: Add helpers for line conversion Thomas Zimmermann
                   ` (5 preceding siblings ...)
  2025-03-25 10:31 ` [PATCH 6/8] drm/format-helper: Optimize 32-to-24-bpp conversion Thomas Zimmermann
@ 2025-03-25 10:31 ` Thomas Zimmermann
  2025-03-26  8:56   ` Jocelyn Falempe
  2025-03-26 10:53   ` Jani Nikula
  2025-03-25 10:31 ` [PATCH 8/8] drm/format-helper: Optimize 32-to-8-bpp conversion Thomas Zimmermann
  7 siblings, 2 replies; 19+ messages in thread
From: Thomas Zimmermann @ 2025-03-25 10:31 UTC (permalink / raw)
  To: jfalempe, simona, airlied, mripard, maarten.lankhorst
  Cc: dri-devel, Thomas Zimmermann

For ease of implementation, existing line-conversion functions
for 16-bit formats write each pixel individually. Optimize the
performance by writing mulitple pixels in single 64-bit and 32-bit
stores.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
---
 drivers/gpu/drm/drm_format_helper.c | 40 ++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_format_helper.c b/drivers/gpu/drm/drm_format_helper.c
index b9c9c712aa9c..66137df85725 100644
--- a/drivers/gpu/drm/drm_format_helper.c
+++ b/drivers/gpu/drm/drm_format_helper.c
@@ -262,10 +262,48 @@ static __always_inline void drm_fb_xfrm_line_32to16(void *dbuf, const void *sbuf
 						    unsigned int pixels,
 						    u32 (*xfrm_pixel)(u32))
 {
-	__le16 *dbuf16 = dbuf;
+	__le64 *dbuf64 = dbuf;
+	__le32 *dbuf32;
+	__le16 *dbuf16;
 	const __le32 *sbuf32 = sbuf;
 	const __le32 *send32 = sbuf32 + pixels;
 
+#if defined(CONFIG_64BIT)
+	/* write 4 pixels at once */
+	send32 -= pixels & GENMASK(1, 0);
+	while (sbuf32 < send32) {
+		u32 pix[4] = {
+			le32_to_cpup(sbuf32++),
+			le32_to_cpup(sbuf32++),
+			le32_to_cpup(sbuf32++),
+			le32_to_cpup(sbuf32++),
+		};
+		/* write output bytes in reverse order for little endianness */
+		u64 val64 = ((u64)xfrm_pixel(pix[0])) |
+			    ((u64)xfrm_pixel(pix[1]) << 16) |
+			    ((u64)xfrm_pixel(pix[2]) << 32) |
+			    ((u64)xfrm_pixel(pix[3]) << 48);
+		*dbuf64++ = cpu_to_le64(val64);
+	}
+	send32 += pixels & GENMASK(1, 1);
+#endif
+
+	/* write 2 pixels at once */
+	dbuf32 = (__le32 __force *)dbuf64;
+	while (sbuf32 < send32) {
+		u32 pix[2] = {
+			le32_to_cpup(sbuf32++),
+			le32_to_cpup(sbuf32++),
+		};
+		/* write output bytes in reverse order for little endianness */
+		u32 val32 = xfrm_pixel(pix[0]) |
+			   (xfrm_pixel(pix[1]) << 16);
+		*dbuf32++ = cpu_to_le32(val32);
+	}
+	send32 += pixels & GENMASK(0, 0);
+
+	/* write trailing pixel */
+	dbuf16 = (__le16 __force *)dbuf32;
 	while (sbuf32 < send32)
 		*dbuf16++ = cpu_to_le16(xfrm_pixel(le32_to_cpup(sbuf32++)));
 }
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 8/8] drm/format-helper: Optimize 32-to-8-bpp conversion
  2025-03-25 10:31 [PATCH 0/8] drm/format-helper: Add helpers for line conversion Thomas Zimmermann
                   ` (6 preceding siblings ...)
  2025-03-25 10:31 ` [PATCH 7/8] drm/format-helper: Optimize 32-to-16-bpp conversion Thomas Zimmermann
@ 2025-03-25 10:31 ` Thomas Zimmermann
  2025-03-26  8:57   ` Jocelyn Falempe
  7 siblings, 1 reply; 19+ messages in thread
From: Thomas Zimmermann @ 2025-03-25 10:31 UTC (permalink / raw)
  To: jfalempe, simona, airlied, mripard, maarten.lankhorst
  Cc: dri-devel, Thomas Zimmermann

For ease of implementation, existing line-conversion functions
for 8-bit formats write each pixel individually. Optimize the
performance by writing mulitple pixels in a single 32-bit store.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
---
 drivers/gpu/drm/drm_format_helper.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_format_helper.c b/drivers/gpu/drm/drm_format_helper.c
index 66137df85725..73833db28c3c 100644
--- a/drivers/gpu/drm/drm_format_helper.c
+++ b/drivers/gpu/drm/drm_format_helper.c
@@ -250,10 +250,31 @@ static __always_inline void drm_fb_xfrm_line_32to8(void *dbuf, const void *sbuf,
 						   unsigned int pixels,
 						   u32 (*xfrm_pixel)(u32))
 {
-	u8 *dbuf8 = dbuf;
+	__le32 *dbuf32 = dbuf;
+	u8 *dbuf8;
 	const __le32 *sbuf32 = sbuf;
 	const __le32 *send32 = sbuf32 + pixels;
 
+	/* write 4 pixels at once */
+	send32 -= pixels & GENMASK(1, 0);
+	while (sbuf32 < send32) {
+		u32 pix[4] = {
+			le32_to_cpup(sbuf32++),
+			le32_to_cpup(sbuf32++),
+			le32_to_cpup(sbuf32++),
+			le32_to_cpup(sbuf32++),
+		};
+		/* write output bytes in reverse order for little endianness */
+		u32 val32 = xfrm_pixel(pix[0]) |
+			   (xfrm_pixel(pix[1]) << 8) |
+			   (xfrm_pixel(pix[2]) << 16) |
+			   (xfrm_pixel(pix[3]) << 24);
+		*dbuf32++ = cpu_to_le32(val32);
+	}
+	send32 += pixels & GENMASK(1, 0);
+
+	/* write trailing pixels */
+	dbuf8 = (u8 __force *)dbuf32;
 	while (sbuf32 < send32)
 		*dbuf8++ = xfrm_pixel(le32_to_cpup(sbuf32++));
 }
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/8] drm/format-helper: Move helpers for pixel conversion to header file
  2025-03-25 10:31 ` [PATCH 1/8] drm/format-helper: Move helpers for pixel conversion to header file Thomas Zimmermann
@ 2025-03-26  8:51   ` Jocelyn Falempe
  0 siblings, 0 replies; 19+ messages in thread
From: Jocelyn Falempe @ 2025-03-26  8:51 UTC (permalink / raw)
  To: Thomas Zimmermann, simona, airlied, mripard, maarten.lankhorst; +Cc: dri-devel

On 25/03/2025 11:31, Thomas Zimmermann wrote:
> The DRM draw helpers contain format-conversion helpers that operate
> on individual pixels. Move them into an internal header file and adopt
> them as individual API. Update the draw code accordingly. The pixel
> helpers will also be useful for other format conversion helpers.

Thanks, it looks good to me.

Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>

> 
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---
>   drivers/gpu/drm/drm_draw.c            | 100 +++-------------------
>   drivers/gpu/drm/drm_format_internal.h | 119 ++++++++++++++++++++++++++
>   2 files changed, 130 insertions(+), 89 deletions(-)
>   create mode 100644 drivers/gpu/drm/drm_format_internal.h
> 
> diff --git a/drivers/gpu/drm/drm_draw.c b/drivers/gpu/drm/drm_draw.c
> index cb2ad12bce57..d41f8ae1c148 100644
> --- a/drivers/gpu/drm/drm_draw.c
> +++ b/drivers/gpu/drm/drm_draw.c
> @@ -11,85 +11,7 @@
>   #include <drm/drm_fourcc.h>
>   
>   #include "drm_draw_internal.h"
> -
> -/*
> - * Conversions from xrgb8888
> - */
> -
> -static u16 convert_xrgb8888_to_rgb565(u32 pix)
> -{
> -	return ((pix & 0x00F80000) >> 8) |
> -	       ((pix & 0x0000FC00) >> 5) |
> -	       ((pix & 0x000000F8) >> 3);
> -}
> -
> -static u16 convert_xrgb8888_to_rgba5551(u32 pix)
> -{
> -	return ((pix & 0x00f80000) >> 8) |
> -	       ((pix & 0x0000f800) >> 5) |
> -	       ((pix & 0x000000f8) >> 2) |
> -	       BIT(0); /* set alpha bit */
> -}
> -
> -static u16 convert_xrgb8888_to_xrgb1555(u32 pix)
> -{
> -	return ((pix & 0x00f80000) >> 9) |
> -	       ((pix & 0x0000f800) >> 6) |
> -	       ((pix & 0x000000f8) >> 3);
> -}
> -
> -static u16 convert_xrgb8888_to_argb1555(u32 pix)
> -{
> -	return BIT(15) | /* set alpha bit */
> -	       ((pix & 0x00f80000) >> 9) |
> -	       ((pix & 0x0000f800) >> 6) |
> -	       ((pix & 0x000000f8) >> 3);
> -}
> -
> -static u32 convert_xrgb8888_to_argb8888(u32 pix)
> -{
> -	return pix | GENMASK(31, 24); /* fill alpha bits */
> -}
> -
> -static u32 convert_xrgb8888_to_xbgr8888(u32 pix)
> -{
> -	return ((pix & 0x00ff0000) >> 16) <<  0 |
> -	       ((pix & 0x0000ff00) >>  8) <<  8 |
> -	       ((pix & 0x000000ff) >>  0) << 16 |
> -	       ((pix & 0xff000000) >> 24) << 24;
> -}
> -
> -static u32 convert_xrgb8888_to_abgr8888(u32 pix)
> -{
> -	return ((pix & 0x00ff0000) >> 16) <<  0 |
> -	       ((pix & 0x0000ff00) >>  8) <<  8 |
> -	       ((pix & 0x000000ff) >>  0) << 16 |
> -	       GENMASK(31, 24); /* fill alpha bits */
> -}
> -
> -static u32 convert_xrgb8888_to_xrgb2101010(u32 pix)
> -{
> -	pix = ((pix & 0x000000FF) << 2) |
> -	      ((pix & 0x0000FF00) << 4) |
> -	      ((pix & 0x00FF0000) << 6);
> -	return pix | ((pix >> 8) & 0x00300C03);
> -}
> -
> -static u32 convert_xrgb8888_to_argb2101010(u32 pix)
> -{
> -	pix = ((pix & 0x000000FF) << 2) |
> -	      ((pix & 0x0000FF00) << 4) |
> -	      ((pix & 0x00FF0000) << 6);
> -	return GENMASK(31, 30) /* set alpha bits */ | pix | ((pix >> 8) & 0x00300C03);
> -}
> -
> -static u32 convert_xrgb8888_to_abgr2101010(u32 pix)
> -{
> -	pix = ((pix & 0x00FF0000) >> 14) |
> -	      ((pix & 0x0000FF00) << 4) |
> -	      ((pix & 0x000000FF) << 22);
> -	return GENMASK(31, 30) /* set alpha bits */ | pix | ((pix >> 8) & 0x00300C03);
> -}
> +#include "drm_format_internal.h"
>   
>   /**
>    * drm_draw_color_from_xrgb8888 - convert one pixel from xrgb8888 to the desired format
> @@ -104,28 +26,28 @@ u32 drm_draw_color_from_xrgb8888(u32 color, u32 format)
>   {
>   	switch (format) {
>   	case DRM_FORMAT_RGB565:
> -		return convert_xrgb8888_to_rgb565(color);
> +		return drm_pixel_xrgb8888_to_rgb565(color);
>   	case DRM_FORMAT_RGBA5551:
> -		return convert_xrgb8888_to_rgba5551(color);
> +		return drm_pixel_xrgb8888_to_rgba5551(color);
>   	case DRM_FORMAT_XRGB1555:
> -		return convert_xrgb8888_to_xrgb1555(color);
> +		return drm_pixel_xrgb8888_to_xrgb1555(color);
>   	case DRM_FORMAT_ARGB1555:
> -		return convert_xrgb8888_to_argb1555(color);
> +		return drm_pixel_xrgb8888_to_argb1555(color);
>   	case DRM_FORMAT_RGB888:
>   	case DRM_FORMAT_XRGB8888:
>   		return color;
>   	case DRM_FORMAT_ARGB8888:
> -		return convert_xrgb8888_to_argb8888(color);
> +		return drm_pixel_xrgb8888_to_argb8888(color);
>   	case DRM_FORMAT_XBGR8888:
> -		return convert_xrgb8888_to_xbgr8888(color);
> +		return drm_pixel_xrgb8888_to_xbgr8888(color);
>   	case DRM_FORMAT_ABGR8888:
> -		return convert_xrgb8888_to_abgr8888(color);
> +		return drm_pixel_xrgb8888_to_abgr8888(color);
>   	case DRM_FORMAT_XRGB2101010:
> -		return convert_xrgb8888_to_xrgb2101010(color);
> +		return drm_pixel_xrgb8888_to_xrgb2101010(color);
>   	case DRM_FORMAT_ARGB2101010:
> -		return convert_xrgb8888_to_argb2101010(color);
> +		return drm_pixel_xrgb8888_to_argb2101010(color);
>   	case DRM_FORMAT_ABGR2101010:
> -		return convert_xrgb8888_to_abgr2101010(color);
> +		return drm_pixel_xrgb8888_to_abgr2101010(color);
>   	default:
>   		WARN_ONCE(1, "Can't convert to %p4cc\n", &format);
>   		return 0;
> diff --git a/drivers/gpu/drm/drm_format_internal.h b/drivers/gpu/drm/drm_format_internal.h
> new file mode 100644
> index 000000000000..5f82f0b9c8e8
> --- /dev/null
> +++ b/drivers/gpu/drm/drm_format_internal.h
> @@ -0,0 +1,119 @@
> +/* SPDX-License-Identifier: GPL-2.0 or MIT */
> +
> +#ifndef DRM_FORMAT_INTERNAL_H
> +#define DRM_FORMAT_INTERNAL_H
> +
> +#include <linux/bits.h>
> +#include <linux/types.h>
> +
> +/*
> + * Each pixel-format conversion helper takes a raw pixel in a
> + * specific input format and returns a raw pixel in a specific
> + * output format. All pixels are in little-endian byte order.
> + *
> + * Function names are
> + *
> + *   drm_pixel_<input>_to_<output>_<algorithm>()
> + *
> + * where <input> and <output> refer to pixel formats. The
> + * <algorithm> is optional and hints to the method used for the
> + * conversion. Helpers with no algorithm given apply pixel-bit
> + * shifting.
> + *
> + * The argument type is u32. We expect this to be wide enough to
> + * hold all conversion input from 32-bit RGB to any output format.
> + * The Linux kernel should avoid format conversion for anything
> + * but XRGB8888 input data. Converting from other format can still
> + * be acceptable in some cases.
> + *
> + * The return type is u32. It is wide enough to hold all conversion
> + * output from XRGB8888. For output formats wider than 32 bit, a
> + * return type of u64 would be acceptable.
> + */
> +
> +/*
> + * Conversions from XRGB8888
> + */
> +
> +static inline u32 drm_pixel_xrgb8888_to_rgb565(u32 pix)
> +{
> +	return ((pix & 0x00f80000) >> 8) |
> +	       ((pix & 0x0000fc00) >> 5) |
> +	       ((pix & 0x000000f8) >> 3);
> +}
> +
> +static inline u32 drm_pixel_xrgb8888_to_rgbx5551(u32 pix)
> +{
> +	return ((pix & 0x00f80000) >> 8) |
> +	       ((pix & 0x0000f800) >> 5) |
> +	       ((pix & 0x000000f8) >> 2);
> +}
> +
> +static inline u32 drm_pixel_xrgb8888_to_rgba5551(u32 pix)
> +{
> +	return drm_pixel_xrgb8888_to_rgbx5551(pix) |
> +	       BIT(0); /* set alpha bit */
> +}
> +
> +static inline u32 drm_pixel_xrgb8888_to_xrgb1555(u32 pix)
> +{
> +	return ((pix & 0x00f80000) >> 9) |
> +	       ((pix & 0x0000f800) >> 6) |
> +	       ((pix & 0x000000f8) >> 3);
> +}
> +
> +static inline u32 drm_pixel_xrgb8888_to_argb1555(u32 pix)
> +{
> +	return BIT(15) | /* set alpha bit */
> +	       drm_pixel_xrgb8888_to_xrgb1555(pix);
> +}
> +
> +static inline u32 drm_pixel_xrgb8888_to_argb8888(u32 pix)
> +{
> +	return GENMASK(31, 24) | /* fill alpha bits */
> +	       pix;
> +}
> +
> +static inline u32 drm_pixel_xrgb8888_to_xbgr8888(u32 pix)
> +{
> +	return ((pix & 0xff000000)) | /* also copy filler bits */
> +	       ((pix & 0x00ff0000) >> 16) |
> +	       ((pix & 0x0000ff00)) |
> +	       ((pix & 0x000000ff) << 16);
> +}
> +
> +static inline u32 drm_pixel_xrgb8888_to_abgr8888(u32 pix)
> +{
> +	return GENMASK(31, 24) | /* fill alpha bits */
> +	       drm_pixel_xrgb8888_to_xbgr8888(pix);
> +}
> +
> +static inline u32 drm_pixel_xrgb8888_to_xrgb2101010(u32 pix)
> +{
> +	pix = ((pix & 0x000000ff) << 2) |
> +	      ((pix & 0x0000ff00) << 4) |
> +	      ((pix & 0x00ff0000) << 6);
> +	return pix | ((pix >> 8) & 0x00300c03);
> +}
> +
> +static inline u32 drm_pixel_xrgb8888_to_argb2101010(u32 pix)
> +{
> +	return GENMASK(31, 30) | /* set alpha bits */
> +	       drm_pixel_xrgb8888_to_xrgb2101010(pix);
> +}
> +
> +static inline u32 drm_pixel_xrgb8888_to_xbgr2101010(u32 pix)
> +{
> +	pix = ((pix & 0x00ff0000) >> 14) |
> +	      ((pix & 0x0000ff00) << 4) |
> +	      ((pix & 0x000000ff) << 22);
> +	return pix | ((pix >> 8) & 0x00300c03);
> +}
> +
> +static inline u32 drm_pixel_xrgb8888_to_abgr2101010(u32 pix)
> +{
> +	return GENMASK(31, 30) | /* set alpha bits */
> +	       drm_pixel_xrgb8888_to_xbgr2101010(pix);
> +}
> +
> +#endif


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/8] drm/format-helper: Add generic conversion to 32-bit formats
  2025-03-25 10:31 ` [PATCH 2/8] drm/format-helper: Add generic conversion to 32-bit formats Thomas Zimmermann
@ 2025-03-26  8:52   ` Jocelyn Falempe
  0 siblings, 0 replies; 19+ messages in thread
From: Jocelyn Falempe @ 2025-03-26  8:52 UTC (permalink / raw)
  To: Thomas Zimmermann, simona, airlied, mripard, maarten.lankhorst; +Cc: dri-devel

On 25/03/2025 11:31, Thomas Zimmermann wrote:
> Add drm_fb_xfrm_line_32to32() to implement conversion from 32-bit
> pixels to 32-bit pixels. The pixel-conversion is specified by the
> given callback parameter. Mark the helper as always_inline to avoid
> overhead from function calls.
> 
> Then implement all existing line-conversion functions with the new
> generic call and the respective pixel-conversion helper.

Thanks, it looks good to me.

Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>

> 
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---
>   drivers/gpu/drm/drm_format_helper.c | 84 +++++++----------------------
>   1 file changed, 19 insertions(+), 65 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_format_helper.c b/drivers/gpu/drm/drm_format_helper.c
> index 01d3ab307ac3..abd18c23cfbb 100644
> --- a/drivers/gpu/drm/drm_format_helper.c
> +++ b/drivers/gpu/drm/drm_format_helper.c
> @@ -20,6 +20,8 @@
>   #include <drm/drm_print.h>
>   #include <drm/drm_rect.h>
>   
> +#include "drm_format_internal.h"
> +
>   /**
>    * drm_format_conv_state_init - Initialize format-conversion state
>    * @state: The state to initialize
> @@ -244,6 +246,18 @@ static int drm_fb_xfrm(struct iosys_map *dst,
>   				     xfrm_line);
>   }
>   
> +static __always_inline void drm_fb_xfrm_line_32to32(void *dbuf, const void *sbuf,
> +						    unsigned int pixels,
> +						    u32 (*xfrm_pixel)(u32))
> +{
> +	__le32 *dbuf32 = dbuf;
> +	const __le32 *sbuf32 = sbuf;
> +	const __le32 *send32 = sbuf32 + pixels;
> +
> +	while (sbuf32 < send32)
> +		*dbuf32++ = cpu_to_le32(xfrm_pixel(le32_to_cpup(sbuf32++)));
> +}
> +
>   /**
>    * drm_fb_memcpy - Copy clip buffer
>    * @dst: Array of destination buffers
> @@ -755,16 +769,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_bgr888);
>   
>   static void drm_fb_xrgb8888_to_argb8888_line(void *dbuf, const void *sbuf, unsigned int pixels)
>   {
> -	__le32 *dbuf32 = dbuf;
> -	const __le32 *sbuf32 = sbuf;
> -	unsigned int x;
> -	u32 pix;
> -
> -	for (x = 0; x < pixels; x++) {
> -		pix = le32_to_cpu(sbuf32[x]);
> -		pix |= GENMASK(31, 24); /* fill alpha bits */
> -		dbuf32[x] = cpu_to_le32(pix);
> -	}
> +	drm_fb_xfrm_line_32to32(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_argb8888);
>   }
>   
>   /**
> @@ -804,19 +809,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_argb8888);
>   
>   static void drm_fb_xrgb8888_to_abgr8888_line(void *dbuf, const void *sbuf, unsigned int pixels)
>   {
> -	__le32 *dbuf32 = dbuf;
> -	const __le32 *sbuf32 = sbuf;
> -	unsigned int x;
> -	u32 pix;
> -
> -	for (x = 0; x < pixels; x++) {
> -		pix = le32_to_cpu(sbuf32[x]);
> -		pix = ((pix & 0x00ff0000) >> 16) <<  0 |
> -		      ((pix & 0x0000ff00) >>  8) <<  8 |
> -		      ((pix & 0x000000ff) >>  0) << 16 |
> -		      GENMASK(31, 24); /* fill alpha bits */
> -		*dbuf32++ = cpu_to_le32(pix);
> -	}
> +	drm_fb_xfrm_line_32to32(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_abgr8888);
>   }
>   
>   static void drm_fb_xrgb8888_to_abgr8888(struct iosys_map *dst, const unsigned int *dst_pitch,
> @@ -835,19 +828,7 @@ static void drm_fb_xrgb8888_to_abgr8888(struct iosys_map *dst, const unsigned in
>   
>   static void drm_fb_xrgb8888_to_xbgr8888_line(void *dbuf, const void *sbuf, unsigned int pixels)
>   {
> -	__le32 *dbuf32 = dbuf;
> -	const __le32 *sbuf32 = sbuf;
> -	unsigned int x;
> -	u32 pix;
> -
> -	for (x = 0; x < pixels; x++) {
> -		pix = le32_to_cpu(sbuf32[x]);
> -		pix = ((pix & 0x00ff0000) >> 16) <<  0 |
> -		      ((pix & 0x0000ff00) >>  8) <<  8 |
> -		      ((pix & 0x000000ff) >>  0) << 16 |
> -		      ((pix & 0xff000000) >> 24) << 24;
> -		*dbuf32++ = cpu_to_le32(pix);
> -	}
> +	drm_fb_xfrm_line_32to32(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_xbgr8888);
>   }
>   
>   static void drm_fb_xrgb8888_to_xbgr8888(struct iosys_map *dst, const unsigned int *dst_pitch,
> @@ -866,20 +847,7 @@ static void drm_fb_xrgb8888_to_xbgr8888(struct iosys_map *dst, const unsigned in
>   
>   static void drm_fb_xrgb8888_to_xrgb2101010_line(void *dbuf, const void *sbuf, unsigned int pixels)
>   {
> -	__le32 *dbuf32 = dbuf;
> -	const __le32 *sbuf32 = sbuf;
> -	unsigned int x;
> -	u32 val32;
> -	u32 pix;
> -
> -	for (x = 0; x < pixels; x++) {
> -		pix = le32_to_cpu(sbuf32[x]);
> -		val32 = ((pix & 0x000000FF) << 2) |
> -			((pix & 0x0000FF00) << 4) |
> -			((pix & 0x00FF0000) << 6);
> -		pix = val32 | ((val32 >> 8) & 0x00300C03);
> -		*dbuf32++ = cpu_to_le32(pix);
> -	}
> +	drm_fb_xfrm_line_32to32(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_xrgb2101010);
>   }
>   
>   /**
> @@ -920,21 +888,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_xrgb2101010);
>   
>   static void drm_fb_xrgb8888_to_argb2101010_line(void *dbuf, const void *sbuf, unsigned int pixels)
>   {
> -	__le32 *dbuf32 = dbuf;
> -	const __le32 *sbuf32 = sbuf;
> -	unsigned int x;
> -	u32 val32;
> -	u32 pix;
> -
> -	for (x = 0; x < pixels; x++) {
> -		pix = le32_to_cpu(sbuf32[x]);
> -		val32 = ((pix & 0x000000ff) << 2) |
> -			((pix & 0x0000ff00) << 4) |
> -			((pix & 0x00ff0000) << 6);
> -		pix = GENMASK(31, 30) | /* set alpha bits */
> -		      val32 | ((val32 >> 8) & 0x00300c03);
> -		*dbuf32++ = cpu_to_le32(pix);
> -	}
> +	drm_fb_xfrm_line_32to32(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_argb2101010);
>   }
>   
>   /**


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/8] drm/format-helper: Add generic conversion to 24-bit formats
  2025-03-25 10:31 ` [PATCH 3/8] drm/format-helper: Add generic conversion to 24-bit formats Thomas Zimmermann
@ 2025-03-26  8:52   ` Jocelyn Falempe
  0 siblings, 0 replies; 19+ messages in thread
From: Jocelyn Falempe @ 2025-03-26  8:52 UTC (permalink / raw)
  To: Thomas Zimmermann, simona, airlied, mripard, maarten.lankhorst; +Cc: dri-devel

On 25/03/2025 11:31, Thomas Zimmermann wrote:
> Add drm_fb_xfrm_line_32to24() to implement conversion from 32-bit
> pixels to 24-bit pixels. The pixel-conversion is specified by the
> given callback parameter. Mark the helper as always_inline to avoid
> overhead from function calls.
> 
> Then implement all existing line-conversion functions with the new
> generic call and the respective pixel-conversion helper.

Thanks, it looks good to me.

Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>

> 
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---
>   drivers/gpu/drm/drm_format_helper.c   | 43 ++++++++++++---------------
>   drivers/gpu/drm/drm_format_internal.h | 12 ++++++++
>   2 files changed, 31 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_format_helper.c b/drivers/gpu/drm/drm_format_helper.c
> index abd18c23cfbb..5a2fe3d685a2 100644
> --- a/drivers/gpu/drm/drm_format_helper.c
> +++ b/drivers/gpu/drm/drm_format_helper.c
> @@ -246,6 +246,23 @@ static int drm_fb_xfrm(struct iosys_map *dst,
>   				     xfrm_line);
>   }
>   
> +static __always_inline void drm_fb_xfrm_line_32to24(void *dbuf, const void *sbuf,
> +						    unsigned int pixels,
> +						    u32 (*xfrm_pixel)(u32))
> +{
> +	u8 *dbuf8 = dbuf;
> +	const __le32 *sbuf32 = sbuf;
> +	const __le32 *send32 = sbuf32 + pixels;
> +
> +	while (sbuf32 < send32) {
> +		u32 val24 = xfrm_pixel(le32_to_cpup(sbuf32++));
> +		/* write output in reverse order for little endianness */
> +		*dbuf8++ = (val24 & 0x000000ff);
> +		*dbuf8++ = (val24 & 0x0000ff00) >>  8;
> +		*dbuf8++ = (val24 & 0x00ff0000) >> 16;
> +	}
> +}
> +
>   static __always_inline void drm_fb_xfrm_line_32to32(void *dbuf, const void *sbuf,
>   						    unsigned int pixels,
>   						    u32 (*xfrm_pixel)(u32))
> @@ -667,18 +684,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_rgba5551);
>   
>   static void drm_fb_xrgb8888_to_rgb888_line(void *dbuf, const void *sbuf, unsigned int pixels)
>   {
> -	u8 *dbuf8 = dbuf;
> -	const __le32 *sbuf32 = sbuf;
> -	unsigned int x;
> -	u32 pix;
> -
> -	for (x = 0; x < pixels; x++) {
> -		pix = le32_to_cpu(sbuf32[x]);
> -		/* write blue-green-red to output in little endianness */
> -		*dbuf8++ = (pix & 0x000000FF) >>  0;
> -		*dbuf8++ = (pix & 0x0000FF00) >>  8;
> -		*dbuf8++ = (pix & 0x00FF0000) >> 16;
> -	}
> +	drm_fb_xfrm_line_32to24(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_rgb888);
>   }
>   
>   /**
> @@ -718,18 +724,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_rgb888);
>   
>   static void drm_fb_xrgb8888_to_bgr888_line(void *dbuf, const void *sbuf, unsigned int pixels)
>   {
> -	u8 *dbuf8 = dbuf;
> -	const __le32 *sbuf32 = sbuf;
> -	unsigned int x;
> -	u32 pix;
> -
> -	for (x = 0; x < pixels; x++) {
> -		pix = le32_to_cpu(sbuf32[x]);
> -		/* write red-green-blue to output in little endianness */
> -		*dbuf8++ = (pix & 0x00ff0000) >> 16;
> -		*dbuf8++ = (pix & 0x0000ff00) >> 8;
> -		*dbuf8++ = (pix & 0x000000ff) >> 0;
> -	}
> +	drm_fb_xfrm_line_32to24(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_bgr888);
>   }
>   
>   /**
> diff --git a/drivers/gpu/drm/drm_format_internal.h b/drivers/gpu/drm/drm_format_internal.h
> index 5f82f0b9c8e8..e7fcf260a914 100644
> --- a/drivers/gpu/drm/drm_format_internal.h
> +++ b/drivers/gpu/drm/drm_format_internal.h
> @@ -68,6 +68,18 @@ static inline u32 drm_pixel_xrgb8888_to_argb1555(u32 pix)
>   	       drm_pixel_xrgb8888_to_xrgb1555(pix);
>   }
>   
> +static inline u32 drm_pixel_xrgb8888_to_rgb888(u32 pix)
> +{
> +	return pix & GENMASK(23, 0);
> +}
> +
> +static inline u32 drm_pixel_xrgb8888_to_bgr888(u32 pix)
> +{
> +	return ((pix & 0x00ff0000) >> 16) |
> +	       ((pix & 0x0000ff00)) |
> +	       ((pix & 0x000000ff) << 16);
> +}
> +
>   static inline u32 drm_pixel_xrgb8888_to_argb8888(u32 pix)
>   {
>   	return GENMASK(31, 24) | /* fill alpha bits */


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/8] drm/format-helper: Add generic conversion to 16-bit formats
  2025-03-25 10:31 ` [PATCH 4/8] drm/format-helper: Add generic conversion to 16-bit formats Thomas Zimmermann
@ 2025-03-26  8:53   ` Jocelyn Falempe
  0 siblings, 0 replies; 19+ messages in thread
From: Jocelyn Falempe @ 2025-03-26  8:53 UTC (permalink / raw)
  To: Thomas Zimmermann, simona, airlied, mripard, maarten.lankhorst; +Cc: dri-devel

On 25/03/2025 11:31, Thomas Zimmermann wrote:
> Add drm_fb_xfrm_line_32to16() to implement conversion from 32-bit
> pixels to 16-bit pixels. The pixel-conversion is specified by the
> given callback parameter. Mark the helper as always_inline to avoid
> overhead from function calls.
> 
> Then implement all existing line-conversion functions with the new
> generic call and the respective pixel-conversion helper. There's one
> pixel-conversion helper that swaps output bytes. It is for gud and
> requires refactoring, so don't move it into the header file.
> 

Thanks, it looks good to me.

Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>

> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---
>   drivers/gpu/drm/drm_format_helper.c   | 118 +++++---------------------
>   drivers/gpu/drm/drm_format_internal.h |  12 +++
>   2 files changed, 34 insertions(+), 96 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_format_helper.c b/drivers/gpu/drm/drm_format_helper.c
> index 5a2fe3d685a2..aecf55c1fc6b 100644
> --- a/drivers/gpu/drm/drm_format_helper.c
> +++ b/drivers/gpu/drm/drm_format_helper.c
> @@ -246,6 +246,18 @@ static int drm_fb_xfrm(struct iosys_map *dst,
>   				     xfrm_line);
>   }
>   
> +static __always_inline void drm_fb_xfrm_line_32to16(void *dbuf, const void *sbuf,
> +						    unsigned int pixels,
> +						    u32 (*xfrm_pixel)(u32))
> +{
> +	__le16 *dbuf16 = dbuf;
> +	const __le32 *sbuf32 = sbuf;
> +	const __le32 *send32 = sbuf32 + pixels;
> +
> +	while (sbuf32 < send32)
> +		*dbuf16++ = cpu_to_le16(xfrm_pixel(le32_to_cpup(sbuf32++)));
> +}
> +
>   static __always_inline void drm_fb_xfrm_line_32to24(void *dbuf, const void *sbuf,
>   						    unsigned int pixels,
>   						    u32 (*xfrm_pixel)(u32))
> @@ -448,38 +460,19 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_rgb332);
>   
>   static void drm_fb_xrgb8888_to_rgb565_line(void *dbuf, const void *sbuf, unsigned int pixels)
>   {
> -	__le16 *dbuf16 = dbuf;
> -	const __le32 *sbuf32 = sbuf;
> -	unsigned int x;
> -	u16 val16;
> -	u32 pix;
> +	drm_fb_xfrm_line_32to16(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_rgb565);
> +}
>   
> -	for (x = 0; x < pixels; x++) {
> -		pix = le32_to_cpu(sbuf32[x]);
> -		val16 = ((pix & 0x00F80000) >> 8) |
> -			((pix & 0x0000FC00) >> 5) |
> -			((pix & 0x000000F8) >> 3);
> -		dbuf16[x] = cpu_to_le16(val16);
> -	}
> +static __always_inline u32 drm_xrgb8888_to_rgb565_swab(u32 pix)
> +{
> +	return swab16(drm_pixel_xrgb8888_to_rgb565(pix));
>   }
>   
>   /* TODO: implement this helper as conversion to RGB565|BIG_ENDIAN */
>   static void drm_fb_xrgb8888_to_rgb565_swab_line(void *dbuf, const void *sbuf,
>   						unsigned int pixels)
>   {
> -	__le16 *dbuf16 = dbuf;
> -	const __le32 *sbuf32 = sbuf;
> -	unsigned int x;
> -	u16 val16;
> -	u32 pix;
> -
> -	for (x = 0; x < pixels; x++) {
> -		pix = le32_to_cpu(sbuf32[x]);
> -		val16 = ((pix & 0x00F80000) >> 8) |
> -			((pix & 0x0000FC00) >> 5) |
> -			((pix & 0x000000F8) >> 3);
> -		dbuf16[x] = cpu_to_le16(swab16(val16));
> -	}
> +	drm_fb_xfrm_line_32to16(dbuf, sbuf, pixels, drm_xrgb8888_to_rgb565_swab);
>   }
>   
>   /**
> @@ -526,19 +519,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_rgb565);
>   
>   static void drm_fb_xrgb8888_to_xrgb1555_line(void *dbuf, const void *sbuf, unsigned int pixels)
>   {
> -	__le16 *dbuf16 = dbuf;
> -	const __le32 *sbuf32 = sbuf;
> -	unsigned int x;
> -	u16 val16;
> -	u32 pix;
> -
> -	for (x = 0; x < pixels; x++) {
> -		pix = le32_to_cpu(sbuf32[x]);
> -		val16 = ((pix & 0x00f80000) >> 9) |
> -			((pix & 0x0000f800) >> 6) |
> -			((pix & 0x000000f8) >> 3);
> -		dbuf16[x] = cpu_to_le16(val16);
> -	}
> +	drm_fb_xfrm_line_32to16(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_xrgb1555);
>   }
>   
>   /**
> @@ -578,20 +559,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_xrgb1555);
>   
>   static void drm_fb_xrgb8888_to_argb1555_line(void *dbuf, const void *sbuf, unsigned int pixels)
>   {
> -	__le16 *dbuf16 = dbuf;
> -	const __le32 *sbuf32 = sbuf;
> -	unsigned int x;
> -	u16 val16;
> -	u32 pix;
> -
> -	for (x = 0; x < pixels; x++) {
> -		pix = le32_to_cpu(sbuf32[x]);
> -		val16 = BIT(15) | /* set alpha bit */
> -			((pix & 0x00f80000) >> 9) |
> -			((pix & 0x0000f800) >> 6) |
> -			((pix & 0x000000f8) >> 3);
> -		dbuf16[x] = cpu_to_le16(val16);
> -	}
> +	drm_fb_xfrm_line_32to16(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_argb1555);
>   }
>   
>   /**
> @@ -631,20 +599,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_argb1555);
>   
>   static void drm_fb_xrgb8888_to_rgba5551_line(void *dbuf, const void *sbuf, unsigned int pixels)
>   {
> -	__le16 *dbuf16 = dbuf;
> -	const __le32 *sbuf32 = sbuf;
> -	unsigned int x;
> -	u16 val16;
> -	u32 pix;
> -
> -	for (x = 0; x < pixels; x++) {
> -		pix = le32_to_cpu(sbuf32[x]);
> -		val16 = ((pix & 0x00f80000) >> 8) |
> -			((pix & 0x0000f800) >> 5) |
> -			((pix & 0x000000f8) >> 2) |
> -			BIT(0); /* set alpha bit */
> -		dbuf16[x] = cpu_to_le16(val16);
> -	}
> +	drm_fb_xfrm_line_32to16(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_rgba5551);
>   }
>   
>   /**
> @@ -980,36 +935,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_gray8);
>   
>   static void drm_fb_argb8888_to_argb4444_line(void *dbuf, const void *sbuf, unsigned int pixels)
>   {
> -	unsigned int pixels2 = pixels & ~GENMASK_ULL(0, 0);
> -	__le32 *dbuf32 = dbuf;
> -	__le16 *dbuf16 = dbuf + pixels2 * sizeof(*dbuf16);
> -	const __le32 *sbuf32 = sbuf;
> -	unsigned int x;
> -	u32 val32;
> -	u16 val16;
> -	u32 pix[2];
> -
> -	for (x = 0; x < pixels2; x += 2, ++dbuf32) {
> -		pix[0] = le32_to_cpu(sbuf32[x]);
> -		pix[1] = le32_to_cpu(sbuf32[x + 1]);
> -		val32 = ((pix[0] & 0xf0000000) >> 16) |
> -			((pix[0] & 0x00f00000) >> 12) |
> -			((pix[0] & 0x0000f000) >> 8) |
> -			((pix[0] & 0x000000f0) >> 4) |
> -			((pix[1] & 0xf0000000) >> 0) |
> -			((pix[1] & 0x00f00000) << 4) |
> -			((pix[1] & 0x0000f000) << 8) |
> -			((pix[1] & 0x000000f0) << 12);
> -		*dbuf32 = cpu_to_le32(val32);
> -	}
> -	for (; x < pixels; x++) {
> -		pix[0] = le32_to_cpu(sbuf32[x]);
> -		val16 = ((pix[0] & 0xf0000000) >> 16) |
> -			((pix[0] & 0x00f00000) >> 12) |
> -			((pix[0] & 0x0000f000) >> 8) |
> -			((pix[0] & 0x000000f0) >> 4);
> -		dbuf16[x] = cpu_to_le16(val16);
> -	}
> +	drm_fb_xfrm_line_32to16(dbuf, sbuf, pixels, drm_pixel_argb8888_to_argb4444);
>   }
>   
>   /**
> diff --git a/drivers/gpu/drm/drm_format_internal.h b/drivers/gpu/drm/drm_format_internal.h
> index e7fcf260a914..4f20b63cb6f6 100644
> --- a/drivers/gpu/drm/drm_format_internal.h
> +++ b/drivers/gpu/drm/drm_format_internal.h
> @@ -128,4 +128,16 @@ static inline u32 drm_pixel_xrgb8888_to_abgr2101010(u32 pix)
>   	       drm_pixel_xrgb8888_to_xbgr2101010(pix);
>   }
>   
> +/*
> + * Conversion from ARGB8888
> + */
> +
> +static inline u32 drm_pixel_argb8888_to_argb4444(u32 pix)
> +{
> +	return ((pix & 0xf0000000) >> 16) |
> +	       ((pix & 0x00f00000) >> 12) |
> +	       ((pix & 0x0000f000) >> 8) |
> +	       ((pix & 0x000000f0) >> 4);
> +}
> +
>   #endif


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/8] drm/format-helper: Add generic conversion to 8-bit formats
  2025-03-25 10:31 ` [PATCH 5/8] drm/format-helper: Add generic conversion to 8-bit formats Thomas Zimmermann
@ 2025-03-26  8:53   ` Jocelyn Falempe
  0 siblings, 0 replies; 19+ messages in thread
From: Jocelyn Falempe @ 2025-03-26  8:53 UTC (permalink / raw)
  To: Thomas Zimmermann, simona, airlied, mripard, maarten.lankhorst; +Cc: dri-devel

On 25/03/2025 11:31, Thomas Zimmermann wrote:
> Add drm_fb_xfrm_line_32to8() to implement conversion from 32-bit
> pixels to 8-bit pixels. The pixel-conversion is specified by the
> given callback parameter. Mark the helper as always_inline to avoid
> overhead from function calls.
> 
> Then implement all existing line-conversion functions with the new
> generic call and the respective pixel-conversion helper.

Thanks, it looks good to me.

Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>

> 
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---
>   drivers/gpu/drm/drm_format_helper.c   | 38 ++++++++++-----------------
>   drivers/gpu/drm/drm_format_internal.h | 17 ++++++++++++
>   2 files changed, 31 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_format_helper.c b/drivers/gpu/drm/drm_format_helper.c
> index aecf55c1fc6b..a926aa6671fc 100644
> --- a/drivers/gpu/drm/drm_format_helper.c
> +++ b/drivers/gpu/drm/drm_format_helper.c
> @@ -246,6 +246,18 @@ static int drm_fb_xfrm(struct iosys_map *dst,
>   				     xfrm_line);
>   }
>   
> +static __always_inline void drm_fb_xfrm_line_32to8(void *dbuf, const void *sbuf,
> +						   unsigned int pixels,
> +						   u32 (*xfrm_pixel)(u32))
> +{
> +	u8 *dbuf8 = dbuf;
> +	const __le32 *sbuf32 = sbuf;
> +	const __le32 *send32 = sbuf32 + pixels;
> +
> +	while (sbuf32 < send32)
> +		*dbuf8++ = xfrm_pixel(le32_to_cpup(sbuf32++));
> +}
> +
>   static __always_inline void drm_fb_xfrm_line_32to16(void *dbuf, const void *sbuf,
>   						    unsigned int pixels,
>   						    u32 (*xfrm_pixel)(u32))
> @@ -411,17 +423,7 @@ EXPORT_SYMBOL(drm_fb_swab);
>   
>   static void drm_fb_xrgb8888_to_rgb332_line(void *dbuf, const void *sbuf, unsigned int pixels)
>   {
> -	u8 *dbuf8 = dbuf;
> -	const __le32 *sbuf32 = sbuf;
> -	unsigned int x;
> -	u32 pix;
> -
> -	for (x = 0; x < pixels; x++) {
> -		pix = le32_to_cpu(sbuf32[x]);
> -		dbuf8[x] = ((pix & 0x00e00000) >> 16) |
> -			   ((pix & 0x0000e000) >> 11) |
> -			   ((pix & 0x000000c0) >> 6);
> -	}
> +	drm_fb_xfrm_line_32to8(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_rgb332);
>   }
>   
>   /**
> @@ -879,19 +881,7 @@ EXPORT_SYMBOL(drm_fb_xrgb8888_to_argb2101010);
>   
>   static void drm_fb_xrgb8888_to_gray8_line(void *dbuf, const void *sbuf, unsigned int pixels)
>   {
> -	u8 *dbuf8 = dbuf;
> -	const __le32 *sbuf32 = sbuf;
> -	unsigned int x;
> -
> -	for (x = 0; x < pixels; x++) {
> -		u32 pix = le32_to_cpu(sbuf32[x]);
> -		u8 r = (pix & 0x00ff0000) >> 16;
> -		u8 g = (pix & 0x0000ff00) >> 8;
> -		u8 b =  pix & 0x000000ff;
> -
> -		/* ITU BT.601: Y = 0.299 R + 0.587 G + 0.114 B */
> -		*dbuf8++ = (3 * r + 6 * g + b) / 10;
> -	}
> +	drm_fb_xfrm_line_32to8(dbuf, sbuf, pixels, drm_pixel_xrgb8888_to_r8_bt601);
>   }
>   
>   /**
> diff --git a/drivers/gpu/drm/drm_format_internal.h b/drivers/gpu/drm/drm_format_internal.h
> index 4f20b63cb6f6..9f857bfa368d 100644
> --- a/drivers/gpu/drm/drm_format_internal.h
> +++ b/drivers/gpu/drm/drm_format_internal.h
> @@ -35,6 +35,23 @@
>    * Conversions from XRGB8888
>    */
>   
> +static inline u32 drm_pixel_xrgb8888_to_r8_bt601(u32 pix)
> +{
> +	u32 r = (pix & 0x00ff0000) >> 16;
> +	u32 g = (pix & 0x0000ff00) >> 8;
> +	u32 b =  pix & 0x000000ff;
> +
> +	/* ITU-R BT.601: Y = 0.299 R + 0.587 G + 0.114 B */
> +	return (3 * r + 6 * g + b) / 10;
> +}
> +
> +static inline u32 drm_pixel_xrgb8888_to_rgb332(u32 pix)
> +{
> +	return ((pix & 0x00e00000) >> 16) |
> +	       ((pix & 0x0000e000) >> 11) |
> +	       ((pix & 0x000000c0) >> 6);
> +}
> +
>   static inline u32 drm_pixel_xrgb8888_to_rgb565(u32 pix)
>   {
>   	return ((pix & 0x00f80000) >> 8) |


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/8] drm/format-helper: Optimize 32-to-24-bpp conversion
  2025-03-25 10:31 ` [PATCH 6/8] drm/format-helper: Optimize 32-to-24-bpp conversion Thomas Zimmermann
@ 2025-03-26  8:55   ` Jocelyn Falempe
  0 siblings, 0 replies; 19+ messages in thread
From: Jocelyn Falempe @ 2025-03-26  8:55 UTC (permalink / raw)
  To: Thomas Zimmermann, simona, airlied, mripard, maarten.lankhorst; +Cc: dri-devel

On 25/03/2025 11:31, Thomas Zimmermann wrote:
> For ease of implementation, existing line-conversion functions
> for 24-bit formats write each byte individually. Optimize the
> performance by writing 4 pixels in 3 32-bit stores.
> 
Thanks, it looks good to me.

Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>

> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---
>   drivers/gpu/drm/drm_format_helper.c | 36 ++++++++++++++++++++++++++++-
>   1 file changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_format_helper.c b/drivers/gpu/drm/drm_format_helper.c
> index a926aa6671fc..b9c9c712aa9c 100644
> --- a/drivers/gpu/drm/drm_format_helper.c
> +++ b/drivers/gpu/drm/drm_format_helper.c
> @@ -274,10 +274,44 @@ static __always_inline void drm_fb_xfrm_line_32to24(void *dbuf, const void *sbuf
>   						    unsigned int pixels,
>   						    u32 (*xfrm_pixel)(u32))
>   {
> -	u8 *dbuf8 = dbuf;
> +	__le32 *dbuf32 = dbuf;
> +	u8 *dbuf8;
>   	const __le32 *sbuf32 = sbuf;
>   	const __le32 *send32 = sbuf32 + pixels;
>   
> +	/* write pixels in chunks of 4 */
> +	send32 -= pixels & GENMASK(1, 0);
> +	while (sbuf32 < send32) {
> +		u32 val24[4] = {
> +			xfrm_pixel(le32_to_cpup(sbuf32++)),
> +			xfrm_pixel(le32_to_cpup(sbuf32++)),
> +			xfrm_pixel(le32_to_cpup(sbuf32++)),
> +			xfrm_pixel(le32_to_cpup(sbuf32++)),
> +		};
> +		u32 out32[3] = {
> +			/* write output bytes in reverse order for little endianness */
> +			((val24[0] & 0x000000ff)) |
> +			((val24[0] & 0x0000ff00)) |
> +			((val24[0] & 0x00ff0000)) |
> +			((val24[1] & 0x000000ff) << 24),
> +			((val24[1] & 0x0000ff00) >> 8) |
> +			((val24[1] & 0x00ff0000) >> 8) |
> +			((val24[2] & 0x000000ff) << 16) |
> +			((val24[2] & 0x0000ff00) << 16),
> +			((val24[2] & 0x00ff0000) >> 16) |
> +			((val24[3] & 0x000000ff) << 8) |
> +			((val24[3] & 0x0000ff00) << 8) |
> +			((val24[3] & 0x00ff0000) << 8),
> +		};
> +
> +		*dbuf32++ = cpu_to_le32(out32[0]);
> +		*dbuf32++ = cpu_to_le32(out32[1]);
> +		*dbuf32++ = cpu_to_le32(out32[2]);
> +	}
> +	send32 += pixels & GENMASK(1, 0);
> +
> +	/* write trailing pixel */
> +	dbuf8 = (u8 __force *)dbuf32;
>   	while (sbuf32 < send32) {
>   		u32 val24 = xfrm_pixel(le32_to_cpup(sbuf32++));
>   		/* write output in reverse order for little endianness */


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/8] drm/format-helper: Optimize 32-to-16-bpp conversion
  2025-03-25 10:31 ` [PATCH 7/8] drm/format-helper: Optimize 32-to-16-bpp conversion Thomas Zimmermann
@ 2025-03-26  8:56   ` Jocelyn Falempe
  2025-03-26 10:53   ` Jani Nikula
  1 sibling, 0 replies; 19+ messages in thread
From: Jocelyn Falempe @ 2025-03-26  8:56 UTC (permalink / raw)
  To: Thomas Zimmermann, simona, airlied, mripard, maarten.lankhorst; +Cc: dri-devel

On 25/03/2025 11:31, Thomas Zimmermann wrote:
> For ease of implementation, existing line-conversion functions
> for 16-bit formats write each pixel individually. Optimize the
> performance by writing mulitple pixels in single 64-bit and 32-bit
> stores.

With the commit message typo fixed, mulitple => multiple

Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
> 
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---
>   drivers/gpu/drm/drm_format_helper.c | 40 ++++++++++++++++++++++++++++-
>   1 file changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_format_helper.c b/drivers/gpu/drm/drm_format_helper.c
> index b9c9c712aa9c..66137df85725 100644
> --- a/drivers/gpu/drm/drm_format_helper.c
> +++ b/drivers/gpu/drm/drm_format_helper.c
> @@ -262,10 +262,48 @@ static __always_inline void drm_fb_xfrm_line_32to16(void *dbuf, const void *sbuf
>   						    unsigned int pixels,
>   						    u32 (*xfrm_pixel)(u32))
>   {
> -	__le16 *dbuf16 = dbuf;
> +	__le64 *dbuf64 = dbuf;
> +	__le32 *dbuf32;
> +	__le16 *dbuf16;
>   	const __le32 *sbuf32 = sbuf;
>   	const __le32 *send32 = sbuf32 + pixels;
>   
> +#if defined(CONFIG_64BIT)
> +	/* write 4 pixels at once */
> +	send32 -= pixels & GENMASK(1, 0);
> +	while (sbuf32 < send32) {
> +		u32 pix[4] = {
> +			le32_to_cpup(sbuf32++),
> +			le32_to_cpup(sbuf32++),
> +			le32_to_cpup(sbuf32++),
> +			le32_to_cpup(sbuf32++),
> +		};
> +		/* write output bytes in reverse order for little endianness */
> +		u64 val64 = ((u64)xfrm_pixel(pix[0])) |
> +			    ((u64)xfrm_pixel(pix[1]) << 16) |
> +			    ((u64)xfrm_pixel(pix[2]) << 32) |
> +			    ((u64)xfrm_pixel(pix[3]) << 48);
> +		*dbuf64++ = cpu_to_le64(val64);
> +	}
> +	send32 += pixels & GENMASK(1, 1);
> +#endif
> +
> +	/* write 2 pixels at once */
> +	dbuf32 = (__le32 __force *)dbuf64;
> +	while (sbuf32 < send32) {
> +		u32 pix[2] = {
> +			le32_to_cpup(sbuf32++),
> +			le32_to_cpup(sbuf32++),
> +		};
> +		/* write output bytes in reverse order for little endianness */
> +		u32 val32 = xfrm_pixel(pix[0]) |
> +			   (xfrm_pixel(pix[1]) << 16);
> +		*dbuf32++ = cpu_to_le32(val32);
> +	}
> +	send32 += pixels & GENMASK(0, 0);
> +
> +	/* write trailing pixel */
> +	dbuf16 = (__le16 __force *)dbuf32;
>   	while (sbuf32 < send32)
>   		*dbuf16++ = cpu_to_le16(xfrm_pixel(le32_to_cpup(sbuf32++)));
>   }


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 8/8] drm/format-helper: Optimize 32-to-8-bpp conversion
  2025-03-25 10:31 ` [PATCH 8/8] drm/format-helper: Optimize 32-to-8-bpp conversion Thomas Zimmermann
@ 2025-03-26  8:57   ` Jocelyn Falempe
  0 siblings, 0 replies; 19+ messages in thread
From: Jocelyn Falempe @ 2025-03-26  8:57 UTC (permalink / raw)
  To: Thomas Zimmermann, simona, airlied, mripard, maarten.lankhorst; +Cc: dri-devel

On 25/03/2025 11:31, Thomas Zimmermann wrote:
> For ease of implementation, existing line-conversion functions
> for 8-bit formats write each pixel individually. Optimize the
> performance by writing mulitple pixels in a single 32-bit store.

With the commit message typo fixed, mulitple => multiple

Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
> 
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---
>   drivers/gpu/drm/drm_format_helper.c | 23 ++++++++++++++++++++++-
>   1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_format_helper.c b/drivers/gpu/drm/drm_format_helper.c
> index 66137df85725..73833db28c3c 100644
> --- a/drivers/gpu/drm/drm_format_helper.c
> +++ b/drivers/gpu/drm/drm_format_helper.c
> @@ -250,10 +250,31 @@ static __always_inline void drm_fb_xfrm_line_32to8(void *dbuf, const void *sbuf,
>   						   unsigned int pixels,
>   						   u32 (*xfrm_pixel)(u32))
>   {
> -	u8 *dbuf8 = dbuf;
> +	__le32 *dbuf32 = dbuf;
> +	u8 *dbuf8;
>   	const __le32 *sbuf32 = sbuf;
>   	const __le32 *send32 = sbuf32 + pixels;
>   
> +	/* write 4 pixels at once */
> +	send32 -= pixels & GENMASK(1, 0);
> +	while (sbuf32 < send32) {
> +		u32 pix[4] = {
> +			le32_to_cpup(sbuf32++),
> +			le32_to_cpup(sbuf32++),
> +			le32_to_cpup(sbuf32++),
> +			le32_to_cpup(sbuf32++),
> +		};
> +		/* write output bytes in reverse order for little endianness */
> +		u32 val32 = xfrm_pixel(pix[0]) |
> +			   (xfrm_pixel(pix[1]) << 8) |
> +			   (xfrm_pixel(pix[2]) << 16) |
> +			   (xfrm_pixel(pix[3]) << 24);
> +		*dbuf32++ = cpu_to_le32(val32);
> +	}
> +	send32 += pixels & GENMASK(1, 0);
> +
> +	/* write trailing pixels */
> +	dbuf8 = (u8 __force *)dbuf32;
>   	while (sbuf32 < send32)
>   		*dbuf8++ = xfrm_pixel(le32_to_cpup(sbuf32++));
>   }


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/8] drm/format-helper: Optimize 32-to-16-bpp conversion
  2025-03-25 10:31 ` [PATCH 7/8] drm/format-helper: Optimize 32-to-16-bpp conversion Thomas Zimmermann
  2025-03-26  8:56   ` Jocelyn Falempe
@ 2025-03-26 10:53   ` Jani Nikula
  2025-03-26 12:36     ` Thomas Zimmermann
  1 sibling, 1 reply; 19+ messages in thread
From: Jani Nikula @ 2025-03-26 10:53 UTC (permalink / raw)
  To: Thomas Zimmermann, jfalempe, simona, airlied, mripard,
	maarten.lankhorst
  Cc: dri-devel, Thomas Zimmermann

On Tue, 25 Mar 2025, Thomas Zimmermann <tzimmermann@suse.de> wrote:
> For ease of implementation, existing line-conversion functions
> for 16-bit formats write each pixel individually. Optimize the
> performance by writing mulitple pixels in single 64-bit and 32-bit
> stores.
>
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---
>  drivers/gpu/drm/drm_format_helper.c | 40 ++++++++++++++++++++++++++++-
>  1 file changed, 39 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_format_helper.c b/drivers/gpu/drm/drm_format_helper.c
> index b9c9c712aa9c..66137df85725 100644
> --- a/drivers/gpu/drm/drm_format_helper.c
> +++ b/drivers/gpu/drm/drm_format_helper.c
> @@ -262,10 +262,48 @@ static __always_inline void drm_fb_xfrm_line_32to16(void *dbuf, const void *sbuf
>  						    unsigned int pixels,
>  						    u32 (*xfrm_pixel)(u32))
>  {
> -	__le16 *dbuf16 = dbuf;
> +	__le64 *dbuf64 = dbuf;
> +	__le32 *dbuf32;
> +	__le16 *dbuf16;
>  	const __le32 *sbuf32 = sbuf;
>  	const __le32 *send32 = sbuf32 + pixels;
>  
> +#if defined(CONFIG_64BIT)
> +	/* write 4 pixels at once */
> +	send32 -= pixels & GENMASK(1, 0);
> +	while (sbuf32 < send32) {

I find the adjusting of send32 before and after the loop with different
masks a bit confusing. Would it not suffice to:

	while (sbuf32 < ALIGN_DOWN(send32, 4))

and leave send32 untouched? With different alignments for 2 pixels at a
time.


BR,
Jani.


> +		u32 pix[4] = {
> +			le32_to_cpup(sbuf32++),
> +			le32_to_cpup(sbuf32++),
> +			le32_to_cpup(sbuf32++),
> +			le32_to_cpup(sbuf32++),
> +		};
> +		/* write output bytes in reverse order for little endianness */
> +		u64 val64 = ((u64)xfrm_pixel(pix[0])) |
> +			    ((u64)xfrm_pixel(pix[1]) << 16) |
> +			    ((u64)xfrm_pixel(pix[2]) << 32) |
> +			    ((u64)xfrm_pixel(pix[3]) << 48);
> +		*dbuf64++ = cpu_to_le64(val64);
> +	}
> +	send32 += pixels & GENMASK(1, 1);
> +#endif
> +
> +	/* write 2 pixels at once */
> +	dbuf32 = (__le32 __force *)dbuf64;
> +	while (sbuf32 < send32) {
> +		u32 pix[2] = {
> +			le32_to_cpup(sbuf32++),
> +			le32_to_cpup(sbuf32++),
> +		};
> +		/* write output bytes in reverse order for little endianness */
> +		u32 val32 = xfrm_pixel(pix[0]) |
> +			   (xfrm_pixel(pix[1]) << 16);
> +		*dbuf32++ = cpu_to_le32(val32);
> +	}
> +	send32 += pixels & GENMASK(0, 0);
> +
> +	/* write trailing pixel */
> +	dbuf16 = (__le16 __force *)dbuf32;
>  	while (sbuf32 < send32)
>  		*dbuf16++ = cpu_to_le16(xfrm_pixel(le32_to_cpup(sbuf32++)));
>  }

-- 
Jani Nikula, Intel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/8] drm/format-helper: Optimize 32-to-16-bpp conversion
  2025-03-26 10:53   ` Jani Nikula
@ 2025-03-26 12:36     ` Thomas Zimmermann
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Zimmermann @ 2025-03-26 12:36 UTC (permalink / raw)
  To: Jani Nikula, jfalempe, simona, airlied, mripard,
	maarten.lankhorst
  Cc: dri-devel

Hi

Am 26.03.25 um 11:53 schrieb Jani Nikula:
> On Tue, 25 Mar 2025, Thomas Zimmermann <tzimmermann@suse.de> wrote:
>> For ease of implementation, existing line-conversion functions
>> for 16-bit formats write each pixel individually. Optimize the
>> performance by writing mulitple pixels in single 64-bit and 32-bit
>> stores.
>>
>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
>> ---
>>   drivers/gpu/drm/drm_format_helper.c | 40 ++++++++++++++++++++++++++++-
>>   1 file changed, 39 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/drm_format_helper.c b/drivers/gpu/drm/drm_format_helper.c
>> index b9c9c712aa9c..66137df85725 100644
>> --- a/drivers/gpu/drm/drm_format_helper.c
>> +++ b/drivers/gpu/drm/drm_format_helper.c
>> @@ -262,10 +262,48 @@ static __always_inline void drm_fb_xfrm_line_32to16(void *dbuf, const void *sbuf
>>   						    unsigned int pixels,
>>   						    u32 (*xfrm_pixel)(u32))
>>   {
>> -	__le16 *dbuf16 = dbuf;
>> +	__le64 *dbuf64 = dbuf;
>> +	__le32 *dbuf32;
>> +	__le16 *dbuf16;
>>   	const __le32 *sbuf32 = sbuf;
>>   	const __le32 *send32 = sbuf32 + pixels;
>>   
>> +#if defined(CONFIG_64BIT)
>> +	/* write 4 pixels at once */
>> +	send32 -= pixels & GENMASK(1, 0);
>> +	while (sbuf32 < send32) {
> I find the adjusting of send32 before and after the loop with different
> masks a bit confusing. Would it not suffice to:
>
> 	while (sbuf32 < ALIGN_DOWN(send32, 4))
>
> and leave send32 untouched? With different alignments for 2 pixels at a
> time.

Makes sense.

Best regards
Thomas

>
>
> BR,
> Jani.
>
>
>> +		u32 pix[4] = {
>> +			le32_to_cpup(sbuf32++),
>> +			le32_to_cpup(sbuf32++),
>> +			le32_to_cpup(sbuf32++),
>> +			le32_to_cpup(sbuf32++),
>> +		};
>> +		/* write output bytes in reverse order for little endianness */
>> +		u64 val64 = ((u64)xfrm_pixel(pix[0])) |
>> +			    ((u64)xfrm_pixel(pix[1]) << 16) |
>> +			    ((u64)xfrm_pixel(pix[2]) << 32) |
>> +			    ((u64)xfrm_pixel(pix[3]) << 48);
>> +		*dbuf64++ = cpu_to_le64(val64);
>> +	}
>> +	send32 += pixels & GENMASK(1, 1);
>> +#endif
>> +
>> +	/* write 2 pixels at once */
>> +	dbuf32 = (__le32 __force *)dbuf64;
>> +	while (sbuf32 < send32) {
>> +		u32 pix[2] = {
>> +			le32_to_cpup(sbuf32++),
>> +			le32_to_cpup(sbuf32++),
>> +		};
>> +		/* write output bytes in reverse order for little endianness */
>> +		u32 val32 = xfrm_pixel(pix[0]) |
>> +			   (xfrm_pixel(pix[1]) << 16);
>> +		*dbuf32++ = cpu_to_le32(val32);
>> +	}
>> +	send32 += pixels & GENMASK(0, 0);
>> +
>> +	/* write trailing pixel */
>> +	dbuf16 = (__le16 __force *)dbuf32;
>>   	while (sbuf32 < send32)
>>   		*dbuf16++ = cpu_to_le16(xfrm_pixel(le32_to_cpup(sbuf32++)));
>>   }

-- 
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2025-03-26 12:37 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-25 10:31 [PATCH 0/8] drm/format-helper: Add helpers for line conversion Thomas Zimmermann
2025-03-25 10:31 ` [PATCH 1/8] drm/format-helper: Move helpers for pixel conversion to header file Thomas Zimmermann
2025-03-26  8:51   ` Jocelyn Falempe
2025-03-25 10:31 ` [PATCH 2/8] drm/format-helper: Add generic conversion to 32-bit formats Thomas Zimmermann
2025-03-26  8:52   ` Jocelyn Falempe
2025-03-25 10:31 ` [PATCH 3/8] drm/format-helper: Add generic conversion to 24-bit formats Thomas Zimmermann
2025-03-26  8:52   ` Jocelyn Falempe
2025-03-25 10:31 ` [PATCH 4/8] drm/format-helper: Add generic conversion to 16-bit formats Thomas Zimmermann
2025-03-26  8:53   ` Jocelyn Falempe
2025-03-25 10:31 ` [PATCH 5/8] drm/format-helper: Add generic conversion to 8-bit formats Thomas Zimmermann
2025-03-26  8:53   ` Jocelyn Falempe
2025-03-25 10:31 ` [PATCH 6/8] drm/format-helper: Optimize 32-to-24-bpp conversion Thomas Zimmermann
2025-03-26  8:55   ` Jocelyn Falempe
2025-03-25 10:31 ` [PATCH 7/8] drm/format-helper: Optimize 32-to-16-bpp conversion Thomas Zimmermann
2025-03-26  8:56   ` Jocelyn Falempe
2025-03-26 10:53   ` Jani Nikula
2025-03-26 12:36     ` Thomas Zimmermann
2025-03-25 10:31 ` [PATCH 8/8] drm/format-helper: Optimize 32-to-8-bpp conversion Thomas Zimmermann
2025-03-26  8:57   ` Jocelyn Falempe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.