[PATCH v13 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading


* [PATCH v13 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading
@ 2024-10-31 17:53 Louis Chauvet
  2024-10-31 17:53 ` [PATCH v13 1/9] drm/vkms: Code formatting Louis Chauvet
                   ` (9 more replies)
  0 siblings, 10 replies; 20+ messages in thread
From: Louis Chauvet @ 2024-10-31 17:53 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Jonathan Corbet, Louis Chauvet, Simona Vetter, Helen Koike,
	rdunlap, arthurgrillo, pekka.paalanen, Simona Vetter
  Cc: dri-devel, linux-kernel, linux-doc, thomas.petazzoni,
	jeremie.dautheribes, miquel.raynal, seanpaul, marcheu,
	nicolejadeyee, Pekka Paalanen, Maíra Canal

This patchset is the second version of [1]. It is almost a complete
rewrite to use a line-by-line algorithm for the composition.

It can be divided into multiple parts:
- PATCH 1 to 3: no functional change is intended, only some formatting and
  documenting (PATCH 2 is taken from [2])
- PATCH 4 to 7: Some preparation work not directly related to the
  line-by-line algorithm
- PATCH 8: main patch for this series, it reintroduces the
  line-by-line algorithm
- PATCH 9: Remove useless drm_rotation_simplify
- Rest of the series: moved to a new series so that this one can be
  merged, see the new series "Add YUV and R1..8 formats support to VKMS"

PATCH 8 aims to restore the line-by-line pixel reading algorithm. It
was introduced in 8ba1648567e2 ("drm: vkms: Refactor the plane composer to
accept new formats") but removed in 322d716a3e8a ("drm/vkms: isolate pixel
conversion functionality") in an over-simplification effort.
At the time, nobody noticed the performance impact of this commit. After
the first iteration of my series, people noticed a performance impact,
and measurements confirmed it. Pekka suggested reimplementing the
line-by-line algorithm.

Experiments on my side showed a great improvement with the line-by-line
algorithm, and the performance is the same as with the original
line-by-line algorithm. I focused my effort on making the code work for
all rotations and translations. Using the drm_rect_* helpers avoids
reimplementing existing logic.
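To illustrate why converting a whole line per call is cheaper than converting one pixel per call, here is a minimal user-space sketch, not the series' code. It assumes a simplified struct pixel_argb_u16 and only the XRGB8888 format; the helpers read_line_per_pixel(), read_line_per_line(), XRGB8888_read_pixel() and XRGB8888_read_line() are hypothetical, and the pixel_read_line_t typedef here is only loosely modeled on the series' pixel_read_line_t (its exact signature in the series may differ). The per-pixel path pays one indirect call per pixel; the per-line path pays one indirect call per line and lets the compiler inline the conversion inside the loop, which iterates on an end pointer as described in the v4 changelog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the driver's struct pixel_argb_u16. */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Per-pixel reader: the caller makes one indirect call per pixel. */
typedef void (*pixel_read_t)(const uint8_t *in_pixel, struct pixel_argb_u16 *out_pixel);

/* Per-line reader (illustrative signature): one indirect call per line. */
typedef void (*pixel_read_line_t)(const uint8_t *in, struct pixel_argb_u16 *out,
				  size_t count);

/* XRGB8888 is little-endian: bytes are B, G, R, X in memory. */
static void XRGB8888_read_pixel(const uint8_t *in_pixel, struct pixel_argb_u16 *out_pixel)
{
	out_pixel->a = 0xffff;
	out_pixel->r = (uint16_t)(in_pixel[2] * 257);
	out_pixel->g = (uint16_t)(in_pixel[1] * 257);
	out_pixel->b = (uint16_t)(in_pixel[0] * 257);
}

/* The loop lives inside the format-specific function, so this direct call can be inlined. */
static void XRGB8888_read_line(const uint8_t *in, struct pixel_argb_u16 *out, size_t count)
{
	struct pixel_argb_u16 *end = out + count;

	/* Iterate on the output pointer instead of keeping a separate counter. */
	for (; out < end; out++, in += 4)
		XRGB8888_read_pixel(in, out);
}

/* Pixel-per-pixel composition; assumes 4 bytes per pixel, as for XRGB8888. */
static void read_line_per_pixel(pixel_read_t read_pixel, const uint8_t *in,
				struct pixel_argb_u16 *out, size_t count)
{
	assert(read_pixel != NULL);
	for (size_t x = 0; x < count; x++)
		read_pixel(in + 4 * x, &out[x]);
}

/* Line-by-line composition: a single indirect call for the whole line. */
static void read_line_per_line(pixel_read_line_t read_line, const uint8_t *in,
			       struct pixel_argb_u16 *out, size_t count)
{
	assert(read_line != NULL);
	read_line(in, out, count);
}
```

Both paths produce identical output; the difference is only in how many indirect calls the composer makes on its hot path.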

The only "complex" part remaining is the clipping of the coordinates to
avoid reading/writing outside of src/dst. Thus I added a lot of comments
to help anyone who later wants to add features (framebuffer resizing,
for example).
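For a single unrotated horizontal run, the clipping reduces to the sketch below. struct line_span and clip_line_span() are hypothetical simplifications, not the series' clamp_line_coordinates(), which also handles rotation and relies on the drm_rect_* helpers. The idea is to first skip the pixels that would start before the left edge of either buffer, then cut the pixels that would run past the right edge.

```c
#include <assert.h>

/* Hypothetical description of one horizontal run of pixels to copy. */
struct line_span {
	int src_x;	/* first column read in the source */
	int dst_x;	/* first column written in the destination */
	int count;	/* number of pixels in the run */
};

/*
 * Clip @span so that reads stay in [0, src_width) and writes stay in
 * [0, dst_width). Returns the number of pixels left to process (may be 0).
 */
static int clip_line_span(struct line_span *span, int src_width, int dst_width)
{
	int skip = 0;

	assert(src_width >= 0 && dst_width >= 0);

	/* Skip the pixels that start before the left edge of either buffer. */
	if (-span->src_x > skip)
		skip = -span->src_x;
	if (-span->dst_x > skip)
		skip = -span->dst_x;
	span->src_x += skip;
	span->dst_x += skip;
	span->count -= skip;

	/* Drop the pixels that run past the right edge of either buffer. */
	if (span->src_x + span->count > src_width)
		span->count = src_width - span->src_x;
	if (span->dst_x + span->count > dst_width)
		span->count = dst_width - span->dst_x;

	if (span->count < 0)
		span->count = 0;
	return span->count;
}
```

For example, a 10-pixel run starting at src_x = -2, written at dst_x = 0 into a 5-pixel-wide destination, is clipped to 3 pixels that read source columns 0..2 and write destination columns 2..4.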

I did not change any expected test results, as VKMS seems to have some
existing issues:
https://gitlab.freedesktop.org/jim.cromie/kernel-drm-next-dd/-/jobs/61484201
https://gitlab.freedesktop.org/jim.cromie/kernel-drm-next-dd/-/jobs/61803193
https://gitlab.freedesktop.org/louischauvet/kernel/-/jobs/65944002

To properly test the rotation algorithm, I had to implement a new IGT
test [8]. This helped find one issue in the YUV rotation algorithm.

My series was mainly tested with:
- kms_plane (for color conversions)
- kms_rotation_crc (for a subset of rotation and formats)
- kms_rotation (to test all rotation and formats combinations) [8]
- kms_cursor_crc (for translations)
The benchmark used to measure the improvement was run with
kms_fb_stress [10], with some modifications:
- Fixing the writeback format to XRGB8888
- Using a primary plane with odd dimension to avoid failures due to YUV
  alignment
The KMS structure was:
	CRTC:
		rectangle: 4096x2160+0+0
	primary:
		format: ABGR16161616
		rectangle: 3640x2160+101+0
	writeback:
		format: XRGB8888
		rectangle: 4096x2160+0+0
Results (on my computer):

8356b9790650: drm/test: Add test cases for drm_rect_rotate_inv() (before any regression)
322d716a3e8a: drm/vkms: isolate pixel conversion functionality (first regression)
cc4fd2934d41: drm/vkms: Isolate writeback pixel conversion functions (second regression)
2c3d1bd284c5: drm/panel: simple: Add Microtips Technology MF-103HIEB0GA0 panel (current drm-misc-next)

 Used format  | This series | 2c3d1bd284c5 | cc4fd2934d41 | 322d716a3e8a | 8356b9790650 |
--------------+-------------+--------------+--------------+--------------+--------------+
 XRGB8888     |  13.261666s |   14.289582s |   10.731272s |    9.480001s |    9.277507s |
 XRGB16161616 |  13.282479s |   13.918926s |   10.712616s |    9.776903s |    9.291766s |
 RGB565       | 136.154163s |  141.646489s |  101.744050s |  103.712164s |   87.860923s |

This is a roughly 4-7% improvement over the current drm-misc-next
(2c3d1bd284c5). More work needs to be done on the writeback to gain more.
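The relative improvements can be recomputed from the results table. This small helper is illustrative only (the names bench_row, improvement_pct() and print_improvements() are hypothetical); it computes (old - new) / old for each format, comparing this series against 2c3d1bd284c5 (current drm-misc-next), using the run times copied from the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Run times in seconds, copied from the results table. */
struct bench_row {
	const char *format;
	double this_series;
	double drm_misc_next;	/* 2c3d1bd284c5 */
};

static const struct bench_row rows[] = {
	{ "XRGB8888",      13.261666,  14.289582 },
	{ "XRGB16161616",  13.282479,  13.918926 },
	{ "RGB565",       136.154163, 141.646489 },
};

/* Relative run-time reduction of @new_s compared to @old_s, in percent. */
static double improvement_pct(double old_s, double new_s)
{
	assert(old_s > 0.0);
	return (old_s - new_s) / old_s * 100.0;
}

static void print_improvements(void)
{
	for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
		printf("%-12s %.1f%%\n", rows[i].format,
		       improvement_pct(rows[i].drm_misc_next, rows[i].this_series));
}
```

With these numbers it prints about 7.2% for XRGB8888, 4.6% for XRGB16161616 and 3.9% for RGB565.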

[1]: https://lore.kernel.org/all/20240201-yuv-v1-0-3ca376f27632@bootlin.com
[2]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-0-952fcaa5a193@riseup.net/
[3]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-3-952fcaa5a193@riseup.net/
[4]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-5-952fcaa5a193@riseup.net/
[5]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-6-952fcaa5a193@riseup.net/
[6]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-7-952fcaa5a193@riseup.net/
[8]: https://lore.kernel.org/r/20240313-new_rotation-v2-0-6230fd5cae59@bootlin.com
[9]: https://lore.kernel.org/dri-devel/20240306-louis-vkms-conv-v1-1-5bfe7d129fdd@riseup.net/
[10]: https://lore.kernel.org/all/20240422-kms_fb_stress-dev-v5-0-0c577163dc88@riseup.net/

To: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
To: Melissa Wen <melissa.srw@gmail.com>
To: Maíra Canal <mairacanal@riseup.net>
To: Haneen Mohammed <hamohammed.sa@gmail.com>
To: Daniel Vetter <daniel@ffwll.ch>
To: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
To: Maxime Ripard <mripard@kernel.org>
To: Thomas Zimmermann <tzimmermann@suse.de>
To: David Airlie <airlied@gmail.com>
To: rdunlap@infradead.org
To: arthurgrillo@riseup.net
To: Jonathan Corbet <corbet@lwn.net>
To: pekka.paalanen@haloniitty.fi
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Cc: jeremie.dautheribes@bootlin.com
Cc: miquel.raynal@bootlin.com
Cc: thomas.petazzoni@bootlin.com
Cc: seanpaul@google.com
Cc: marcheu@google.com
Cc: nicolejadeyee@google.com
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>

Changes in v13:
- Removed the YUV part to prepare the merge
- Add Acked-by from Maíra
- Link to v12: https://lore.kernel.org/r/20241007-yuv-v12-0-01c1ada6fec8@bootlin.com
Changes in v12:
- Fix documentation issues as suggested by Randy
- Link to v11: https://lore.kernel.org/r/20240930-yuv-v11-0-4b1a26bcfc96@bootlin.com
Changes in v11:
- Remove documentation patch (already merged)
- Fix sparse warning about documentation
- Link to v10: https://lore.kernel.org/r/20240809-yuv-v10-0-1a7c764166f7@bootlin.com
Changes in v10:
- Properly remove the patch introducing dummy read/write functions
- PATCH 8/16: Format fixups
- PATCH 9/16: Format fixups
- PATCH 11/16: Format fixups
- PATCH 14/16: Fix test compilation, add module description
- Link to v9: https://lore.kernel.org/r/20240802-yuv-v9-0-08a706669e16@bootlin.com
Changes in v9:
- PATCH 3/17: Fix docs as Maíra suggested
- PATCH 4,6,10,12,15,17/17: Fix sparse warning about __le16 casting
- Link to v8: https://lore.kernel.org/all/20240516-yuv-v8-0-cf8d6f86430e@bootlin.com/
Changes in v8:
- PATCH 7/17: Update pitch access to use the proper value for block
  formats
- PATCH 9/17: Update pitch access to use the proper value for block
  formats
- Link to v7: https://lore.kernel.org/r/20240513-yuv-v7-0-380e9ffec502@bootlin.com
Changes in v7:
- Some typos and indent fixes
- Add Review-By, Acked-By
- PATCH 3/17: Clarify src/dst unit
- PATCH 9/17: Clarify documentation
- PATCH 9/17: Restrict conditions for direction
- PATCH 9/17: Rename get_block_step_byte to get_block_step_bytes
- PATCH 10/17: Clarify kernel doc for clamp_line_coordinates, blend_line,
  pixel_read_line_t
- PATCH 10/17: Fix the case when src_*_start >= fb->width/height
- PATCH 10/17: Change y in blend to be an int
- PATCH 10/17: Clarify documentation for read functions
- PATCH 12/17: Fix the type of rgb variables in argb_u16_from_yuv888
- PATCH 12/17: Move comments at the right place, remove useless ones
- PATCH 12/17: Add missing const
- PATCH 17/17: Use drm_format_info_bpp and computation to avoid hard-coded
  values
- Link to v6: https://lore.kernel.org/r/20240409-yuv-v6-0-de1c5728fd70@bootlin.com
Changes in v6:
- Add Randy
- Add Review-By and Acked-By
- PATCH 2/17: Remove useless newline
- PATCH 3/17: Fix kernel doc
- PATCH 4/17: Fix typo in git commit
- PATCH 4/17: Fix kernel doc and simplify brief description of typedef
- PATCH 5/17: Change black default color to Magenta
- PATCH 5/17: Fix wording in comment
- PATCH 7/17: Fix typo in packed_pixel_offset
- PATCH 7/17: Add WARN_ON for currently not supported formats
- PATCH 8/17: Rename x_limit to pixel_count
- PATCH 8/17: Clarify kernel doc for pre_mul_alpha_blend
- PATCH 9/17: Rename get_step_next_block to get_block_step_bytes
- PATCH 9/17: Change kernel doc order
- PATCH 9/17: Rework the direction_for_rotation function to use drm
  helpers
- PATCH 9/17: Add a warn in direction_for_rotation if the result is not
  expected
- PATCH 10/17: Reword the comment of pixel color conversion functions
- PATCH 10/17: Refactor the blending function to extract functions
- PATCH 11/17: Remove useless drm_rotation_simplify
- PATCH 12/17: Fix typo in comments
- PATCH 12/17: Remove useless define
- PATCH 12/17: Fix some comments typo and kernel doc
- PATCH 12/17: Add a comma at the end of the vkms_formats list
- PATCH 12/17: Use copy of matrix instead of pointers
- PATCH 12/17: Use 16 bit range for yuv conversion
- PATCH 17/17: Add a comma at the end of the vkms_formats list
- PATCH 17/17: Add assertions
- PATCH 17/17: Fix color conversion... Next time I will read the doc
  twice...
- Link to v5: https://lore.kernel.org/r/20240313-yuv-v5-0-e610cbd03f52@bootlin.com
Changes in v5:
- All patches: fix some formatting issues
- PATCH 4/16: Use the correct formatter for 4cc code
- PATCH 7/16: Update the pixel accessors to also return the pixel position
  inside a block.
- PATCH 8/16: Fix a temporary bug
- PATCH 9/16: Update the get_step_1x1 to get_step_next_block and update
  the documentation
- PATCH 10/16: Update to use the new pixel accessors
- PATCH 10/16: Reword some comments
- PATCH 11/16: Update to use the new pixel accessors
- PATCH 11/16: Fix a bug in the subsampling offset for inverted reading
  (right to left/bottom to top). Found by [8].
- PATCH 11/16: Apply Arthur's modifications (comments, algorithm
  clarification)
- PATCH 11/16: Use the correct formatter for 4cc code
- PATCH 11/16: Update to use the new get_step_next_block
- PATCH 14/16: Apply Arthur's modification (comments, compilation issue)
- PATCH 15/16: Add Arthur's patch to explain the kunit tests
- PATCH 16/16: Introduce DRM_FORMAT_R* support.
- Link to v4: https://lore.kernel.org/r/20240304-yuv-v4-0-76beac8e9793@bootlin.com
Changes in v4:
- PATCH 3/14: Update comments for get_pixel_* functions
- PATCH 4/14: Add WARN when trying to get unsupported pixel_* functions
- PATCH 5/14: Create dummy pixel reader/writer to avoid NULL
  function pointers and kernel OOPS
- PATCH 6/14: Added the usage of const pointers when needed
- PATCH 7/14: Extraction of pixel accessors modification
- PATCH 8/14: Extraction of the blending function modification
- PATCH 9/14: Extraction of the pixel_read_direction enum
- PATCH 10/14: Update direction_for_rotation documentation
- PATCH 10/14: Rename conversion functions to be explicit
- PATCH 10/14: Replace while(count) by while(out_pixel<end) in read_line
  callbacks. It avoids a new variable+addition in the composition hot path.
- PATCH 11/14: Rename conversion functions to be explicit
- PATCH 11/14: Update the documentation for get_subsampling_offset
- PATCH 11/14: Add the matrix_conversion structure to remove a test from
  the hot path.
- PATCH 11/14: Update matrix values to use 32.32 fixed-point numbers for
  conversion
- PATCH 12/14: Update commit message
- PATCH 14/14: Change kunit expected value
- Link to v3: https://lore.kernel.org/r/20240226-yuv-v3-0-ff662f0994db@bootlin.com
Changes in v3:
- Correction of remaining git-rebase artefacts
- Added Pekka in copy of this patch
- Link to v2: https://lore.kernel.org/r/20240223-yuv-v2-0-aa6be2827bb7@bootlin.com
Changes in v2:
- Rebased the series on top of drm-misc/drm-misc-next
- Extract the typedef for pixel_read/pixel_write
- Introduce the line-by-line algorithm per pixel format
- Add some documentation for existing and new code
- Port the series [1] to use line-by-line algorithm
- Link to v1: https://lore.kernel.org/r/20240201-yuv-v1-0-3ca376f27632@bootlin.com

---
Arthur Grillo (1):
      drm/vkms: Use drm_frame directly

Louis Chauvet (8):
      drm/vkms: Code formatting
      drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions
      drm/vkms: Use const for input pointers in pixel_read an pixel_write functions
      drm/vkms: Update pixels accessor to support packed and multi-plane formats.
      drm/vkms: Avoid computing blending limits inside pre_mul_alpha_blend
      drm/vkms: Introduce pixel_read_direction enum
      drm/vkms: Re-introduce line-per-line composition algorithm
      drm/vkms: Remove useless drm_rotation_simplify

 drivers/gpu/drm/vkms/vkms_composer.c  | 312 ++++++++++++++++++++------
 drivers/gpu/drm/vkms/vkms_crtc.c      |   6 +-
 drivers/gpu/drm/vkms/vkms_drv.c       |   3 +-
 drivers/gpu/drm/vkms/vkms_drv.h       |  55 ++++-
 drivers/gpu/drm/vkms/vkms_formats.c   | 409 ++++++++++++++++++++++++----------
 drivers/gpu/drm/vkms/vkms_formats.h   |   4 +-
 drivers/gpu/drm/vkms/vkms_plane.c     |  17 +-
 drivers/gpu/drm/vkms/vkms_writeback.c |   5 -
 8 files changed, 588 insertions(+), 223 deletions(-)
---
base-commit: 623b1e4d2eace0958996995f9f88cb659a6f69dd
change-id: 20240201-yuv-1337d90d9576

Best regards,
-- 
Louis Chauvet <louis.chauvet@bootlin.com>



* [PATCH v13 1/9] drm/vkms: Code formatting
  2024-10-31 17:53 [PATCH v13 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
@ 2024-10-31 17:53 ` Louis Chauvet
  2024-10-31 17:53 ` [PATCH v13 2/9] drm/vkms: Use drm_frame directly Louis Chauvet
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Louis Chauvet @ 2024-10-31 17:53 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Jonathan Corbet, Louis Chauvet, Simona Vetter, Helen Koike,
	rdunlap, arthurgrillo, pekka.paalanen, Simona Vetter
  Cc: dri-devel, linux-kernel, linux-doc, thomas.petazzoni,
	jeremie.dautheribes, miquel.raynal, seanpaul, marcheu,
	nicolejadeyee, Pekka Paalanen, Maíra Canal

A few no-op changes to remove double spaces and fix wrong alignments.

Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
---
 drivers/gpu/drm/vkms/vkms_composer.c | 10 +++++-----
 drivers/gpu/drm/vkms/vkms_crtc.c     |  6 ++----
 drivers/gpu/drm/vkms/vkms_drv.c      |  3 +--
 drivers/gpu/drm/vkms/vkms_plane.c    |  8 ++++----
 4 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index 57a5769fc994..931e214b225c 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -96,7 +96,7 @@ static u16 lerp_u16(u16 a, u16 b, s64 t)
 	s64 a_fp = drm_int2fixp(a);
 	s64 b_fp = drm_int2fixp(b);
 
-	s64 delta = drm_fixp_mul(b_fp - a_fp,  t);
+	s64 delta = drm_fixp_mul(b_fp - a_fp, t);
 
 	return drm_fixp2int(a_fp + delta);
 }
@@ -309,8 +309,8 @@ static int compose_active_planes(struct vkms_writeback_job *active_wb,
 void vkms_composer_worker(struct work_struct *work)
 {
 	struct vkms_crtc_state *crtc_state = container_of(work,
-						struct vkms_crtc_state,
-						composer_work);
+							  struct vkms_crtc_state,
+							  composer_work);
 	struct drm_crtc *crtc = crtc_state->base.crtc;
 	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
 	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
@@ -335,7 +335,7 @@ void vkms_composer_worker(struct work_struct *work)
 		crtc_state->gamma_lut.base = (struct drm_color_lut *)crtc->state->gamma_lut->data;
 		crtc_state->gamma_lut.lut_length =
 			crtc->state->gamma_lut->length / sizeof(struct drm_color_lut);
-		max_lut_index_fp = drm_int2fixp(crtc_state->gamma_lut.lut_length  - 1);
+		max_lut_index_fp = drm_int2fixp(crtc_state->gamma_lut.lut_length - 1);
 		crtc_state->gamma_lut.channel_value2index_ratio = drm_fixp_div(max_lut_index_fp,
 									       u16_max_fp);
 
@@ -374,7 +374,7 @@ void vkms_composer_worker(struct work_struct *work)
 		drm_crtc_add_crc_entry(crtc, true, frame_start++, &crc32);
 }
 
-static const char * const pipe_crc_sources[] = {"auto"};
+static const char *const pipe_crc_sources[] = { "auto" };
 
 const char *const *vkms_get_crc_sources(struct drm_crtc *crtc,
 					size_t *count)
diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c
index a40295c18b48..39bf9d4247fa 100644
--- a/drivers/gpu/drm/vkms/vkms_crtc.c
+++ b/drivers/gpu/drm/vkms/vkms_crtc.c
@@ -188,8 +188,7 @@ static int vkms_crtc_atomic_check(struct drm_crtc *crtc,
 		return ret;
 
 	drm_for_each_plane_mask(plane, crtc->dev, crtc_state->plane_mask) {
-		plane_state = drm_atomic_get_existing_plane_state(crtc_state->state,
-								  plane);
+		plane_state = drm_atomic_get_existing_plane_state(crtc_state->state, plane);
 		WARN_ON(!plane_state);
 
 		if (!plane_state->visible)
@@ -205,8 +204,7 @@ static int vkms_crtc_atomic_check(struct drm_crtc *crtc,
 
 	i = 0;
 	drm_for_each_plane_mask(plane, crtc->dev, crtc_state->plane_mask) {
-		plane_state = drm_atomic_get_existing_plane_state(crtc_state->state,
-								  plane);
+		plane_state = drm_atomic_get_existing_plane_state(crtc_state->state, plane);
 
 		if (!plane_state->visible)
 			continue;
diff --git a/drivers/gpu/drm/vkms/vkms_drv.c b/drivers/gpu/drm/vkms/vkms_drv.c
index 2d1e95cb66e5..19b7322ce49d 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.c
+++ b/drivers/gpu/drm/vkms/vkms_drv.c
@@ -82,8 +82,7 @@ static void vkms_atomic_commit_tail(struct drm_atomic_state *old_state)
 	drm_atomic_helper_wait_for_flip_done(dev, old_state);
 
 	for_each_old_crtc_in_state(old_state, crtc, old_crtc_state, i) {
-		struct vkms_crtc_state *vkms_state =
-			to_vkms_crtc_state(old_crtc_state);
+		struct vkms_crtc_state *vkms_state = to_vkms_crtc_state(old_crtc_state);
 
 		flush_work(&vkms_state->composer_work);
 	}
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index e5c625ab8e3e..5a8d295e65f2 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -117,10 +117,10 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
 	drm_framebuffer_get(frame_info->fb);
 	frame_info->rotation = drm_rotation_simplify(new_state->rotation, DRM_MODE_ROTATE_0 |
-						     DRM_MODE_ROTATE_90 |
-						     DRM_MODE_ROTATE_270 |
-						     DRM_MODE_REFLECT_X |
-						     DRM_MODE_REFLECT_Y);
+									  DRM_MODE_ROTATE_90 |
+									  DRM_MODE_ROTATE_270 |
+									  DRM_MODE_REFLECT_X |
+									  DRM_MODE_REFLECT_Y);
 
 	drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated),
 			drm_rect_height(&frame_info->rotated), frame_info->rotation);

-- 
2.46.2



* [PATCH v13 2/9] drm/vkms: Use drm_frame directly
  2024-10-31 17:53 [PATCH v13 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
  2024-10-31 17:53 ` [PATCH v13 1/9] drm/vkms: Code formatting Louis Chauvet
@ 2024-10-31 17:53 ` Louis Chauvet
  2024-10-31 17:53 ` [PATCH v13 3/9] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions Louis Chauvet
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Louis Chauvet @ 2024-10-31 17:53 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Jonathan Corbet, Louis Chauvet, Simona Vetter, Helen Koike,
	rdunlap, arthurgrillo, pekka.paalanen, Simona Vetter
  Cc: dri-devel, linux-kernel, linux-doc, thomas.petazzoni,
	jeremie.dautheribes, miquel.raynal, seanpaul, marcheu,
	nicolejadeyee, Pekka Paalanen, Maíra Canal

From: Arthur Grillo <arthurgrillo@riseup.net>

Remove intermediary variables and access the values directly from
drm_frame. These changes should be no-ops.
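As a sketch of what pixel_offset() computes after this change, the user-space version below mirrors the formula from the patch (fb->offsets[0] + y * fb->pitches[0] + x * fb->format->cpp[0]). The struct names fake_framebuffer and fake_format_info are hypothetical stand-ins for struct drm_framebuffer and struct drm_format_info, keeping only the fields this formula needs; like the patch, it only covers plane 0 of a packed format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct drm_format_info field used here. */
struct fake_format_info {
	unsigned char cpp[4];		/* bytes per pixel, per plane */
};

/* Hypothetical stand-in for the struct drm_framebuffer fields used here. */
struct fake_framebuffer {
	const struct fake_format_info *format;
	unsigned int offsets[4];	/* start of each plane in the buffer, in bytes */
	unsigned int pitches[4];	/* length of one line of each plane, in bytes */
};

/* Byte offset of pixel (x, y) in plane 0 of a packed (non-block) format. */
static size_t pixel_offset(const struct fake_framebuffer *fb, int x, int y)
{
	assert(x >= 0 && y >= 0);
	return fb->offsets[0] + (size_t)y * fb->pitches[0] +
	       (size_t)x * fb->format->cpp[0];
}
```

For a 4096-pixel-wide XRGB8888 buffer (pitch 16384 bytes) starting at byte 64, pixel (10, 2) lands at 64 + 2 * 16384 + 10 * 4 = 32872.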

Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com>
[Louis Chauvet: Applied review from Maíra]
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
---
 drivers/gpu/drm/vkms/vkms_drv.h       |  3 ---
 drivers/gpu/drm/vkms/vkms_formats.c   | 11 ++++++-----
 drivers/gpu/drm/vkms/vkms_plane.c     |  3 ---
 drivers/gpu/drm/vkms/vkms_writeback.c |  5 -----
 4 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 672fe191e239..fcb5a5ff7df7 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -42,9 +42,6 @@ struct vkms_frame_info {
 	struct drm_rect rotated;
 	struct iosys_map map[DRM_FORMAT_MAX_PLANES];
 	unsigned int rotation;
-	unsigned int offset;
-	unsigned int pitch;
-	unsigned int cpp;
 };
 
 struct pixel_argb_u16 {
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index e8a5cc235ebb..2a0fbe27d8b2 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -23,8 +23,9 @@
  */
 static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
 {
-	return frame_info->offset + (y * frame_info->pitch)
-				  + (x * frame_info->cpp);
+	struct drm_framebuffer *fb = frame_info->fb;
+
+	return fb->offsets[0] + (y * fb->pitches[0]) + (x * fb->format->cpp[0]);
 }
 
 /**
@@ -154,12 +155,12 @@ void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state
 	u8 *src_pixels = get_packed_src_addr(frame_info, y);
 	int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels);
 
-	for (size_t x = 0; x < limit; x++, src_pixels += frame_info->cpp) {
+	for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) {
 		int x_pos = get_x_position(frame_info, limit, x);
 
 		if (drm_rotation_90_or_270(frame_info->rotation))
 			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1)
-				+ frame_info->cpp * y;
+				+ frame_info->fb->format->cpp[0] * y;
 
 		plane->pixel_read(src_pixels, &out_pixels[x_pos]);
 	}
@@ -253,7 +254,7 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
 	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
 	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst), src_buffer->n_pixels);
 
-	for (size_t x = 0; x < x_limit; x++, dst_pixels += frame_info->cpp)
+	for (size_t x = 0; x < x_limit; x++, dst_pixels += frame_info->fb->format->cpp[0])
 		wb->pixel_write(dst_pixels, &in_pixels[x]);
 }
 
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 5a8d295e65f2..21b5adfb44aa 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -125,9 +125,6 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 	drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated),
 			drm_rect_height(&frame_info->rotated), frame_info->rotation);
 
-	frame_info->offset = fb->offsets[0];
-	frame_info->pitch = fb->pitches[0];
-	frame_info->cpp = fb->format->cpp[0];
 	vkms_plane_state->pixel_read = get_pixel_conversion_function(fmt);
 }
 
diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
index bc724cbd5e3a..c8582df1f739 100644
--- a/drivers/gpu/drm/vkms/vkms_writeback.c
+++ b/drivers/gpu/drm/vkms/vkms_writeback.c
@@ -149,11 +149,6 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
 	crtc_state->active_writeback = active_wb;
 	crtc_state->wb_pending = true;
 	spin_unlock_irq(&output->composer_lock);
-
-	wb_frame_info->offset = fb->offsets[0];
-	wb_frame_info->pitch = fb->pitches[0];
-	wb_frame_info->cpp = fb->format->cpp[0];
-
 	drm_writeback_queue_job(wb_conn, connector_state);
 	active_wb->pixel_write = get_pixel_write_function(wb_format);
 	drm_rect_init(&wb_frame_info->src, 0, 0, crtc_width, crtc_height);

-- 
2.46.2



* [PATCH v13 3/9] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions
  2024-10-31 17:53 [PATCH v13 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
  2024-10-31 17:53 ` [PATCH v13 1/9] drm/vkms: Code formatting Louis Chauvet
  2024-10-31 17:53 ` [PATCH v13 2/9] drm/vkms: Use drm_frame directly Louis Chauvet
@ 2024-10-31 17:53 ` Louis Chauvet
  2024-10-31 17:53 ` [PATCH v13 4/9] drm/vkms: Use const for input pointers in pixel_read an " Louis Chauvet
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Louis Chauvet @ 2024-10-31 17:53 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Jonathan Corbet, Louis Chauvet, Simona Vetter, Helen Koike,
	rdunlap, arthurgrillo, pekka.paalanen, Simona Vetter
  Cc: dri-devel, linux-kernel, linux-doc, thomas.petazzoni,
	jeremie.dautheribes, miquel.raynal, seanpaul, marcheu,
	nicolejadeyee, Pekka Paalanen

Introduce two typedefs: pixel_read_t and pixel_write_t. They allow the
compiler to check that the functions passed take the correct arguments.
Such typedefs will help ensure consistency across the code base if
these prototypes are updated.
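A minimal user-space sketch of what the typedef provides, assuming uint8_t in place of the kernel's u8 and a simplified struct pixel_argb_u16. The readers[] table and read_pixels() helper are hypothetical; the conversion bodies follow the ARGB8888/XRGB8888 functions in this patch. Because every entry is declared through pixel_read_t, a function with a mismatching prototype (for example one taking a u16 * instead of a u8 *) is diagnosed by the compiler, and all users share a single definition of the prototype.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the driver's struct pixel_argb_u16. */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Same shape as the pixel_read_t typedef from this patch. */
typedef void (*pixel_read_t)(uint8_t *in_pixel, struct pixel_argb_u16 *out_pixel);

static void ARGB8888_to_argb_u16(uint8_t *in_pixel, struct pixel_argb_u16 *out_pixel)
{
	/* 257 = 65535 / 255, so 0x00..0xff maps exactly onto 0x0000..0xffff. */
	out_pixel->a = (uint16_t)(in_pixel[3] * 257);
	out_pixel->r = (uint16_t)(in_pixel[2] * 257);
	out_pixel->g = (uint16_t)(in_pixel[1] * 257);
	out_pixel->b = (uint16_t)(in_pixel[0] * 257);
}

static void XRGB8888_to_argb_u16(uint8_t *in_pixel, struct pixel_argb_u16 *out_pixel)
{
	out_pixel->a = 0xffff;
	out_pixel->r = (uint16_t)(in_pixel[2] * 257);
	out_pixel->g = (uint16_t)(in_pixel[1] * 257);
	out_pixel->b = (uint16_t)(in_pixel[0] * 257);
}

/* Typed with pixel_read_t: any entry with another prototype is diagnosed. */
static const pixel_read_t readers[] = {
	ARGB8888_to_argb_u16,
	XRGB8888_to_argb_u16,
};

/* Convert @count packed 4-byte pixels with the given reader. */
static void read_pixels(pixel_read_t read_pixel, uint8_t *in,
			struct pixel_argb_u16 *out, int count)
{
	assert(read_pixel != 0);
	for (int i = 0; i < count; i++)
		read_pixel(in + 4 * i, &out[i]);
}
```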

Rename input/output variables consistently between read_line and
write_line.

A WARN has been added in get_pixel_*_function to alert when an unsupported
pixel format is requested. As those formats are checked before the
atomic_update callbacks, this should never happen.
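The warning path can be sketched in user space as follows. This is an illustration under stated assumptions, not the driver's code: fprintf() stands in for the kernel's WARN(), FOURCC() reproduces the packing of fourcc_code() from drm_fourcc.h, the FMT_* names and get_pixel_read_function() are hypothetical, and returning NULL for an unsupported format is a choice made for this sketch only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

typedef void (*pixel_read_t)(uint8_t *in_pixel, struct pixel_argb_u16 *out_pixel);

/* Same byte packing as fourcc_code() in drm_fourcc.h. */
#define FOURCC(a, b, c, d) ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
			    ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))
#define FMT_ARGB8888 FOURCC('A', 'R', '2', '4')
#define FMT_XRGB8888 FOURCC('X', 'R', '2', '4')

static void ARGB8888_read(uint8_t *in, struct pixel_argb_u16 *out)
{
	out->a = (uint16_t)(in[3] * 257);
	out->r = (uint16_t)(in[2] * 257);
	out->g = (uint16_t)(in[1] * 257);
	out->b = (uint16_t)(in[0] * 257);
}

static void XRGB8888_read(uint8_t *in, struct pixel_argb_u16 *out)
{
	out->a = 0xffff;
	out->r = (uint16_t)(in[2] * 257);
	out->g = (uint16_t)(in[1] * 257);
	out->b = (uint16_t)(in[0] * 257);
}

static pixel_read_t get_pixel_read_function(uint32_t format)
{
	switch (format) {
	case FMT_ARGB8888:
		return ARGB8888_read;
	case FMT_XRGB8888:
		return XRGB8888_read;
	default:
		/*
		 * Stand-in for WARN(): formats are validated before atomic_update,
		 * so reaching this branch indicates a driver bug.
		 */
		fprintf(stderr, "unsupported pixel format 0x%08x\n", (unsigned int)format);
		return NULL;
	}
}
```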

Add documentation for those typedefs.

Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
---
 drivers/gpu/drm/vkms/vkms_drv.h     |  23 ++++++-
 drivers/gpu/drm/vkms/vkms_formats.c | 124 ++++++++++++++++++++----------------
 drivers/gpu/drm/vkms/vkms_formats.h |   4 +-
 drivers/gpu/drm/vkms/vkms_plane.c   |   2 +-
 4 files changed, 94 insertions(+), 59 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index fcb5a5ff7df7..e0d46defed83 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -53,12 +53,31 @@ struct line_buffer {
 	struct pixel_argb_u16 *pixels;
 };
 
+/**
+ * typedef pixel_write_t - These functions are used to read a pixel from a
+ * &struct pixel_argb_u16, convert it in a specific format and write it in the @out_pixel
+ * buffer.
+ *
+ * @out_pixel: destination address to write the pixel
+ * @in_pixel: pixel to write
+ */
+typedef void (*pixel_write_t)(u8 *out_pixel, struct pixel_argb_u16 *in_pixel);
+
 struct vkms_writeback_job {
 	struct iosys_map data[DRM_FORMAT_MAX_PLANES];
 	struct vkms_frame_info wb_frame_info;
-	void (*pixel_write)(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel);
+	pixel_write_t pixel_write;
 };
 
+/**
+ * typedef pixel_read_t - These functions are used to read a pixel in the source frame,
+ * convert it to `struct pixel_argb_u16` and write it to @out_pixel.
+ *
+ * @in_pixel: pointer to the pixel to read
+ * @out_pixel: pointer to write the converted pixel
+ */
+typedef void (*pixel_read_t)(u8 *in_pixel, struct pixel_argb_u16 *out_pixel);
+
 /**
  * struct vkms_plane_state - Driver specific plane state
  * @base: base plane state
@@ -69,7 +88,7 @@ struct vkms_writeback_job {
 struct vkms_plane_state {
 	struct drm_shadow_plane_state base;
 	struct vkms_frame_info *frame_info;
-	void (*pixel_read)(u8 *src_buffer, struct pixel_argb_u16 *out_pixel);
+	pixel_read_t pixel_read;
 };
 
 struct vkms_plane {
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index 2a0fbe27d8b2..b9544e67cd4f 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -75,7 +75,7 @@ static int get_x_position(const struct vkms_frame_info *frame_info, int limit, i
  * They are used in the vkms_compose_row() function to handle multiple formats.
  */
 
-static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
+static void ARGB8888_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
 {
 	/*
 	 * The 257 is the "conversion ratio". This number is obtained by the
@@ -83,48 +83,48 @@ static void ARGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixe
 	 * the best color value in a pixel format with more possibilities.
 	 * A similar idea applies to others RGB color conversions.
 	 */
-	out_pixel->a = (u16)src_pixels[3] * 257;
-	out_pixel->r = (u16)src_pixels[2] * 257;
-	out_pixel->g = (u16)src_pixels[1] * 257;
-	out_pixel->b = (u16)src_pixels[0] * 257;
+	out_pixel->a = (u16)in_pixel[3] * 257;
+	out_pixel->r = (u16)in_pixel[2] * 257;
+	out_pixel->g = (u16)in_pixel[1] * 257;
+	out_pixel->b = (u16)in_pixel[0] * 257;
 }
 
-static void XRGB8888_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
+static void XRGB8888_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
 {
 	out_pixel->a = (u16)0xffff;
-	out_pixel->r = (u16)src_pixels[2] * 257;
-	out_pixel->g = (u16)src_pixels[1] * 257;
-	out_pixel->b = (u16)src_pixels[0] * 257;
+	out_pixel->r = (u16)in_pixel[2] * 257;
+	out_pixel->g = (u16)in_pixel[1] * 257;
+	out_pixel->b = (u16)in_pixel[0] * 257;
 }
 
-static void ARGB16161616_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
+static void ARGB16161616_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
 {
-	__le16 *pixels = (__force __le16 *)src_pixels;
+	__le16 *pixel = (__le16 *)in_pixel;
 
-	out_pixel->a = le16_to_cpu(pixels[3]);
-	out_pixel->r = le16_to_cpu(pixels[2]);
-	out_pixel->g = le16_to_cpu(pixels[1]);
-	out_pixel->b = le16_to_cpu(pixels[0]);
+	out_pixel->a = le16_to_cpu(pixel[3]);
+	out_pixel->r = le16_to_cpu(pixel[2]);
+	out_pixel->g = le16_to_cpu(pixel[1]);
+	out_pixel->b = le16_to_cpu(pixel[0]);
 }
 
-static void XRGB16161616_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
+static void XRGB16161616_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
 {
-	__le16 *pixels = (__force __le16 *)src_pixels;
+	__le16 *pixel = (__le16 *)in_pixel;
 
 	out_pixel->a = (u16)0xffff;
-	out_pixel->r = le16_to_cpu(pixels[2]);
-	out_pixel->g = le16_to_cpu(pixels[1]);
-	out_pixel->b = le16_to_cpu(pixels[0]);
+	out_pixel->r = le16_to_cpu(pixel[2]);
+	out_pixel->g = le16_to_cpu(pixel[1]);
+	out_pixel->b = le16_to_cpu(pixel[0]);
 }
 
-static void RGB565_to_argb_u16(u8 *src_pixels, struct pixel_argb_u16 *out_pixel)
+static void RGB565_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
 {
-	__le16 *pixels = (__force __le16 *)src_pixels;
+	__le16 *pixel = (__le16 *)in_pixel;
 
 	s64 fp_rb_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(31));
 	s64 fp_g_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(63));
 
-	u16 rgb_565 = le16_to_cpu(*pixels);
+	u16 rgb_565 = le16_to_cpu(*pixel);
 	s64 fp_r = drm_int2fixp((rgb_565 >> 11) & 0x1f);
 	s64 fp_g = drm_int2fixp((rgb_565 >> 5) & 0x3f);
 	s64 fp_b = drm_int2fixp(rgb_565 & 0x1f);
@@ -168,12 +168,12 @@ void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state
 
 /*
  * The following functions take one &struct pixel_argb_u16 and convert it to a specific format.
- * The result is stored in @dst_pixels.
+ * The result is stored in @out_pixel.
  *
  * They are used in vkms_writeback_row() to convert and store a pixel from the src_buffer to
  * the writeback buffer.
  */
-static void argb_u16_to_ARGB8888(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel)
+static void argb_u16_to_ARGB8888(u8 *out_pixel, struct pixel_argb_u16 *in_pixel)
 {
 	/*
 	 * This sequence below is important because the format's byte order is
@@ -185,43 +185,43 @@ static void argb_u16_to_ARGB8888(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel
 	 * | Addr + 2 | = Red channel
 	 * | Addr + 3 | = Alpha channel
 	 */
-	dst_pixels[3] = DIV_ROUND_CLOSEST(in_pixel->a, 257);
-	dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixel->r, 257);
-	dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixel->g, 257);
-	dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixel->b, 257);
+	out_pixel[3] = DIV_ROUND_CLOSEST(in_pixel->a, 257);
+	out_pixel[2] = DIV_ROUND_CLOSEST(in_pixel->r, 257);
+	out_pixel[1] = DIV_ROUND_CLOSEST(in_pixel->g, 257);
+	out_pixel[0] = DIV_ROUND_CLOSEST(in_pixel->b, 257);
 }
 
-static void argb_u16_to_XRGB8888(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel)
+static void argb_u16_to_XRGB8888(u8 *out_pixel, struct pixel_argb_u16 *in_pixel)
 {
-	dst_pixels[3] = 0xff;
-	dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixel->r, 257);
-	dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixel->g, 257);
-	dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixel->b, 257);
+	out_pixel[3] = 0xff;
+	out_pixel[2] = DIV_ROUND_CLOSEST(in_pixel->r, 257);
+	out_pixel[1] = DIV_ROUND_CLOSEST(in_pixel->g, 257);
+	out_pixel[0] = DIV_ROUND_CLOSEST(in_pixel->b, 257);
 }
 
-static void argb_u16_to_ARGB16161616(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel)
+static void argb_u16_to_ARGB16161616(u8 *out_pixel, struct pixel_argb_u16 *in_pixel)
 {
-	__le16 *pixels = (__force __le16 *)dst_pixels;
+	__le16 *pixel = (__le16 *)out_pixel;
 
-	pixels[3] = cpu_to_le16(in_pixel->a);
-	pixels[2] = cpu_to_le16(in_pixel->r);
-	pixels[1] = cpu_to_le16(in_pixel->g);
-	pixels[0] = cpu_to_le16(in_pixel->b);
+	pixel[3] = cpu_to_le16(in_pixel->a);
+	pixel[2] = cpu_to_le16(in_pixel->r);
+	pixel[1] = cpu_to_le16(in_pixel->g);
+	pixel[0] = cpu_to_le16(in_pixel->b);
 }
 
-static void argb_u16_to_XRGB16161616(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel)
+static void argb_u16_to_XRGB16161616(u8 *out_pixel, struct pixel_argb_u16 *in_pixel)
 {
-	__le16 *pixels = (__force __le16 *)dst_pixels;
+	__le16 *pixel = (__le16 *)out_pixel;
 
-	pixels[3] = cpu_to_le16(0xffff);
-	pixels[2] = cpu_to_le16(in_pixel->r);
-	pixels[1] = cpu_to_le16(in_pixel->g);
-	pixels[0] = cpu_to_le16(in_pixel->b);
+	pixel[3] = cpu_to_le16(0xffff);
+	pixel[2] = cpu_to_le16(in_pixel->r);
+	pixel[1] = cpu_to_le16(in_pixel->g);
+	pixel[0] = cpu_to_le16(in_pixel->b);
 }
 
-static void argb_u16_to_RGB565(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel)
+static void argb_u16_to_RGB565(u8 *out_pixel, struct pixel_argb_u16 *in_pixel)
 {
-	__le16 *pixels = (__force __le16 *)dst_pixels;
+	__le16 *pixel = (__le16 *)out_pixel;
 
 	s64 fp_rb_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(31));
 	s64 fp_g_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(63));
@@ -234,7 +234,7 @@ static void argb_u16_to_RGB565(u8 *dst_pixels, struct pixel_argb_u16 *in_pixel)
 	u16 g = drm_fixp2int(drm_fixp_div(fp_g, fp_g_ratio));
 	u16 b = drm_fixp2int(drm_fixp_div(fp_b, fp_rb_ratio));
 
-	*pixels = cpu_to_le16(r << 11 | g << 5 | b);
+	*pixel = cpu_to_le16(r << 11 | g << 5 | b);
 }
 
 /**
@@ -259,13 +259,13 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
 }
 
 /**
- * get_pixel_conversion_function() - Retrieve the correct read_pixel function for a specific
+ * get_pixel_read_function() - Retrieve the correct pixel_read function for a specific
  * format. The returned pointer is NULL for unsupported pixel formats. The caller must ensure that
  * the pointer is valid before using it in a vkms_plane_state.
  *
  * @format: DRM_FORMAT_* value for which to obtain a conversion function (see [drm_fourcc.h])
  */
-void *get_pixel_conversion_function(u32 format)
+pixel_read_t get_pixel_read_function(u32 format)
 {
 	switch (format) {
 	case DRM_FORMAT_ARGB8888:
@@ -279,7 +279,15 @@ void *get_pixel_conversion_function(u32 format)
 	case DRM_FORMAT_RGB565:
 		return &RGB565_to_argb_u16;
 	default:
-		return NULL;
+		/*
+		 * Reaching this point is a bug in vkms_plane_atomic_check(). All supported
+		 * formats must:
+		 * - Be listed in vkms_formats in vkms_plane.c
+		 * - Have a pixel_read callback defined here
+		 */
+		pr_err("Pixel format %p4cc is not supported by VKMS planes. This is a kernel bug, atomic check must forbid this configuration.\n",
+		       &format);
+		BUG();
 	}
 }
 
@@ -290,7 +298,7 @@ void *get_pixel_conversion_function(u32 format)
  *
  * @format: DRM_FORMAT_* value for which to obtain a conversion function (see [drm_fourcc.h])
  */
-void *get_pixel_write_function(u32 format)
+pixel_write_t get_pixel_write_function(u32 format)
 {
 	switch (format) {
 	case DRM_FORMAT_ARGB8888:
@@ -304,6 +312,14 @@ void *get_pixel_write_function(u32 format)
 	case DRM_FORMAT_RGB565:
 		return &argb_u16_to_RGB565;
 	default:
-		return NULL;
+		/*
+		 * Reaching this point is a bug in vkms_writeback_atomic_check(). All supported
+		 * formats must:
+		 * - Be listed in vkms_wb_formats in vkms_writeback.c
+		 * - Have a pixel_write callback defined here
+		 */
+		pr_err("Pixel format %p4cc is not supported by VKMS writeback. This is a kernel bug, atomic check must forbid this configuration.\n",
+		       &format);
+		BUG();
 	}
 }
diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
index cf59c2ed8e9a..3ecea4563254 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.h
+++ b/drivers/gpu/drm/vkms/vkms_formats.h
@@ -5,8 +5,8 @@
 
 #include "vkms_drv.h"
 
-void *get_pixel_conversion_function(u32 format);
+pixel_read_t get_pixel_read_function(u32 format);
 
-void *get_pixel_write_function(u32 format);
+pixel_write_t get_pixel_write_function(u32 format);
 
 #endif /* _VKMS_FORMATS_H_ */
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 21b5adfb44aa..10e9b23dab28 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -125,7 +125,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 	drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated),
 			drm_rect_height(&frame_info->rotated), frame_info->rotation);
 
-	vkms_plane_state->pixel_read = get_pixel_conversion_function(fmt);
+	vkms_plane_state->pixel_read = get_pixel_read_function(fmt);
 }
 
 static int vkms_plane_atomic_check(struct drm_plane *plane,

-- 
2.46.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

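For reference, the conversions renamed in the patch above can be exercised outside the kernel. The sketch below is a userspace approximation, not VKMS code: `struct pixel_argb_u16` is redeclared locally, `div_round_closest()` stands in for the kernel's `DIV_ROUND_CLOSEST()`, and the RGB565 path uses plain integer rounding where VKMS uses the `drm_fixp_*` helpers, so its low bits are not guaranteed to match the kernel's output. It shows why 257 is the 8-bit to 16-bit ratio (0xff * 257 == 0xffff) and that reading then writing an ARGB8888 pixel returns the original bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the 16-bit-per-channel pixel used by VKMS. */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Userspace stand-in for DIV_ROUND_CLOSEST() with unsigned operands. */
static uint32_t div_round_closest(uint32_t n, uint32_t d)
{
	assert(d != 0);
	return (n + d / 2) / d;
}

/* ARGB8888 is little-endian in memory: byte 0 = B, 1 = G, 2 = R, 3 = A. */
static void argb8888_to_argb_u16(const uint8_t *in, struct pixel_argb_u16 *out)
{
	/* 257 == 0xffff / 0xff, so 0x00 maps to 0x0000 and 0xff to 0xffff. */
	out->a = (uint16_t)(in[3] * 257);
	out->r = (uint16_t)(in[2] * 257);
	out->g = (uint16_t)(in[1] * 257);
	out->b = (uint16_t)(in[0] * 257);
}

static void argb_u16_to_argb8888(uint8_t *out, const struct pixel_argb_u16 *in)
{
	out[3] = (uint8_t)div_round_closest(in->a, 257);
	out[2] = (uint8_t)div_round_closest(in->r, 257);
	out[1] = (uint8_t)div_round_closest(in->g, 257);
	out[0] = (uint8_t)div_round_closest(in->b, 257);
}

/*
 * RGB565 as a little-endian u16: R in bits 15..11, G in 10..5, B in 4..0.
 * Each channel is scaled to 0..0xffff with integer round-to-nearest.
 */
static void rgb565_to_argb_u16(const uint8_t *in, struct pixel_argb_u16 *out)
{
	uint16_t v = (uint16_t)(in[0] | (in[1] << 8));

	out->a = 0xffff;
	out->r = (uint16_t)div_round_closest(((v >> 11) & 0x1f) * 65535u, 31);
	out->g = (uint16_t)div_round_closest(((v >> 5) & 0x3f) * 65535u, 63);
	out->b = (uint16_t)div_round_closest((v & 0x1f) * 65535u, 31);
}
```

Because every 8-bit value v maps to exactly v * 257, dividing by 257 with rounding recovers v, so the 8888 round trip is lossless.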
* [PATCH v13 4/9] drm/vkms: Use const for input pointers in pixel_read and pixel_write functions
  2024-10-31 17:53 [PATCH v13 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
                   ` (2 preceding siblings ...)
  2024-10-31 17:53 ` [PATCH v13 3/9] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions Louis Chauvet
@ 2024-10-31 17:53 ` Louis Chauvet
  2024-10-31 17:53 ` [PATCH v13 5/9] drm/vkms: Update pixels accessor to support packed and multi-plane formats Louis Chauvet
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Louis Chauvet @ 2024-10-31 17:53 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Jonathan Corbet, Louis Chauvet, Simona Vetter, Helen Koike,
	rdunlap, arthurgrillo, pekka.paalanen, Simona Vetter
  Cc: dri-devel, linux-kernel, linux-doc, thomas.petazzoni,
	jeremie.dautheribes, miquel.raynal, seanpaul, marcheu,
	nicolejadeyee, Pekka Paalanen, Maíra Canal

As the pixel_read and pixel_write functions should never modify their
input buffer, mark those pointers const.

Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
---
 drivers/gpu/drm/vkms/vkms_drv.h     |  4 ++--
 drivers/gpu/drm/vkms/vkms_formats.c | 20 ++++++++++----------
 2 files changed, 12 insertions(+), 12 deletions(-)

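A small userspace sketch of what the const-qualified typedef, combined with the typed return introduced in the previous patch, buys. The format enum, the callbacks and `demo_get_pixel_read_function()` are hypothetical stand-ins, not VKMS symbols. Because the lookup returns `pixel_read_t` rather than `void *`, the compiler checks every callback against the const signature, and a callback that tried to write through `in_pixel` would no longer build. ISO C also leaves conversion between function pointers and `void *` undefined, which is one more reason to prefer the typed return. Unlike the kernel version, which calls BUG() on an unknown format, the sketch returns NULL so it can be tested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Same shape as the patched typedef: the source pixel is read-only. */
typedef void (*pixel_read_t)(const uint8_t *in_pixel, struct pixel_argb_u16 *out_pixel);

/* Hypothetical stand-ins for DRM_FORMAT_* codes. */
enum demo_format {
	DEMO_FORMAT_ARGB8888,
	DEMO_FORMAT_XRGB8888,
	DEMO_FORMAT_UNKNOWN,
};

static void demo_ARGB8888_read(const uint8_t *in_pixel, struct pixel_argb_u16 *out_pixel)
{
	out_pixel->a = (uint16_t)(in_pixel[3] * 257);
	out_pixel->r = (uint16_t)(in_pixel[2] * 257);
	out_pixel->g = (uint16_t)(in_pixel[1] * 257);
	out_pixel->b = (uint16_t)(in_pixel[0] * 257);
	/* "in_pixel[0] = 0;" would be rejected: in_pixel points to const. */
}

static void demo_XRGB8888_read(const uint8_t *in_pixel, struct pixel_argb_u16 *out_pixel)
{
	out_pixel->a = 0xffff;
	out_pixel->r = (uint16_t)(in_pixel[2] * 257);
	out_pixel->g = (uint16_t)(in_pixel[1] * 257);
	out_pixel->b = (uint16_t)(in_pixel[0] * 257);
}

/* Typed return: each case is checked against pixel_read_t at compile time. */
static pixel_read_t demo_get_pixel_read_function(enum demo_format format)
{
	switch (format) {
	case DEMO_FORMAT_ARGB8888:
		return demo_ARGB8888_read;
	case DEMO_FORMAT_XRGB8888:
		return demo_XRGB8888_read;
	default:
		/* VKMS calls BUG() here; the sketch returns NULL to stay testable. */
		return NULL;
	}
}
```

A caller can now pass a `const uint8_t *` straight from a read-only mapping without casting the qualifier away.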
diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index e0d46defed83..3f45290a0c5d 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -61,7 +61,7 @@ struct line_buffer {
  * @out_pixel: destination address to write the pixel
  * @in_pixel: pixel to write
  */
-typedef void (*pixel_write_t)(u8 *out_pixel, struct pixel_argb_u16 *in_pixel);
+typedef void (*pixel_write_t)(u8 *out_pixel, const struct pixel_argb_u16 *in_pixel);
 
 struct vkms_writeback_job {
 	struct iosys_map data[DRM_FORMAT_MAX_PLANES];
@@ -76,7 +76,7 @@ struct vkms_writeback_job {
  * @in_pixel: pointer to the pixel to read
  * @out_pixel: pointer to write the converted pixel
  */
-typedef void (*pixel_read_t)(u8 *in_pixel, struct pixel_argb_u16 *out_pixel);
+typedef void (*pixel_read_t)(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel);
 
 /**
  * struct vkms_plane_state - Driver specific plane state
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index b9544e67cd4f..06aef5162529 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -75,7 +75,7 @@ static int get_x_position(const struct vkms_frame_info *frame_info, int limit, i
  * They are used in the vkms_compose_row() function to handle multiple formats.
  */
 
-static void ARGB8888_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
+static void ARGB8888_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
 {
 	/*
 	 * The 257 is the "conversion ratio". This number is obtained by the
@@ -89,7 +89,7 @@ static void ARGB8888_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
 	out_pixel->b = (u16)in_pixel[0] * 257;
 }
 
-static void XRGB8888_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
+static void XRGB8888_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
 {
 	out_pixel->a = (u16)0xffff;
 	out_pixel->r = (u16)in_pixel[2] * 257;
@@ -97,7 +97,7 @@ static void XRGB8888_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
 	out_pixel->b = (u16)in_pixel[0] * 257;
 }
 
-static void ARGB16161616_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
+static void ARGB16161616_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
 {
 	__le16 *pixel = (__le16 *)in_pixel;
 
@@ -107,7 +107,7 @@ static void ARGB16161616_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pi
 	out_pixel->b = le16_to_cpu(pixel[0]);
 }
 
-static void XRGB16161616_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
+static void XRGB16161616_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
 {
 	__le16 *pixel = (__le16 *)in_pixel;
 
@@ -117,7 +117,7 @@ static void XRGB16161616_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pi
 	out_pixel->b = le16_to_cpu(pixel[0]);
 }
 
-static void RGB565_to_argb_u16(u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
+static void RGB565_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
 {
 	__le16 *pixel = (__le16 *)in_pixel;
 
@@ -173,7 +173,7 @@ void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state
  * They are used in vkms_writeback_row() to convert and store a pixel from the src_buffer to
  * the writeback buffer.
  */
-static void argb_u16_to_ARGB8888(u8 *out_pixel, struct pixel_argb_u16 *in_pixel)
+static void argb_u16_to_ARGB8888(u8 *out_pixel, const struct pixel_argb_u16 *in_pixel)
 {
 	/*
 	 * This sequence below is important because the format's byte order is
@@ -191,7 +191,7 @@ static void argb_u16_to_ARGB8888(u8 *out_pixel, struct pixel_argb_u16 *in_pixel)
 	out_pixel[0] = DIV_ROUND_CLOSEST(in_pixel->b, 257);
 }
 
-static void argb_u16_to_XRGB8888(u8 *out_pixel, struct pixel_argb_u16 *in_pixel)
+static void argb_u16_to_XRGB8888(u8 *out_pixel, const struct pixel_argb_u16 *in_pixel)
 {
 	out_pixel[3] = 0xff;
 	out_pixel[2] = DIV_ROUND_CLOSEST(in_pixel->r, 257);
@@ -199,7 +199,7 @@ static void argb_u16_to_XRGB8888(u8 *out_pixel, struct pixel_argb_u16 *in_pixel)
 	out_pixel[0] = DIV_ROUND_CLOSEST(in_pixel->b, 257);
 }
 
-static void argb_u16_to_ARGB16161616(u8 *out_pixel, struct pixel_argb_u16 *in_pixel)
+static void argb_u16_to_ARGB16161616(u8 *out_pixel, const struct pixel_argb_u16 *in_pixel)
 {
 	__le16 *pixel = (__le16 *)out_pixel;
 
@@ -209,7 +209,7 @@ static void argb_u16_to_ARGB16161616(u8 *out_pixel, struct pixel_argb_u16 *in_pi
 	pixel[0] = cpu_to_le16(in_pixel->b);
 }
 
-static void argb_u16_to_XRGB16161616(u8 *out_pixel, struct pixel_argb_u16 *in_pixel)
+static void argb_u16_to_XRGB16161616(u8 *out_pixel, const struct pixel_argb_u16 *in_pixel)
 {
 	__le16 *pixel = (__le16 *)out_pixel;
 
@@ -219,7 +219,7 @@ static void argb_u16_to_XRGB16161616(u8 *out_pixel, struct pixel_argb_u16 *in_pi
 	pixel[0] = cpu_to_le16(in_pixel->b);
 }
 
-static void argb_u16_to_RGB565(u8 *out_pixel, struct pixel_argb_u16 *in_pixel)
+static void argb_u16_to_RGB565(u8 *out_pixel, const struct pixel_argb_u16 *in_pixel)
 {
 	__le16 *pixel = (__le16 *)out_pixel;
 

-- 
2.46.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v13 5/9] drm/vkms: Update pixels accessor to support packed and multi-plane formats.
  2024-10-31 17:53 [PATCH v13 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
                   ` (3 preceding siblings ...)
  2024-10-31 17:53 ` [PATCH v13 4/9] drm/vkms: Use const for input pointers in pixel_read and " Louis Chauvet
@ 2024-10-31 17:53 ` Louis Chauvet
  2024-11-18 17:10   ` José Expósito
  2024-10-31 17:53 ` [PATCH v13 6/9] drm/vkms: Avoid computing blending limits inside pre_mul_alpha_blend Louis Chauvet
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Louis Chauvet @ 2024-10-31 17:53 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Jonathan Corbet, Louis Chauvet, Simona Vetter, Helen Koike,
	rdunlap, arthurgrillo, pekka.paalanen, Simona Vetter
  Cc: dri-devel, linux-kernel, linux-doc, thomas.petazzoni,
	jeremie.dautheribes, miquel.raynal, seanpaul, marcheu,
	nicolejadeyee

Use block_h/block_w to compute the offset and the pointer of a pixel.
The previous implementation was specialized for planes with
block_h == block_w == 1; generalizing it avoids confusion and makes tiled
formats easier to implement. It also stops using the deprecated format
field `cpp` in the offset computation.

Introduce the plane_index parameter to get an offset/pointer on a
different plane.

Acked-by: Maíra Canal <mairacanal@riseup.net>
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
---
 drivers/gpu/drm/vkms/vkms_formats.c | 114 ++++++++++++++++++++++++++++--------
 1 file changed, 91 insertions(+), 23 deletions(-)

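To make the block arithmetic of packed_pixels_offset() concrete, here is a userspace re-derivation of the DRM_FORMAT_R1 example from its kernel-doc in the diff below. `struct demo_plane_layout` is a hypothetical reduction of the drm_framebuffer/drm_format_info fields the function reads. The bit extraction in `demo_r1_pixel()` assumes the drm_fourcc.h layout for R1 ("[7:0] R0:R1:R2:R3:R4:R5:R6:R7"), i.e. the leftmost pixel in the most significant bit; that step is illustrative only and is not part of this patch. With a 4-byte pitch, pixel (13, 5) lands in byte 5 * 4 + 13 / 8 = 21 with (rem_x, rem_y) = (5, 0).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduction of the fields read from drm_framebuffer/drm_format_info. */
struct demo_plane_layout {
	int offset;         /* fb->offsets[plane_index] */
	int pitch;          /* fb->pitches[plane_index]: bytes per row of pixels */
	int block_w;        /* drm_format_info_block_width() */
	int block_h;        /* drm_format_info_block_height() */
	int char_per_block; /* format->char_per_block[plane_index] */
};

/* Same arithmetic as packed_pixels_offset(). */
static void demo_packed_pixels_offset(const struct demo_plane_layout *l, int x, int y,
				      int *offset, int *rem_x, int *rem_y)
{
	int block_x, block_y, block_pitch;

	assert(l->block_w > 0 && l->block_h > 0);
	block_x = x / l->block_w;
	block_y = y / l->block_h;
	/* One row of blocks covers block_h rows of pixels. */
	block_pitch = l->pitch * l->block_h;

	*rem_x = x % l->block_w;
	*rem_y = y % l->block_h;
	*offset = l->offset + block_y * block_pitch + block_x * l->char_per_block;
}

/* Assumes R1 stores its leftmost pixel in bit 7 of each byte. */
static int demo_r1_pixel(const uint8_t *buf, const struct demo_plane_layout *l, int x, int y)
{
	int offset, rem_x, rem_y;

	demo_packed_pixels_offset(l, x, y, &offset, &rem_x, &rem_y);
	return (buf[offset] >> (7 - rem_x)) & 1;
}
```

For 1x1 formats such as XRGB8888, rem_x and rem_y are always 0 and the result reduces to the old `offsets + y * pitch + x * cpp` formula.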
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index 06aef5162529..7f932d42394d 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -10,22 +10,46 @@
 #include "vkms_formats.h"
 
 /**
- * pixel_offset() - Get the offset of the pixel at coordinates x/y in the first plane
+ * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
  *
  * @frame_info: Buffer metadata
  * @x: The x coordinate of the wanted pixel in the buffer
  * @y: The y coordinate of the wanted pixel in the buffer
+ * @plane_index: The index of the plane to use
+ * @offset: The returned offset inside the buffer of the block
+ * @rem_x: The returned X coordinate of the requested pixel in the block
+ * @rem_y: The returned Y coordinate of the requested pixel in the block
  *
- * The caller must ensure that the framebuffer associated with this request uses a pixel format
- * where block_h == block_w == 1.
- * If this requirement is not fulfilled, the resulting offset can point to an other pixel or
- * outside of the buffer.
+ * As some pixel formats store multiple pixels in a block (DRM_FORMAT_R* for example), some
+ * pixels are not individually addressable. This function returns 3 values: the offset of the
+ * whole block, and the coordinates of the requested pixel inside this block.
+ * For example, if the format is DRM_FORMAT_R1 and the requested coordinates are (13, 5), the
+ * offset will point to byte 5 * pitches + 13 / 8 (the second byte of line 5, counting from 0),
+ * and the rem_x/rem_y coordinates will be (13 % 8, 5 % 1) = (5, 0).
+ *
+ * With this function, the caller only has to extract the correct pixel from the block.
  */
-static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
+static void packed_pixels_offset(const struct vkms_frame_info *frame_info, int x, int y,
+				 int plane_index, int *offset, int *rem_x, int *rem_y)
 {
 	struct drm_framebuffer *fb = frame_info->fb;
+	const struct drm_format_info *format = frame_info->fb->format;
+	/* Directly multiplying y by the pitch and x by format->cpp is not sufficient because
+	 * in some formats a block can represent multiple pixels.
+	 *
+	 * Dividing x and y by the block size gives the position of the block containing the
+	 * pixel, from which the offset of that block is computed.
+	 */
 
-	return fb->offsets[0] + (y * fb->pitches[0]) + (x * fb->format->cpp[0]);
+	int block_x = x / drm_format_info_block_width(format, plane_index);
+	int block_y = y / drm_format_info_block_height(format, plane_index);
+	int block_pitch = fb->pitches[plane_index] * drm_format_info_block_height(format,
+										  plane_index);
+	*rem_x = x % drm_format_info_block_width(format, plane_index);
+	*rem_y = y % drm_format_info_block_height(format, plane_index);
+	*offset = fb->offsets[plane_index] +
+		  block_y * block_pitch +
+		  block_x * format->char_per_block[plane_index];
 }
 
 /**
@@ -35,30 +59,71 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
  * @frame_info: Buffer metadata
  * @x: The x (width) coordinate inside the plane
  * @y: The y (height) coordinate inside the plane
+ * @plane_index: The index of the plane
+ * @addr: The returned pointer
+ * @rem_x: The returned X coordinate of the requested pixel in the block
+ * @rem_y: The returned Y coordinate of the requested pixel in the block
  *
- * Takes the information stored in the frame_info, a pair of coordinates, and
- * returns the address of the first color channel.
- * This function assumes the channels are packed together, i.e. a color channel
- * comes immediately after another in the memory. And therefore, this function
- * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
+ * Takes the information stored in the frame_info, a pair of coordinates, and returns the address
+ * of the block containing this pixel and the pixel position inside this block.
  *
- * The caller must ensure that the framebuffer associated with this request uses a pixel format
- * where block_h == block_w == 1, otherwise the returned pointer can be outside the buffer.
+ * See packed_pixels_offset() for details about the rem_x/rem_y behavior.
  */
-static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
-				int x, int y)
+static void packed_pixels_addr(const struct vkms_frame_info *frame_info,
+			       int x, int y, int plane_index, u8 **addr, int *rem_x,
+			       int *rem_y)
 {
-	size_t offset = pixel_offset(frame_info, x, y);
+	int offset;
 
-	return (u8 *)frame_info->map[0].vaddr + offset;
+	packed_pixels_offset(frame_info, x, y, plane_index, &offset, rem_x, rem_y);
+	*addr = (u8 *)frame_info->map[0].vaddr + offset;
 }
 
-static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
+/**
+ * packed_pixels_addr_1x1() - Get the pointer to the block containing the pixel at the given
+ * coordinates
+ *
+ * @frame_info: Buffer metadata
+ * @x: The x (width) coordinate inside the plane
+ * @y: The y (height) coordinate inside the plane
+ * @plane_index: The index of the plane
+ * @addr: The returned pointer
+ *
+ * This function can only be used with formats where block_h == block_w == 1.
+ */
+static void packed_pixels_addr_1x1(const struct vkms_frame_info *frame_info,
+				   int x, int y, int plane_index, u8 **addr)
+{
+	int offset, rem_x, rem_y;
+
+	WARN_ONCE(drm_format_info_block_width(frame_info->fb->format,
+					      plane_index) != 1,
+		"%s() only supports formats with block_w == 1", __func__);
+	WARN_ONCE(drm_format_info_block_height(frame_info->fb->format,
+					       plane_index) != 1,
+		"%s() only supports formats with block_h == 1", __func__);
+
+	packed_pixels_offset(frame_info, x, y, plane_index, &offset, &rem_x,
+			     &rem_y);
+	*addr = (u8 *)frame_info->map[0].vaddr + offset;
+}
+
+static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y,
+				 int plane_index)
 {
 	int x_src = frame_info->src.x1 >> 16;
 	int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16);
+	u8 *addr;
+	int rem_x, rem_y;
+
+	WARN_ONCE(drm_format_info_block_width(frame_info->fb->format, plane_index) != 1,
+		  "%s() only supports formats with block_w == 1", __func__);
+	WARN_ONCE(drm_format_info_block_height(frame_info->fb->format, plane_index) != 1,
+		  "%s() only supports formats with block_h == 1", __func__);
 
-	return packed_pixels_addr(frame_info, x_src, y_src);
+	packed_pixels_addr(frame_info, x_src, y_src, plane_index, &addr, &rem_x, &rem_y);
+
+	return addr;
 }
 
 static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x)
@@ -152,14 +217,14 @@ void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state
 {
 	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
 	struct vkms_frame_info *frame_info = plane->frame_info;
-	u8 *src_pixels = get_packed_src_addr(frame_info, y);
+	u8 *src_pixels = get_packed_src_addr(frame_info, y, 0);
 	int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels);
 
 	for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) {
 		int x_pos = get_x_position(frame_info, limit, x);
 
 		if (drm_rotation_90_or_270(frame_info->rotation))
-			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1)
+			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1, 0)
 				+ frame_info->fb->format->cpp[0] * y;
 
 		plane->pixel_read(src_pixels, &out_pixels[x_pos]);
@@ -250,7 +315,10 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
 {
 	struct vkms_frame_info *frame_info = &wb->wb_frame_info;
 	int x_dst = frame_info->dst.x1;
-	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
+	u8 *dst_pixels;
+	int rem_x, rem_y;
+
+	packed_pixels_addr(frame_info, x_dst, y, 0, &dst_pixels, &rem_x, &rem_y);
 	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
 	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst), src_buffer->n_pixels);
 

-- 
2.46.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v13 6/9] drm/vkms: Avoid computing blending limits inside pre_mul_alpha_blend
  2024-10-31 17:53 [PATCH v13 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
                   ` (4 preceding siblings ...)
  2024-10-31 17:53 ` [PATCH v13 5/9] drm/vkms: Update pixels accessor to support packed and multi-plane formats Louis Chauvet
@ 2024-10-31 17:53 ` Louis Chauvet
  2024-10-31 17:53 ` [PATCH v13 7/9] drm/vkms: Introduce pixel_read_direction enum Louis Chauvet
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Louis Chauvet @ 2024-10-31 17:53 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Jonathan Corbet, Louis Chauvet, Simona Vetter, Helen Koike,
	rdunlap, arthurgrillo, pekka.paalanen, Simona Vetter
  Cc: dri-devel, linux-kernel, linux-doc, thomas.petazzoni,
	jeremie.dautheribes, miquel.raynal, seanpaul, marcheu,
	nicolejadeyee, Pekka Paalanen

pre_mul_alpha_blend() is dedicated to blending, so to avoid mixing
different concepts (coordinate calculation and color management), move
the x_limit and x_dst computation out of this helper.
This also improves maintainability by grouping the coordinate-related
computation in a single place: the loop in `blend`.

Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
---
 drivers/gpu/drm/vkms/vkms_composer.c | 40 +++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 21 deletions(-)

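A userspace sketch of the new split between the caller and the helper in the diff below: the caller clamps the pixel count, and the helper blends stage pixels [0, pixel_count) onto output pixels [x_start, x_start + pixel_count). The body of pre_mul_blend_channel() is not part of this excerpt, so `demo_blend_channel()` assumes the usual premultiplied "over" formula, out = src + dst * (1 - alpha) with 0xffff standing for 1.0, rounded to nearest. All `demo_*` names and the local `struct pixel_argb_u16` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/*
 * Assumed premultiplied "over" for one channel: src + dst * (1 - alpha).
 * Rounded to nearest, and clamped in case src was not properly premultiplied.
 */
static uint16_t demo_blend_channel(uint16_t src, uint16_t dst, uint16_t alpha)
{
	uint64_t v = (uint64_t)src * 0xffff + (uint64_t)dst * (0xffffu - alpha);

	v = (v + 0xffff / 2) / 0xffff;
	return v > 0xffff ? 0xffff : (uint16_t)v;
}

/* Blend stage[0, pixel_count) onto line[x_start, x_start + pixel_count). */
static void demo_pre_mul_alpha_blend(const struct pixel_argb_u16 *stage,
				     struct pixel_argb_u16 *line,
				     int x_start, int pixel_count)
{
	struct pixel_argb_u16 *out = &line[x_start];

	for (int i = 0; i < pixel_count; i++) {
		out[i].a = 0xffff; /* the background is assumed fully opaque */
		out[i].r = demo_blend_channel(stage[i].r, out[i].r, stage[i].a);
		out[i].g = demo_blend_channel(stage[i].g, out[i].g, stage[i].a);
		out[i].b = demo_blend_channel(stage[i].b, out[i].b, stage[i].a);
	}
}

/* Caller side, as in blend(): never read past the end of the stage buffer. */
static int demo_pixel_count(int dst_width, int n_pixels)
{
	return dst_width < n_pixels ? dst_width : n_pixels;
}
```

Keeping the clamp in the caller means the helper only needs a start index and a length, which is all a per-line blending loop has to know.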
diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index 931e214b225c..ecac0bc858a0 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -24,34 +24,30 @@ static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
 
 /**
  * pre_mul_alpha_blend - alpha blending equation
- * @frame_info: Source framebuffer's metadata
  * @stage_buffer: The line with the pixels from src_plane
  * @output_buffer: A line buffer that receives all the blends output
+ * @x_start: Index in @output_buffer of the first pixel to blend into
+ * @pixel_count: The number of pixels to blend
  *
- * Using the information from the `frame_info`, this blends only the
- * necessary pixels from the `stage_buffer` to the `output_buffer`
- * using premultiplied blend formula.
+ * The pixels [0;@pixel_count) in stage_buffer are blended at [@x_start;@x_start+@pixel_count) in
+ * output_buffer.
  *
  * The current DRM assumption is that pixel color values have been already
  * pre-multiplied with the alpha channel values. See more
  * drm_plane_create_blend_mode_property(). Also, this formula assumes a
  * completely opaque background.
  */
-static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
-				struct line_buffer *stage_buffer,
-				struct line_buffer *output_buffer)
+static void pre_mul_alpha_blend(const struct line_buffer *stage_buffer,
+				struct line_buffer *output_buffer, int x_start, int pixel_count)
 {
-	int x_dst = frame_info->dst.x1;
-	struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
-	struct pixel_argb_u16 *in = stage_buffer->pixels;
-	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
-			    stage_buffer->n_pixels);
-
-	for (int x = 0; x < x_limit; x++) {
-		out[x].a = (u16)0xffff;
-		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
-		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
-		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
+	struct pixel_argb_u16 *out = &output_buffer->pixels[x_start];
+	const struct pixel_argb_u16 *in = stage_buffer->pixels;
+
+	for (int i = 0; i < pixel_count; i++) {
+		out[i].a = (u16)0xffff;
+		out[i].r = pre_mul_blend_channel(in[i].r, out[i].r, in[i].a);
+		out[i].g = pre_mul_blend_channel(in[i].g, out[i].g, in[i].a);
+		out[i].b = pre_mul_blend_channel(in[i].b, out[i].b, in[i].a);
 	}
 }
 
@@ -183,7 +179,7 @@ static void blend(struct vkms_writeback_job *wb,
 {
 	struct vkms_plane_state **plane = crtc_state->active_planes;
 	u32 n_active_planes = crtc_state->num_active_planes;
-	int y_pos;
+	int y_pos, x_dst, pixel_count;
 
 	const struct pixel_argb_u16 background_color = { .a = 0xffff };
 
@@ -201,14 +197,16 @@ static void blend(struct vkms_writeback_job *wb,
 
 		/* The active planes are composed associatively in z-order. */
 		for (size_t i = 0; i < n_active_planes; i++) {
+			x_dst = plane[i]->frame_info->dst.x1;
+			pixel_count = min_t(int, drm_rect_width(&plane[i]->frame_info->dst),
+					    (int)stage_buffer->n_pixels);
 			y_pos = get_y_pos(plane[i]->frame_info, y);
 
 			if (!check_limit(plane[i]->frame_info, y_pos))
 				continue;
 
 			vkms_compose_row(stage_buffer, plane[i], y_pos);
-			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
-					    output_buffer);
+			pre_mul_alpha_blend(stage_buffer, output_buffer, x_dst, pixel_count);
 		}
 
 		apply_lut(crtc_state, output_buffer);

-- 
2.46.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v13 7/9] drm/vkms: Introduce pixel_read_direction enum
  2024-10-31 17:53 [PATCH v13 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
                   ` (5 preceding siblings ...)
  2024-10-31 17:53 ` [PATCH v13 6/9] drm/vkms: Avoid computing blending limits inside pre_mul_alpha_blend Louis Chauvet
@ 2024-10-31 17:53 ` Louis Chauvet
  2024-11-18 17:10   ` José Expósito
  2024-10-31 17:53 ` [PATCH v13 8/9] drm/vkms: Re-introduce line-per-line composition algorithm Louis Chauvet
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Louis Chauvet @ 2024-10-31 17:53 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Jonathan Corbet, Louis Chauvet, Simona Vetter, Helen Koike,
	rdunlap, arthurgrillo, pekka.paalanen, Simona Vetter
  Cc: dri-devel, linux-kernel, linux-doc, thomas.petazzoni,
	jeremie.dautheribes, miquel.raynal, seanpaul, marcheu,
	nicolejadeyee

The pixel_read_direction enum is useful to describe the reading direction
in a plane. It avoids using the DRM rotation property, which is not
practical for determining the reading direction.
This patch also introduces two helpers: one to compute the
pixel_read_direction from the DRM rotation property, and one to compute
the step, in bytes, between two successive pixels in a specific direction.

Acked-by: Maíra Canal <mairacanal@riseup.net>
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
---
 drivers/gpu/drm/vkms/vkms_composer.c | 44 ++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/vkms/vkms_drv.h      | 11 +++++++++
 drivers/gpu/drm/vkms/vkms_formats.c  | 32 ++++++++++++++++++++++++++
 3 files changed, 87 insertions(+)

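The body of get_block_step_bytes() is not shown in this excerpt, so the following is only a hypothetical userspace sketch of what its kernel-doc describes: a signed byte step between consecutive blocks for each pixel_read_direction. It assumes a vertical step of pitch * block_h, which matches the row-of-blocks pitch used by packed_pixels_offset() in the previous patch. The enum mirrors the one added to vkms_drv.h below; `struct demo_plane_layout`, `demo_block_step_bytes()` and `demo_block_offsets()` are hypothetical names. For DRM_FORMAT_R1, one left-to-right step of 1 byte covers 8 pixels, which is why the kernel-doc warns that the caller may have to read several pixels per block.

```c
#include <assert.h>

/* Mirrors the enum added to vkms_drv.h. */
enum pixel_read_direction {
	READ_BOTTOM_TO_TOP,
	READ_TOP_TO_BOTTOM,
	READ_RIGHT_TO_LEFT,
	READ_LEFT_TO_RIGHT
};

/* Hypothetical reduction of one framebuffer plane. */
struct demo_plane_layout {
	int pitch;          /* bytes between two rows of pixels */
	int block_h;        /* rows of pixels covered by one block */
	int char_per_block; /* bytes per block */
};

/* Signed distance, in bytes, from one block to the next in @direction. */
static int demo_block_step_bytes(const struct demo_plane_layout *l,
				 enum pixel_read_direction direction)
{
	switch (direction) {
	case READ_LEFT_TO_RIGHT:
		return l->char_per_block;
	case READ_RIGHT_TO_LEFT:
		return -l->char_per_block;
	case READ_TOP_TO_BOTTOM:
		return l->pitch * l->block_h;
	case READ_BOTTOM_TO_TOP:
		return -(l->pitch * l->block_h);
	}
	assert(!"unknown pixel_read_direction");
	return 0;
}

/* Byte offsets of @count consecutive blocks, starting at byte @start. */
static void demo_block_offsets(int start, int step, int count, int *offsets)
{
	for (int i = 0; i < count; i++)
		offsets[i] = start + i * step;
}
```

Reading four XRGB8888 pixels right to left from x = 3 then visits byte offsets 12, 8, 4 and 0 of the line.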
diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index ecac0bc858a0..601e33431b45 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -159,6 +159,50 @@ static void apply_lut(const struct vkms_crtc_state *crtc_state, struct line_buff
 	}
 }
 
+/**
+ * direction_for_rotation() - Get the correct reading direction for a given rotation
+ *
+ * @rotation: Rotation to analyze. It corresponds to the field @frame_info.rotation.
+ *
+ * This function uses the @rotation setting of a source plane to compute the reading
+ * direction in this plane that corresponds to a left-to-right write in the CRTC.
+ * For example, if the buffer is reflected on the X axis, the pixels must be read from right
+ * to left to be written from left to right on the CRTC.
+ */
+static enum pixel_read_direction direction_for_rotation(unsigned int rotation)
+{
+	struct drm_rect tmp_a, tmp_b;
+	int x, y;
+
+	/*
+	 * Points A and B are depicted as zero-size rectangles on the CRTC.
+	 * The CRTC writing direction is from A to B. The plane reading direction
+	 * is discovered by inverse-transforming A and B.
+	 * The reading direction is computed by rotating the vector AB (top-left to top-right) in a
+	 * 1x1 square.
+	 */
+
+	tmp_a = DRM_RECT_INIT(0, 0, 0, 0);
+	tmp_b = DRM_RECT_INIT(1, 0, 0, 0);
+	drm_rect_rotate_inv(&tmp_a, 1, 1, rotation);
+	drm_rect_rotate_inv(&tmp_b, 1, 1, rotation);
+
+	x = tmp_b.x1 - tmp_a.x1;
+	y = tmp_b.y1 - tmp_a.y1;
+
+	if (x == 1 && y == 0)
+		return READ_LEFT_TO_RIGHT;
+	else if (x == -1 && y == 0)
+		return READ_RIGHT_TO_LEFT;
+	else if (y == 1 && x == 0)
+		return READ_TOP_TO_BOTTOM;
+	else if (y == -1 && x == 0)
+		return READ_BOTTOM_TO_TOP;
+
+	WARN_ONCE(true, "The inverse of the rotation gives an incorrect direction.");
+	return READ_LEFT_TO_RIGHT;
+}
+
 /**
  * blend - blend the pixels from all planes and compute crc
  * @wb: The writeback frame buffer metadata
diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 3f45290a0c5d..777b7bd91f27 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -69,6 +69,17 @@ struct vkms_writeback_job {
 	pixel_write_t pixel_write;
 };
 
+/**
+ * enum pixel_read_direction - Enum used internally by VKMS to represent a reading direction in a
+ * plane.
+ */
+enum pixel_read_direction {
+	READ_BOTTOM_TO_TOP,
+	READ_TOP_TO_BOTTOM,
+	READ_RIGHT_TO_LEFT,
+	READ_LEFT_TO_RIGHT
+};
+
 /**
  * typedef pixel_read_t - These functions are used to read a pixel in the source frame,
  * convert it to `struct pixel_argb_u16` and write it to @out_pixel.
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index 7f932d42394d..d0e7dfc1f0d3 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -79,6 +79,38 @@ static void packed_pixels_addr(const struct vkms_frame_info *frame_info,
 	*addr = (u8 *)frame_info->map[0].vaddr + offset;
 }
 
+/**
+ * get_block_step_bytes() - Common helper to compute the correct step value between each pixel block
+ * to read in a certain direction.
+ *
+ * @fb: Framebuffer to iterate over
+ * @direction: Direction of the reading
+ * @plane_index: Plane to get the step from
+ *
+ * As the returned count is the number of bytes between two consecutive blocks in a direction,
+ * the caller may have to read multiple pixels before moving to the next block (for example, to
+ * read from left to right in a DRM_FORMAT_R1 plane, each block contains 8 pixels, so the step
+ * must be applied only every 8 pixels).
+ */
+static int get_block_step_bytes(struct drm_framebuffer *fb, enum pixel_read_direction direction,
+				int plane_index)
+{
+	switch (direction) {
+	case READ_LEFT_TO_RIGHT:
+		return fb->format->char_per_block[plane_index];
+	case READ_RIGHT_TO_LEFT:
+		return -fb->format->char_per_block[plane_index];
+	case READ_TOP_TO_BOTTOM:
+		return (int)fb->pitches[plane_index] * drm_format_info_block_width(fb->format,
+										   plane_index);
+	case READ_BOTTOM_TO_TOP:
+		return -(int)fb->pitches[plane_index] * drm_format_info_block_width(fb->format,
+										    plane_index);
+	}
+
+	return 0;
+}
+
 /**
  * packed_pixels_addr_1x1() - Get the pointer to the block containing the pixel at the given
  * coordinates

-- 
2.46.2



* [PATCH v13 8/9] drm/vkms: Re-introduce line-per-line composition algorithm
  2024-10-31 17:53 [PATCH v13 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
                   ` (6 preceding siblings ...)
  2024-10-31 17:53 ` [PATCH v13 7/9] drm/vkms: Introduce pixel_read_direction enum Louis Chauvet
@ 2024-10-31 17:53 ` Louis Chauvet
  2024-11-18 17:10   ` José Expósito
  2024-10-31 17:53 ` [PATCH v13 9/9] drm/vkms: Remove useless drm_rotation_simplify Louis Chauvet
  2024-11-18 17:10 ` [PATCH v13 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading José Expósito
  9 siblings, 1 reply; 20+ messages in thread
From: Louis Chauvet @ 2024-10-31 17:53 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Jonathan Corbet, Louis Chauvet, Simona Vetter, Helen Koike,
	rdunlap, arthurgrillo, pekka.paalanen, Simona Vetter
  Cc: dri-devel, linux-kernel, linux-doc, thomas.petazzoni,
	jeremie.dautheribes, miquel.raynal, seanpaul, marcheu,
	nicolejadeyee, Pekka Paalanen

Re-introduce a line-by-line composition algorithm for each pixel format.
This improves performance by avoiding an indirect call for every pixel
read. This patch also focuses on code readability.
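Composition works one output line at a time: each plane's visible span is converted into a staging line and then blended over the output line. Below is a minimal sketch of the blending step only. It assumes the usual pre-multiplied "over" operator (out = src + dst * (1 - alpha)) and a hypothetical struct argb_u16 standing in for struct pixel_argb_u16. The 64-bit intermediate and the final clamp are choices made for this sketch, so that input violating the pre-multiplication invariant saturates instead of wrapping:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pixel_argb_u16. */
struct argb_u16 {
	uint16_t a, r, g, b;
};

/* Pre-multiplied "over" for one channel: src + dst * (1 - alpha), rounded to nearest. */
static uint16_t blend_channel(uint16_t src, uint16_t dst, uint16_t alpha)
{
	uint64_t v = (uint64_t)src * 0xffff + (uint64_t)dst * (0xffff - alpha);

	v = (v + 0xffff / 2) / 0xffff;
	/* Saturate if src > alpha (not valid pre-multiplied input). */
	return v > 0xffff ? 0xffff : (uint16_t)v;
}

/* Blend stage[x_start, x_start + count) over out[x_start, x_start + count). */
static void blend_span(const struct argb_u16 *stage, struct argb_u16 *out,
		       int x_start, int count)
{
	for (int x = x_start; x < x_start + count; x++) {
		uint16_t alpha = stage[x].a;

		out[x].a = 0xffff; /* the composed output is treated as opaque */
		out[x].r = blend_channel(stage[x].r, out[x].r, alpha);
		out[x].g = blend_channel(stage[x].g, out[x].g, alpha);
		out[x].b = blend_channel(stage[x].b, out[x].b, alpha);
	}
}
```

Both buffers are indexed from the same x_start, which matches how pre_mul_alpha_blend() indexes the staging buffer after this patch.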

Line-by-line composition was introduced by [1] but rewritten back to a
pixel-by-pixel algorithm in [2]. At the time, nobody noticed the
performance impact, and it was merged.

This patch is almost a revert of [2], but additional effort has been
made to improve the readability and maintainability of the rotation
handling. The blend function is now divided into two parts:
- Transformation of coordinates from the output coordinate space to the
  source coordinate space
- Line conversion and blending
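The first part can be illustrated with a reduced case. The sketch below is not the patch's code. It handles only an unrotated plane with optional X reflection, and it uses hypothetical struct rect and struct src_start types. The patch instead relies on drm_rect_translate() and drm_rect_rotate_inv() to cover every rotation:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical half-open rectangle, mirroring the layout of struct drm_rect. */
struct rect {
	int x1, y1, x2, y2;
};

/* Start of a source line and the horizontal reading direction. */
struct src_start {
	int x, y;
	bool right_to_left;
};

/*
 * Map a CRTC line into the source buffer. Only ROTATE_0, optionally combined
 * with REFLECT_X, is handled (an assumption of this sketch). @dst is the plane
 * position on the CRTC, @src the integer source rectangle.
 */
static struct src_start crtc_line_to_src(const struct rect *dst, const struct rect *src,
					 struct rect line, bool reflect_x)
{
	int width = src->x2 - src->x1;
	struct src_start start;

	/* 1. Cancel the offset of the destination rectangle. */
	line.x1 -= dst->x1;
	line.x2 -= dst->x1;
	line.y1 -= dst->y1;
	line.y2 -= dst->y1;

	/* 2. Invert the transformation: a reflection maps [a, b) to [width - b, width - a). */
	if (reflect_x) {
		int a = line.x1;

		line.x1 = width - line.x2;
		line.x2 = width - a;
	}

	/* 3. Apply the offset of the source rectangle. */
	line.x1 += src->x1;
	line.x2 += src->x1;
	line.y1 += src->y1;
	line.y2 += src->y1;

	/* When reading right to left, start from the rightmost pixel of the line. */
	start.x = reflect_x ? line.x2 - 1 : line.x1;
	start.y = line.y1;
	start.right_to_left = reflect_x;
	return start;
}
```

For a plane at dst (10, 5)-(60, 25) showing src (100, 200)-(150, 220), the CRTC span x in [20, 30) on row 7 starts at source (110, 202) without reflection, and at (139, 202), read right to left, with REFLECT_X.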

Most of the complexity of rotation handling is avoided by using the
drm_rect_* helpers. The remaining complexity is in the clipping, which
avoids reading or writing outside the source/destination buffers.
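For a horizontal line, the clipping reduces to the arithmetic below. This is a minimal sketch with a hypothetical clamp_horizontal() helper. It mirrors the arithmetic of the horizontal branch of clamp_line_coordinates() added by this patch, but takes plain integers instead of a struct vkms_plane_state:

```c
#include <assert.h>

/*
 * Clamp a horizontal source line to [0, fb_width). @src_x is the first source
 * pixel, @dst_x the output position of that pixel and @count the pixel count.
 * @count may end up <= 0, in which case the caller must skip the line.
 */
static void clamp_horizontal(int fb_width, int *src_x, int *dst_x, int *count)
{
	if (*src_x < 0) {
		*count += *src_x; /* drop the pixels left of the buffer... */
		*dst_x -= *src_x; /* ...and skip the same number of output pixels */
		*src_x = 0;
	}
	if (*src_x + *count > fb_width)
		*count = fb_width - *src_x > 0 ? fb_width - *src_x : 0;
}
```

As in blend_line(), a caller should skip the line when the count ends up less than or equal to zero.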

The pixel conversion is now done line-by-line, so the pixel_read_t
callback was replaced with the pixel_read_line_t callback. This way the
indirection is only required once per line and per plane, instead of once
per pixel and per plane.
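The difference is where the indirect call sits. Below is a minimal sketch of the per-line shape. It assumes a simplified, hypothetical read_line_fn signature that takes a start address and a signed byte step. The patch's pixel_read_line_t takes a struct vkms_plane_state, start coordinates and a direction instead:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pixel_argb_u16. */
struct argb_u16 {
	uint16_t a, r, g, b;
};

/* Called once per line: @src is the first pixel, @step the signed byte step. */
typedef void (*read_line_fn)(const uint8_t *src, int step, int count,
			     struct argb_u16 *out);

/* XRGB8888 is little-endian in memory: bytes B, G, R, X. */
static void xrgb8888_read_line(const uint8_t *src, int step, int count,
			       struct argb_u16 *out)
{
	assert(count >= 0);
	/* The tight per-pixel loop lives inside the format-specific function. */
	for (int i = 0; i < count; i++) {
		const uint8_t *px = src + i * step;

		out[i].a = 0xffff;
		out[i].r = (uint16_t)(px[2] * 257);
		out[i].g = (uint16_t)(px[1] * 257);
		out[i].b = (uint16_t)(px[0] * 257);
	}
}
```

Keeping the per-pixel loop inside the format-specific function lets the compiler inline the conversion. The patch's own comment cites a benchmark where factoring that loop out of the read_line functions made them about three times slower.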

The *_read_line callbacks are very similar for most pixel formats, but
keeping them separate is required to avoid a performance impact. Some
helpers for color conversion were introduced to avoid code repetition:
- argb_u16_from_*: perform the color conversion. They should be inlined
  by the compiler, and they are used to avoid repetition between multiple
  variants of the same format (argb/xrgb, and possibly bgr variants in the
  future).
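Both kinds of helper are plain scaling. The sketch below uses hypothetical standalone types and differs from the patch in two ways. It rounds the RGB565 channels with integer arithmetic, whereas the patch uses the drm_fixp_* fixed-point helpers. It also assumes the 16-bit RGB565 value, which is stored little-endian in the buffer, has already been converted to CPU byte order:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pixel_argb_u16. */
struct argb_u16 {
	uint16_t a, r, g, b;
};

/* 8 -> 16 bits: 257 = 65535 / 255, so 0x00 -> 0x0000 and 0xff -> 0xffff exactly. */
static struct argb_u16 argb_u16_from_u8888(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
	struct argb_u16 out = {
		.a = (uint16_t)(a * 257),
		.r = (uint16_t)(r * 257),
		.g = (uint16_t)(g * 257),
		.b = (uint16_t)(b * 257),
	};

	return out;
}

/* Scale a value in [0, max] to [0, 65535], rounding to nearest. */
static uint16_t expand_channel(uint32_t v, uint32_t max)
{
	return (uint16_t)((v * 65535u + max / 2) / max);
}

/* RGB565 in CPU byte order: R in bits 15..11, G in 10..5, B in 4..0. */
static struct argb_u16 argb_u16_from_rgb565(uint16_t rgb565)
{
	struct argb_u16 out = {
		.a = 0xffff,
		.r = expand_channel((rgb565 >> 11) & 0x1f, 31),
		.g = expand_channel((rgb565 >> 5) & 0x3f, 63),
		.b = expand_channel(rgb565 & 0x1f, 31),
	};

	return out;
}
```

An XRGB variant can reuse the same helper by passing 0xff as the alpha, as XRGB8888_read_line() does with 255.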

This new algorithm was tested with:
- kms_plane (for color conversions)
- kms_rotation_crc (for rotations of planes)
- kms_cursor_crc (for translations of planes)
- kms_rotation (for all rotations and formats combinations) [3]
The performance gain was measured with kms_fb_stress [4] with some
modifications to fix the writeback format.

The performance improvement is around 5 to 10%.

[1]: commit 8ba1648567e2 ("drm: vkms: Refactor the plane composer to accept
     new formats")
     https://lore.kernel.org/all/20220905190811.25024-7-igormtorrente@gmail.com/
[2]: commit 322d716a3e8a ("drm/vkms: isolate pixel conversion
     functionality")
     https://lore.kernel.org/all/20230418130525.128733-2-mcanal@igalia.com/
[3]: https://lore.kernel.org/igt-dev/20240313-new_rotation-v2-0-6230fd5cae59@bootlin.com/
[4]: https://lore.kernel.org/all/20240422-kms_fb_stress-dev-v5-0-0c577163dc88@riseup.net/

Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com>

---
 drivers/gpu/drm/vkms/vkms_composer.c | 234 ++++++++++++++++++++++++++++-------
 drivers/gpu/drm/vkms/vkms_drv.h      |  28 +++--
 drivers/gpu/drm/vkms/vkms_formats.c  | 224 ++++++++++++++++++++-------------
 drivers/gpu/drm/vkms/vkms_formats.h  |   2 +-
 drivers/gpu/drm/vkms/vkms_plane.c    |   5 +-
 5 files changed, 344 insertions(+), 149 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index 601e33431b45..7a3e47b895a7 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -29,8 +29,8 @@ static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
  * @x_start: The start offset
  * @pixel_count: The number of pixels to blend
  *
- * The pixels [0;@pixel_count) in stage_buffer are blended at [@x_start;@x_start+@pixel_count) in
- * output_buffer.
+ * The pixels [@x_start;@x_start+@pixel_count) in stage_buffer are blended at
+ * [@x_start;@x_start+@pixel_count) in output_buffer.
  *
  * The current DRM assumption is that pixel color values have been already
  * pre-multiplied with the alpha channel values. See more
@@ -41,7 +41,7 @@ static void pre_mul_alpha_blend(const struct line_buffer *stage_buffer,
 				struct line_buffer *output_buffer, int x_start, int pixel_count)
 {
 	struct pixel_argb_u16 *out = &output_buffer->pixels[x_start];
-	const struct pixel_argb_u16 *in = stage_buffer->pixels;
+	const struct pixel_argb_u16 *in = &stage_buffer->pixels[x_start];
 
 	for (int i = 0; i < pixel_count; i++) {
 		out[i].a = (u16)0xffff;
@@ -51,33 +51,6 @@ static void pre_mul_alpha_blend(const struct line_buffer *stage_buffer,
 	}
 }
 
-static int get_y_pos(struct vkms_frame_info *frame_info, int y)
-{
-	if (frame_info->rotation & DRM_MODE_REFLECT_Y)
-		return drm_rect_height(&frame_info->rotated) - y - 1;
-
-	switch (frame_info->rotation & DRM_MODE_ROTATE_MASK) {
-	case DRM_MODE_ROTATE_90:
-		return frame_info->rotated.x2 - y - 1;
-	case DRM_MODE_ROTATE_270:
-		return y + frame_info->rotated.x1;
-	default:
-		return y;
-	}
-}
-
-static bool check_limit(struct vkms_frame_info *frame_info, int pos)
-{
-	if (drm_rotation_90_or_270(frame_info->rotation)) {
-		if (pos >= 0 && pos < drm_rect_width(&frame_info->rotated))
-			return true;
-	} else {
-		if (pos >= frame_info->rotated.y1 && pos < frame_info->rotated.y2)
-			return true;
-	}
-
-	return false;
-}
 
 static void fill_background(const struct pixel_argb_u16 *background_color,
 			    struct line_buffer *output_buffer)
@@ -203,6 +176,182 @@ static enum pixel_read_direction direction_for_rotation(unsigned int rotation)
 	return READ_LEFT_TO_RIGHT;
 }
 
+/**
+ * clamp_line_coordinates() - Compute and clamp the coordinates to read and write during the blend
+ * process.
+ *
+ * @direction: direction of the reading
+ * @current_plane: plane currently being blended
+ * @src_line: source line of the reading. Its top-left corner gives the start coordinates and its
+ * size gives the number of pixels. This rectangle must be rotated and have a shape of
+ * 1*pixel_count if @direction is vertical and pixel_count*1 if @direction is horizontal.
+ * @src_x_start: x start coordinate for the line reading
+ * @src_y_start: y start coordinate for the line reading
+ * @dst_x_start: x coordinate to blend the read line
+ * @pixel_count: number of pixels to blend
+ *
+ * This function is mainly a safety net to avoid reading outside the source buffer. As the
+ * userspace should never ask to read outside the source plane, all the cases covered here should
+ * be dead code.
+ */
+static void clamp_line_coordinates(enum pixel_read_direction direction,
+				   const struct vkms_plane_state *current_plane,
+				   const struct drm_rect *src_line, int *src_x_start,
+				   int *src_y_start, int *dst_x_start, int *pixel_count)
+{
+	/* By default the start points are correct */
+	*src_x_start = src_line->x1;
+	*src_y_start = src_line->y1;
+	*dst_x_start = current_plane->frame_info->dst.x1;
+
+	/* Get the number of pixels to blend; it depends on the direction */
+	switch (direction) {
+	case READ_LEFT_TO_RIGHT:
+	case READ_RIGHT_TO_LEFT:
+		*pixel_count = drm_rect_width(src_line);
+		break;
+	case READ_BOTTOM_TO_TOP:
+	case READ_TOP_TO_BOTTOM:
+		*pixel_count = drm_rect_height(src_line);
+		break;
+	}
+
+	/*
+	 * Clamp the coordinates to avoid reading outside the buffer
+	 *
+	 * This is mainly a safety check to avoid reading outside the buffer; userspace
+	 * should never request to read outside the source buffer.
+	 */
+	switch (direction) {
+	case READ_LEFT_TO_RIGHT:
+	case READ_RIGHT_TO_LEFT:
+		if (*src_x_start < 0) {
+			*pixel_count += *src_x_start;
+			*dst_x_start -= *src_x_start;
+			*src_x_start = 0;
+		}
+		if (*src_x_start + *pixel_count > current_plane->frame_info->fb->width)
+			*pixel_count = max(0, (int)current_plane->frame_info->fb->width -
+				*src_x_start);
+		break;
+	case READ_BOTTOM_TO_TOP:
+	case READ_TOP_TO_BOTTOM:
+		if (*src_y_start < 0) {
+			*pixel_count += *src_y_start;
+			*dst_x_start -= *src_y_start;
+			*src_y_start = 0;
+		}
+		if (*src_y_start + *pixel_count > current_plane->frame_info->fb->height)
+			*pixel_count = max(0, (int)current_plane->frame_info->fb->height -
+				*src_y_start);
+		break;
+	}
+}
+
+/**
+ * blend_line() - Blend a line from a plane to the output buffer
+ *
+ * @current_plane: current plane to work on
+ * @y: line to write in the output buffer
+ * @crtc_x_limit: width of the output buffer
+ * @stage_buffer: temporary buffer to convert the pixel line from the source buffer
+ * @output_buffer: buffer to blend the read line into.
+ */
+static void blend_line(struct vkms_plane_state *current_plane, int y,
+		       int crtc_x_limit, struct line_buffer *stage_buffer,
+		       struct line_buffer *output_buffer)
+{
+	int src_x_start, src_y_start, dst_x_start, pixel_count;
+	struct drm_rect dst_line, tmp_src, src_line;
+
+	/* Avoid rendering useless lines */
+	if (y < current_plane->frame_info->dst.y1 ||
+	    y >= current_plane->frame_info->dst.y2)
+		return;
+
+	/*
+	 * dst_line is the line to copy. The initial coordinates are inside the
+	 * destination framebuffer, and then drm_rect_* helpers are used to
+	 * compute the correct position into the source framebuffer.
+	 */
+	dst_line = DRM_RECT_INIT(current_plane->frame_info->dst.x1, y,
+				 drm_rect_width(&current_plane->frame_info->dst),
+				 1);
+
+	drm_rect_fp_to_int(&tmp_src, &current_plane->frame_info->src);
+
+	/*
+	 * [1]: Clamp dst_line to crtc_x_limit to avoid writing outside of
+	 * the destination buffer
+	 */
+	dst_line.x1 = max_t(int, dst_line.x1, 0);
+	dst_line.x2 = min_t(int, dst_line.x2, crtc_x_limit);
+	/* The destination is completely outside of the crtc. */
+	if (dst_line.x2 <= dst_line.x1)
+		return;
+
+	src_line = dst_line;
+
+	/*
+	 * Transform the x/y coordinates from the CRTC into coordinates in
+	 * the src buffer.
+	 *
+	 * - Cancel the offset of the dst buffer.
+	 * - Invert the rotation. This assumes that
+	 *   dst = drm_rect_rotate(src, rotation) (dst and src have the
+	 *   same size, but can be rotated).
+	 * - Apply the offset of the source rectangle to the coordinate.
+	 */
+	drm_rect_translate(&src_line, -current_plane->frame_info->dst.x1,
+			   -current_plane->frame_info->dst.y1);
+	drm_rect_rotate_inv(&src_line, drm_rect_width(&tmp_src),
+			    drm_rect_height(&tmp_src),
+			    current_plane->frame_info->rotation);
+	drm_rect_translate(&src_line, tmp_src.x1, tmp_src.y1);
+
+	/* Get the correct reading direction in the source buffer. */
+
+	enum pixel_read_direction direction =
+		direction_for_rotation(current_plane->frame_info->rotation);
+
+	/* [2]: Compute and clamp the number of pixels to read */
+	clamp_line_coordinates(direction, current_plane, &src_line, &src_x_start, &src_y_start,
+			       &dst_x_start, &pixel_count);
+
+	if (pixel_count <= 0) {
+		/* Nothing to read, so avoid multiple function calls */
+		return;
+	}
+
+	/*
+	 * Modify the starting point to take into account the rotation
+	 *
+	 * src_line is the top-left corner, so when reading READ_RIGHT_TO_LEFT or
+	 * READ_BOTTOM_TO_TOP, it must be changed to the top-right/bottom-left
+	 * corner.
+	 */
+	if (direction == READ_RIGHT_TO_LEFT) {
+		// src_x_start is now the right point
+		src_x_start += pixel_count - 1;
+	} else if (direction == READ_BOTTOM_TO_TOP) {
+		// src_y_start is now the bottom point
+		src_y_start += pixel_count - 1;
+	}
+
+	/*
+	 * Perform the conversion and the blending
+	 *
+	 * Here we know that the read line (x_start, y_start, pixel_count) is
+	 * inside the source buffer [2] and we don't write outside the stage
+	 * buffer [1].
+	 */
+	current_plane->pixel_read_line(current_plane, src_x_start, src_y_start, direction,
+				       pixel_count, &stage_buffer->pixels[dst_x_start]);
+
+	pre_mul_alpha_blend(stage_buffer, output_buffer,
+			    dst_x_start, pixel_count);
+}
+
 /**
  * blend - blend the pixels from all planes and compute crc
  * @wb: The writeback frame buffer metadata
@@ -223,34 +372,25 @@ static void blend(struct vkms_writeback_job *wb,
 {
 	struct vkms_plane_state **plane = crtc_state->active_planes;
 	u32 n_active_planes = crtc_state->num_active_planes;
-	int y_pos, x_dst, pixel_count;
 
 	const struct pixel_argb_u16 background_color = { .a = 0xffff };
 
-	size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
+	int crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
+	int crtc_x_limit = crtc_state->base.crtc->mode.hdisplay;
 
 	/*
 	 * The planes are composed line-by-line to avoid heavy memory usage. It is a necessary
 	 * complexity to avoid poor blending performance.
 	 *
-	 * The function vkms_compose_row() is used to read a line, pixel-by-pixel, into the staging
-	 * buffer.
+	 * The pixel_read_line callback is used to read a line into the staging buffer, using
+	 * an algorithm optimized for the specific pixel format.
 	 */
-	for (size_t y = 0; y < crtc_y_limit; y++) {
+	for (int y = 0; y < crtc_y_limit; y++) {
 		fill_background(&background_color, output_buffer);
 
 		/* The active planes are composed associatively in z-order. */
 		for (size_t i = 0; i < n_active_planes; i++) {
-			x_dst = plane[i]->frame_info->dst.x1;
-			pixel_count = min_t(int, drm_rect_width(&plane[i]->frame_info->dst),
-					    (int)stage_buffer->n_pixels);
-			y_pos = get_y_pos(plane[i]->frame_info, y);
-
-			if (!check_limit(plane[i]->frame_info, y_pos))
-				continue;
-
-			vkms_compose_row(stage_buffer, plane[i], y_pos);
-			pre_mul_alpha_blend(stage_buffer, output_buffer, x_dst, pixel_count);
+			blend_line(plane[i], y, crtc_x_limit, stage_buffer, output_buffer);
 		}
 
 		apply_lut(crtc_state, output_buffer);
@@ -258,7 +398,7 @@ static void blend(struct vkms_writeback_job *wb,
 		*crc32 = crc32_le(*crc32, (void *)output_buffer->pixels, row_size);
 
 		if (wb)
-			vkms_writeback_row(wb, output_buffer, y_pos);
+			vkms_writeback_row(wb, output_buffer, y);
 	}
 }
 
@@ -269,7 +409,7 @@ static int check_format_funcs(struct vkms_crtc_state *crtc_state,
 	u32 n_active_planes = crtc_state->num_active_planes;
 
 	for (size_t i = 0; i < n_active_planes; i++)
-		if (!planes[i]->pixel_read)
+		if (!planes[i]->pixel_read_line)
 			return -1;
 
 	if (active_wb && !active_wb->pixel_write)
diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 777b7bd91f27..067a4797f7a0 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -39,7 +39,6 @@
 struct vkms_frame_info {
 	struct drm_framebuffer *fb;
 	struct drm_rect src, dst;
-	struct drm_rect rotated;
 	struct iosys_map map[DRM_FORMAT_MAX_PLANES];
 	unsigned int rotation;
 };
@@ -80,26 +79,38 @@ enum pixel_read_direction {
 	READ_LEFT_TO_RIGHT
 };
 
+struct vkms_plane_state;
+
 /**
- * typedef pixel_read_t - These functions are used to read a pixel in the source frame,
+ * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
  * convert it to `struct pixel_argb_u16` and write it to @out_pixel.
  *
- * @in_pixel: pointer to the pixel to read
- * @out_pixel: pointer to write the converted pixel
+ * @plane: plane used as source for the pixel value
+ * @x_start: X (width) coordinate of the first pixel to copy. The caller must ensure that x_start
+ * is non-negative and smaller than @plane->frame_info->fb->width.
+ * @y_start: Y (height) coordinate of the first pixel to copy. The caller must ensure that y_start
+ * is non-negative and smaller than @plane->frame_info->fb->height.
+ * @direction: direction to use for the copy, starting at @x_start/@y_start
+ * @count: number of pixels to copy
+ * @out_pixel: pointer where to write the pixel values. They will be written from @out_pixel[0]
+ * (included) to @out_pixel[@count] (excluded). The caller must ensure that out_pixel has a
+ * length of at least @count.
  */
-typedef void (*pixel_read_t)(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel);
+typedef void (*pixel_read_line_t)(const struct vkms_plane_state *plane, int x_start,
+				  int y_start, enum pixel_read_direction direction, int count,
+				  struct pixel_argb_u16 out_pixel[]);
 
 /**
  * struct vkms_plane_state - Driver specific plane state
  * @base: base plane state
  * @frame_info: data required for composing computation
- * @pixel_read: function to read a pixel in this plane. The creator of a struct vkms_plane_state
- *	        must ensure that this pointer is valid
+ * @pixel_read_line: function to read a pixel line in this plane. The creator of a
+ *		     struct vkms_plane_state must ensure that this pointer is valid
  */
 struct vkms_plane_state {
 	struct drm_shadow_plane_state base;
 	struct vkms_frame_info *frame_info;
-	pixel_read_t pixel_read;
+	pixel_read_line_t pixel_read_line;
 };
 
 struct vkms_plane {
@@ -265,7 +276,6 @@ int vkms_verify_crc_source(struct drm_crtc *crtc, const char *source_name,
 /* Composer Support */
 void vkms_composer_worker(struct work_struct *work);
 void vkms_set_composer(struct vkms_output *out, bool enabled);
-void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y);
 void vkms_writeback_row(struct vkms_writeback_job *wb, const struct line_buffer *src_buffer, int y);
 
 /* Writeback */
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index d0e7dfc1f0d3..0f6678420a11 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -140,83 +140,51 @@ static void packed_pixels_addr_1x1(const struct vkms_frame_info *frame_info,
 	*addr = (u8 *)frame_info->map[0].vaddr + offset;
 }
 
-static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y,
-				 int plane_index)
-{
-	int x_src = frame_info->src.x1 >> 16;
-	int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16);
-	u8 *addr;
-	int rem_x, rem_y;
-
-	WARN_ONCE(drm_format_info_block_width(frame_info->fb->format, plane_index) != 1,
-		  "%s() only support formats with block_w == 1", __func__);
-	WARN_ONCE(drm_format_info_block_height(frame_info->fb->format, plane_index) != 1,
-		  "%s() only support formats with block_h == 1", __func__);
-
-	packed_pixels_addr(frame_info, x_src, y_src, plane_index, &addr, &rem_x, &rem_y);
-
-	return addr;
-}
-
-static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x)
-{
-	if (frame_info->rotation & (DRM_MODE_REFLECT_X | DRM_MODE_ROTATE_270))
-		return limit - x - 1;
-	return x;
-}
-
 /*
- * The following functions take pixel data from the buffer and convert them to the format
- * ARGB16161616 in @out_pixel.
+ * The following functions take pixel data (a, r, g, b, pixel, ...) and convert them to
+ * &struct pixel_argb_u16
  *
- * They are used in the vkms_compose_row() function to handle multiple formats.
+ * They are used in the *_read_line functions to avoid duplicating work across pixel formats.
  */
 
-static void ARGB8888_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
+static struct pixel_argb_u16 argb_u16_from_u8888(u8 a, u8 r, u8 g, u8 b)
 {
+	struct pixel_argb_u16 out_pixel;
 	/*
 	 * The 257 is the "conversion ratio". This number is obtained by the
 	 * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
 	 * the best color value in a pixel format with more possibilities.
 	 * A similar idea applies to others RGB color conversions.
 	 */
-	out_pixel->a = (u16)in_pixel[3] * 257;
-	out_pixel->r = (u16)in_pixel[2] * 257;
-	out_pixel->g = (u16)in_pixel[1] * 257;
-	out_pixel->b = (u16)in_pixel[0] * 257;
-}
+	out_pixel.a = (u16)a * 257;
+	out_pixel.r = (u16)r * 257;
+	out_pixel.g = (u16)g * 257;
+	out_pixel.b = (u16)b * 257;
 
-static void XRGB8888_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
-{
-	out_pixel->a = (u16)0xffff;
-	out_pixel->r = (u16)in_pixel[2] * 257;
-	out_pixel->g = (u16)in_pixel[1] * 257;
-	out_pixel->b = (u16)in_pixel[0] * 257;
+	return out_pixel;
 }
 
-static void ARGB16161616_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
+static struct pixel_argb_u16 argb_u16_from_u16161616(u16 a, u16 r, u16 g, u16 b)
 {
-	__le16 *pixel = (__le16 *)in_pixel;
+	struct pixel_argb_u16 out_pixel;
+
+	out_pixel.a = a;
+	out_pixel.r = r;
+	out_pixel.g = g;
+	out_pixel.b = b;
 
-	out_pixel->a = le16_to_cpu(pixel[3]);
-	out_pixel->r = le16_to_cpu(pixel[2]);
-	out_pixel->g = le16_to_cpu(pixel[1]);
-	out_pixel->b = le16_to_cpu(pixel[0]);
+	return out_pixel;
 }
 
-static void XRGB16161616_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
+static struct pixel_argb_u16 argb_u16_from_le16161616(__le16 a, __le16 r, __le16 g, __le16 b)
 {
-	__le16 *pixel = (__le16 *)in_pixel;
-
-	out_pixel->a = (u16)0xffff;
-	out_pixel->r = le16_to_cpu(pixel[2]);
-	out_pixel->g = le16_to_cpu(pixel[1]);
-	out_pixel->b = le16_to_cpu(pixel[0]);
+	return argb_u16_from_u16161616(le16_to_cpu(a), le16_to_cpu(r), le16_to_cpu(g),
+				       le16_to_cpu(b));
 }
 
-static void RGB565_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
+static struct pixel_argb_u16 argb_u16_from_RGB565(const __le16 *pixel)
 {
-	__le16 *pixel = (__le16 *)in_pixel;
+	struct pixel_argb_u16 out_pixel;
 
 	s64 fp_rb_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(31));
 	s64 fp_g_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(63));
@@ -226,40 +194,120 @@ static void RGB565_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pi
 	s64 fp_g = drm_int2fixp((rgb_565 >> 5) & 0x3f);
 	s64 fp_b = drm_int2fixp(rgb_565 & 0x1f);
 
-	out_pixel->a = (u16)0xffff;
-	out_pixel->r = drm_fixp2int_round(drm_fixp_mul(fp_r, fp_rb_ratio));
-	out_pixel->g = drm_fixp2int_round(drm_fixp_mul(fp_g, fp_g_ratio));
-	out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
+	out_pixel.a = (u16)0xffff;
+	out_pixel.r = drm_fixp2int_round(drm_fixp_mul(fp_r, fp_rb_ratio));
+	out_pixel.g = drm_fixp2int_round(drm_fixp_mul(fp_g, fp_g_ratio));
+	out_pixel.b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
+
+	return out_pixel;
 }
 
-/**
- * vkms_compose_row - compose a single row of a plane
- * @stage_buffer: output line with the composed pixels
- * @plane: state of the plane that is being composed
- * @y: y coordinate of the row
+/*
+ * The following functions are the read_line functions for each pixel format supported by VKMS.
+ *
+ * They read a line starting at the point @x_start,@y_start following the @direction. The result
+ * is stored in @out_pixel and in the format ARGB16161616.
+ *
+ * These functions are very repetitive, but the innermost pixel loops must be kept inside these
+ * functions for performance reasons. Some benchmarking was done in [1] where having the innermost
+ * loop factored out of these functions showed a slowdown by a factor of three.
  *
- * This function composes a single row of a plane. It gets the source pixels
- * through the y coordinate (see get_packed_src_addr()) and goes linearly
- * through the source pixel, reading the pixels and converting it to
- * ARGB16161616 (see the pixel_read() callback). For rotate-90 and rotate-270,
- * the source pixels are not traversed linearly. The source pixels are queried
- * on each iteration in order to traverse the pixels vertically.
+ * [1]: https://lore.kernel.org/dri-devel/d258c8dc-78e9-4509-9037-a98f7f33b3a3@riseup.net/
  */
-void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y)
+
+static void ARGB8888_read_line(const struct vkms_plane_state *plane, int x_start, int y_start,
+			       enum pixel_read_direction direction, int count,
+			       struct pixel_argb_u16 out_pixel[])
 {
-	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
-	struct vkms_frame_info *frame_info = plane->frame_info;
-	u8 *src_pixels = get_packed_src_addr(frame_info, y, 0);
-	int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels);
+	struct pixel_argb_u16 *end = out_pixel + count;
+	u8 *src_pixels;
+
+	packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixels);
+
+	int step = get_block_step_bytes(plane->frame_info->fb, direction, 0);
+
+	while (out_pixel < end) {
+		u8 *px = (u8 *)src_pixels;
+		*out_pixel = argb_u16_from_u8888(px[3], px[2], px[1], px[0]);
+		out_pixel += 1;
+		src_pixels += step;
+	}
+}
+
+static void XRGB8888_read_line(const struct vkms_plane_state *plane, int x_start, int y_start,
+			       enum pixel_read_direction direction, int count,
+			       struct pixel_argb_u16 out_pixel[])
+{
+	struct pixel_argb_u16 *end = out_pixel + count;
+	u8 *src_pixels;
+
+	packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixels);
+
+	int step = get_block_step_bytes(plane->frame_info->fb, direction, 0);
+
+	while (out_pixel < end) {
+		u8 *px = (u8 *)src_pixels;
+		*out_pixel = argb_u16_from_u8888(255, px[2], px[1], px[0]);
+		out_pixel += 1;
+		src_pixels += step;
+	}
+}
+
+static void ARGB16161616_read_line(const struct vkms_plane_state *plane, int x_start,
+				   int y_start, enum pixel_read_direction direction, int count,
+				   struct pixel_argb_u16 out_pixel[])
+{
+	struct pixel_argb_u16 *end = out_pixel + count;
+	u8 *src_pixels;
+
+	packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixels);
+
+	int step = get_block_step_bytes(plane->frame_info->fb, direction, 0);
+
+	while (out_pixel < end) {
+		__le16 *px = (__le16 *)src_pixels;
+		*out_pixel = argb_u16_from_le16161616(px[3], px[2], px[1], px[0]);
+		out_pixel += 1;
+		src_pixels += step;
+	}
+}
+
+static void XRGB16161616_read_line(const struct vkms_plane_state *plane, int x_start,
+				   int y_start, enum pixel_read_direction direction, int count,
+				   struct pixel_argb_u16 out_pixel[])
+{
+	struct pixel_argb_u16 *end = out_pixel + count;
+	u8 *src_pixels;
+
+	packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixels);
+
+	int step = get_block_step_bytes(plane->frame_info->fb, direction, 0);
+
+	while (out_pixel < end) {
+		__le16 *px = (__le16 *)src_pixels;
+		*out_pixel = argb_u16_from_le16161616(cpu_to_le16(0xFFFF), px[2], px[1], px[0]);
+		out_pixel += 1;
+		src_pixels += step;
+	}
+}
+
+static void RGB565_read_line(const struct vkms_plane_state *plane, int x_start,
+			     int y_start, enum pixel_read_direction direction, int count,
+			     struct pixel_argb_u16 out_pixel[])
+{
+	struct pixel_argb_u16 *end = out_pixel + count;
+	u8 *src_pixels;
+
+	packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixels);
 
-	for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) {
-		int x_pos = get_x_position(frame_info, limit, x);
+	int step = get_block_step_bytes(plane->frame_info->fb, direction, 0);
 
-		if (drm_rotation_90_or_270(frame_info->rotation))
-			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1, 0)
-				+ frame_info->fb->format->cpp[0] * y;
+	while (out_pixel < end) {
+		__le16 *px = (__le16 *)src_pixels;
 
-		plane->pixel_read(src_pixels, &out_pixels[x_pos]);
+		*out_pixel = argb_u16_from_RGB565(px);
+		out_pixel += 1;
+		src_pixels += step;
 	}
 }
 
@@ -359,25 +407,25 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
 }
 
 /**
- * get_pixel_read_function() - Retrieve the correct read_pixel function for a specific
+ * get_pixel_read_line_function() - Retrieve the correct read_line function for a specific
  * format. The returned pointer is NULL for unsupported pixel formats. The caller must ensure that
  * the pointer is valid before using it in a vkms_plane_state.
  *
  * @format: DRM_FORMAT_* value for which to obtain a conversion function (see [drm_fourcc.h])
  */
-pixel_read_t get_pixel_read_function(u32 format)
+pixel_read_line_t get_pixel_read_line_function(u32 format)
 {
 	switch (format) {
 	case DRM_FORMAT_ARGB8888:
-		return &ARGB8888_to_argb_u16;
+		return &ARGB8888_read_line;
 	case DRM_FORMAT_XRGB8888:
-		return &XRGB8888_to_argb_u16;
+		return &XRGB8888_read_line;
 	case DRM_FORMAT_ARGB16161616:
-		return &ARGB16161616_to_argb_u16;
+		return &ARGB16161616_read_line;
 	case DRM_FORMAT_XRGB16161616:
-		return &XRGB16161616_to_argb_u16;
+		return &XRGB16161616_read_line;
 	case DRM_FORMAT_RGB565:
-		return &RGB565_to_argb_u16;
+		return &RGB565_read_line;
 	default:
 		/*
 		 * This is a bug in vkms_plane_atomic_check(). All the supported
diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
index 3ecea4563254..8d2bef95ff79 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.h
+++ b/drivers/gpu/drm/vkms/vkms_formats.h
@@ -5,7 +5,7 @@
 
 #include "vkms_drv.h"
 
-pixel_read_t get_pixel_read_function(u32 format);
+pixel_read_line_t get_pixel_read_line_function(u32 format);
 
 pixel_write_t get_pixel_write_function(u32 format);
 
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 10e9b23dab28..8875bed76410 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -112,7 +112,6 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 	frame_info = vkms_plane_state->frame_info;
 	memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect));
 	memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect));
-	memcpy(&frame_info->rotated, &new_state->dst, sizeof(struct drm_rect));
 	frame_info->fb = fb;
 	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
 	drm_framebuffer_get(frame_info->fb);
@@ -122,10 +121,8 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 									  DRM_MODE_REFLECT_X |
 									  DRM_MODE_REFLECT_Y);
 
-	drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated),
-			drm_rect_height(&frame_info->rotated), frame_info->rotation);
 
-	vkms_plane_state->pixel_read = get_pixel_read_function(fmt);
+	vkms_plane_state->pixel_read_line = get_pixel_read_line_function(fmt);
 }
 
 static int vkms_plane_atomic_check(struct drm_plane *plane,

-- 
2.46.2
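The RGB565 read_line loop above converts pixels while moving the source by a signed byte step, so a single loop handles every reading direction. Below is a minimal standalone sketch of the same idea. It assumes little-endian RGB565 and round-to-nearest expansion to 16 bits per channel; the real argb_u16_from_RGB565() may round differently. The names argb_u16_sketch, scale_to_u16() and rgb565_read_line_sketch() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VKMS's struct pixel_argb_u16. */
struct argb_u16_sketch {
	uint16_t a, r, g, b;
};

/* Scale an n-bit channel value to 0..65535, rounding to nearest. */
static uint16_t scale_to_u16(uint32_t value, unsigned int bits)
{
	uint32_t max = (1u << bits) - 1;

	return (uint16_t)((value * 65535u + max / 2) / max);
}

/*
 * Convert count RGB565 pixels to argb_u16_sketch. Pixel i is read at
 * src + i * step, so step = 2 reads left to right, step = -2 (with src on
 * the last pixel) reads right to left, and step = +/-pitch reads a column.
 */
static void rgb565_read_line_sketch(const uint8_t *src, ptrdiff_t step,
				    struct argb_u16_sketch *out, size_t count)
{
	assert(step != 0);

	for (size_t i = 0; i < count; i++) {
		const uint8_t *p = src + (ptrdiff_t)i * step;
		/* Little-endian 16-bit value: R in bits 15..11, G in 10..5, B in 4..0. */
		uint16_t px = (uint16_t)(p[0] | (p[1] << 8));

		out[i].a = 0xffff; /* RGB565 has no alpha: fully opaque */
		out[i].r = scale_to_u16((px >> 11) & 0x1f, 5);
		out[i].g = scale_to_u16((px >> 5) & 0x3f, 6);
		out[i].b = scale_to_u16(px & 0x1f, 5);
	}
}
```

The sketch computes each address from src and an index instead of advancing a pointer past the last pixel. For right-to-left reads, this avoids forming an address before the start of the buffer.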


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v13 9/9] drm/vkms: Remove useless drm_rotation_simplify
  2024-10-31 17:53 [PATCH v13 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
                   ` (7 preceding siblings ...)
  2024-10-31 17:53 ` [PATCH v13 8/9] drm/vkms: Re-introduce line-per-line composition algorithm Louis Chauvet
@ 2024-10-31 17:53 ` Louis Chauvet
  2024-11-18 17:10 ` [PATCH v13 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading José Expósito
  9 siblings, 0 replies; 20+ messages in thread
From: Louis Chauvet @ 2024-10-31 17:53 UTC (permalink / raw)
  To: Rodrigo Siqueira, Melissa Wen, Maíra Canal, Haneen Mohammed,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Jonathan Corbet, Louis Chauvet, Simona Vetter, Helen Koike,
	rdunlap, arthurgrillo, pekka.paalanen, Simona Vetter
  Cc: dri-devel, linux-kernel, linux-doc, thomas.petazzoni,
	jeremie.dautheribes, miquel.raynal, seanpaul, marcheu,
	nicolejadeyee

As all rotations are now supported by VKMS, this simplification no longer
makes sense, so remove it.

Acked-by: Maíra Canal <mairacanal@riseup.net>
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
---
 drivers/gpu/drm/vkms/vkms_plane.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 8875bed76410..5a028ee96c91 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -115,12 +115,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 	frame_info->fb = fb;
 	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
 	drm_framebuffer_get(frame_info->fb);
-	frame_info->rotation = drm_rotation_simplify(new_state->rotation, DRM_MODE_ROTATE_0 |
-									  DRM_MODE_ROTATE_90 |
-									  DRM_MODE_ROTATE_270 |
-									  DRM_MODE_REFLECT_X |
-									  DRM_MODE_REFLECT_Y);
-
+	frame_info->rotation = new_state->rotation;
 
 	vkms_plane_state->pixel_read_line = get_pixel_read_line_function(fmt);
 }

-- 
2.46.2



* [PATCH v13 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading
  2024-10-31 17:53 [PATCH v13 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
                   ` (8 preceding siblings ...)
  2024-10-31 17:53 ` [PATCH v13 9/9] drm/vkms: Remove useless drm_rotation_simplify Louis Chauvet
@ 2024-11-18 17:10 ` José Expósito
  2024-11-18 17:13   ` Louis Chauvet
  9 siblings, 1 reply; 20+ messages in thread
From: José Expósito @ 2024-11-18 17:10 UTC (permalink / raw)
  To: louis.chauvet
  Cc: airlied, arthurgrillo, corbet, dri-devel, hamohammed.sa,
	helen.koike, jeremie.dautheribes, linux-doc, linux-kernel,
	maarten.lankhorst, mairacanal, marcheu, mcanal, melissa.srw,
	miquel.raynal, mripard, nicolejadeyee, pekka.paalanen,
	pekka.paalanen, rdunlap, rodrigosiqueiramelo, seanpaul,
	simona.vetter, simona, thomas.petazzoni, tzimmermann,
	José Expósito

Hi Louis,

> This patchset is the second version of [1]. It is almost a complete
> rewrite to use a line-by-line algorithm for the composition.
> 
> It can be divided in multiple parts:
> - PATCH 1 to 3: no functional change is intended, only some formatting and
>   documenting (PATCH 2 is taken from [2])
> - PATCH 4 to 7: Some preparation work not directly related to the
>   line-by-line algorithm
> - PATCH 8: main patch for this series, it reintroduces the
>   line-by-line algorithm
> - PATCH 9: Remove useless drm_rotation_simplify
> - Rest of the series: moved to a new series so that this one can be merged,
>   see the new series "Add YUV ad R1..8 formats support to VKMS"
> 
> PATCH 8 aims to restore the line-by-line pixel reading algorithm. It
> was introduced in 8ba1648567e2 ("drm: vkms: Refactor the plane composer to
> accept new formats") but removed in 8ba1648567e2 ("drm: vkms: Refactor the
> plane composer to accept new formats") in an over-simplification effort.
> At that time, nobody noticed the performance impact of this commit. After
> the first iteration of my series, people noticed a performance impact, and
> it was confirmed. Pekka suggested reimplementing the line-by-line algorithm.
> 
> Experiments on my side showed a great improvement for the line-by-line
> algorithm, and the performance is the same as with the original line-by-line
> algorithm. I focused my effort on making the code work for all the
> rotations and translations. Using the drm_rect_* helpers avoids
> reimplementing existing logic.
> 
> The only "complex" part remaining is the clipping of the coordinates to
> avoid reading/writing outside of src/dst. Thus I added a lot of comments
> to help anyone who wants to add features later (framebuffer resizing,
> for example).
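Here is a minimal sketch of that clipping for a single row. It assumes the source and destination start positions are already expressed in pixels along the same reading direction. clip_row_count() is a hypothetical helper, not a VKMS function. The example values reuse the benchmark setup below: a 3640-pixel-wide plane placed at x = 101 on a 4096-pixel CRTC.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of pixels that can be copied from a source row
 * of width src_w starting at src_x into a destination row of width dst_w
 * starting at dst_x, for at most count pixels. Returns 0 when either start
 * position lies outside its buffer.
 */
static int clip_row_count(int src_x, int src_w, int dst_x, int dst_w, int count)
{
	assert(src_w >= 0 && dst_w >= 0);

	if (count <= 0 || src_x < 0 || dst_x < 0 || src_x >= src_w || dst_x >= dst_w)
		return 0;
	/* Never read past the end of the source row... */
	if (count > src_w - src_x)
		count = src_w - src_x;
	/* ...nor write past the end of the destination row. */
	if (count > dst_w - dst_x)
		count = dst_w - dst_x;
	return count;
}
```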
> 
> I did not change any expected test results, as VKMS seems to have some
> existing issues:
> https://gitlab.freedesktop.org/jim.cromie/kernel-drm-next-dd/-/jobs/61484201
> https://gitlab.freedesktop.org/jim.cromie/kernel-drm-next-dd/-/jobs/61803193
> https://gitlab.freedesktop.org/louischauvet/kernel/-/jobs/65944002
> 
> To properly test the rotation algorithm, I had to implement a new IGT
> test [8]. This helped find one issue in the YUV rotation algorithm.
> 
> My series was mainly tested with:
> - kms_plane (for color conversions)
> - kms_rotation_crc (for a subset of rotation and formats)
> - kms_rotation (to test all rotation and formats combinations) [8]
> - kms_cursor_crc (for translations)
> The benchmark used to measure the improvement was done with
> kms_fb_stress [10] with some modifications:
> - Fixing the writeback format to XRGB8888
> - Using a primary plane with odd dimension to avoid failures due to YUV
>   alignment
> The KMS structure was:
> 	CRTC:
> 		rectangle: 4096x2160+0+0
> 	primary:
> 		format: ABGR16161616
> 		rectangle: 3640x2160+101+0
> 	writeback:
> 		format: XRGB8888
> 		rectangle: 4096x2160+0+0
> Results (on my computer):
> 
> 8356b9790650: drm/test: Add test cases for drm_rect_rotate_inv() (before any regression)
> 322d716a3e8a: drm/vkms: isolate pixel conversion functionality (first regression)
> cc4fd2934d41: drm/vkms: Isolate writeback pixel conversion functions (second regression)
> 2c3d1bd284c5: drm/panel: simple: Add Microtips Technology MF-103HIEB0GA0 panel (current drm-misc-next)
> 
>  Used format  | This series | 2c3d1bd284c5 | cc4fd2934d41 | 322d716a3e8a | 8356b9790650 |
> --------------+-------------+--------------+--------------+--------------+--------------+
>  XRGB8888     |  13.261666s |   14.289582s |   10.731272s |    9.480001s |    9.277507s |
>  XRGB16161616 |  13.282479s |   13.918926s |   10.712616s |    9.776903s |    9.291766s |
>  RGB565       | 136.154163s |  141.646489s |  101.744050s |  103.712164s |   87.860923s |
> 
> This is a 4-7% reduction in run time compared with current drm-misc-next
> (2c3d1bd284c5). More work needs to be done on the writeback to gain more.
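The improvement figure can be recomputed directly from the table, using 2c3d1bd284c5 (current drm-misc-next) as the baseline. With the illustrative helper below, XRGB8888 gives about 7.2%, XRGB16161616 about 4.6%, and RGB565 about 3.9%.

```c
#include <assert.h>

/* Percentage by which run time dropped going from before_s to after_s seconds. */
static double runtime_reduction_pct(double before_s, double after_s)
{
	assert(before_s > 0.0);
	return (before_s - after_s) / before_s * 100.0;
}
```

Against 8356b9790650 (before any regression), the result is negative: the series is still slower than that commit, which is consistent with the remark that the writeback needs more work.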
> 
> [1]: https://lore.kernel.org/all/20240201-yuv-v1-0-3ca376f27632@bootlin.com
> [2]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-0-952fcaa5a193@riseup.net/
> [3]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-3-952fcaa5a193@riseup.net/
> [4]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-5-952fcaa5a193@riseup.net/
> [5]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-6-952fcaa5a193@riseup.net/
> [6]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-7-952fcaa5a193@riseup.net/
> [8]: https://lore.kernel.org/r/20240313-new_rotation-v2-0-6230fd5cae59@bootlin.com
> [9]: https://lore.kernel.org/dri-devel/20240306-louis-vkms-conv-v1-1-5bfe7d129fdd@riseup.net/
> [10]: https://lore.kernel.org/all/20240422-kms_fb_stress-dev-v5-0-0c577163dc88@riseup.net/
> 
> To: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
> To: Melissa Wen <melissa.srw@gmail.com>
> To: Maíra Canal <mairacanal@riseup.net>
> To: Haneen Mohammed <hamohammed.sa@gmail.com>
> To: Daniel Vetter <daniel@ffwll.ch>
> To: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> To: Maxime Ripard <mripard@kernel.org>
> To: Thomas Zimmermann <tzimmermann@suse.de>
> To: David Airlie <airlied@gmail.com>
> To: rdunlap@infradead.org
> To: arthurgrillo@riseup.net
> To: Jonathan Corbet <corbet@lwn.net>
> To: pekka.paalanen@haloniitty.fi
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-kernel@vger.kernel.org
> Cc: jeremie.dautheribes@bootlin.com
> Cc: miquel.raynal@bootlin.com
> Cc: thomas.petazzoni@bootlin.com
> Cc: seanpaul@google.com
> Cc: marcheu@google.com
> Cc: nicolejadeyee@google.com
> Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>

Thanks for working on this. I reviewed the series and, with the exception of
a couple of *very* minor comments, it looks good to me.

Feel free to add to the entire series:
Reviewed-by: José Expósito <jose.exposito89@gmail.com>

> Changes in v13:
> - Removed the YUV part to prepare the merge
> - Add Acked-by from Maíra
> - Link to v12: https://lore.kernel.org/r/20241007-yuv-v12-0-01c1ada6fec8@bootlin.com
> Changes in v12:
> - Fix documentation issues as suggested by Randy
> - Link to v11: https://lore.kernel.org/r/20240930-yuv-v11-0-4b1a26bcfc96@bootlin.com
> Changes in v11:
> - Remove documentation patch (already merged)
> - Fix sparse warning about documentation
> - Link to v10: https://lore.kernel.org/r/20240809-yuv-v10-0-1a7c764166f7@bootlin.com
> Changes in v10:
> - Properly remove the patch introducing dummy read/write functions
> - PATCH 8/16: Format fixups
> - PATCH 9/16: Format fixups
> - PATCH 11/16: Format fixups
> - PATCH 14/16: Fix test compilation, add module description
> - Link to v9: https://lore.kernel.org/r/20240802-yuv-v9-0-08a706669e16@bootlin.com
> Changes in v9:
> - PATCH 3/17: Fix docs as Maíra suggested
> - PATCH 4,6,10,12,15,17/17: Fix sparse warning about __le16 casting
> - Link to v8: https://lore.kernel.org/all/20240516-yuv-v8-0-cf8d6f86430e@bootlin.com/
> Changes in v8:
> - PATCH 7/17: Update pitch access to use the proper value for block
>   formats
> - PATCH 9/17: Update pitch access to use the proper value for block
>   formats
> - Link to v7: https://lore.kernel.org/r/20240513-yuv-v7-0-380e9ffec502@bootlin.com
> Changes in v7:
> - Some typos and indent fixes
> - Add Review-By, Acked-By
> - PATCH 3/17: Clarify src/dst unit
> - PATCH 9/17: Clarify documentation
> - PATCH 9/17: Restrict conditions for direction
> - PATCH 9/17: Rename get_block_step_byte to get_block_step_bytes
> - PATCH 10/17: Clarify kernel doc for clamp_line_coordinates, blend_line,
>   pixel_read_line_t
> - PATCH 10/17: Fix the case when src_*_start >= fb->width/height
> - PATCH 10/17: Change y in blend to be an int
> - PATCH 10/17: Clarify documentation for read functions
> - PATCH 12/17: Fix the type of rgb variables in argb_u16_from_yuv888
> - PATCH 12/17: Move comments at the right place, remove useless ones
> - PATCH 12/17: Add missing const
> - PATCH 17/17: Use drm_format_info_bpp and computation to avoid hard-coded
>   values
> - Link to v6: https://lore.kernel.org/r/20240409-yuv-v6-0-de1c5728fd70@bootlin.com
> Changes in v6:
> - Add Randy
> - Add Review-By and Acked-By
> - PATCH 2/17: Remove useless newline
> - PATCH 3/17: Fix kernel doc
> - PATCH 4/17: Fix typo in git commit
> - PATCH 4/17: Fix kernel doc and simplify brief description of typedef
> - PATCH 5/17: Change black default color to Magenta
> - PATCH 5/17: Fix wording in comment
> - PATCH 7/17: Fix typo in packed_pixel_offset
> - PATCH 7/17: Add WARN_ON for currently not supported formats
> - PATCH 8/17: Rename x_limit to pixel_count
> - PATCH 8/17: Clarify kernel doc for pre_mul_alpha_blend
> - PATCH 9/17: Rename get_step_next_block to get_block_step_bytes
> - PATCH 9/17: Change kernel doc order
> - PATCH 9/17: Rework the direction_for_rotation function to use drm
>   helpers
> - PATCH 9/17: Add a warn in direction_for_rotation if the result is not
>   expected
> - PATCH 10/17: Reword the comment of pixel color conversion functions
> - PATCH 10/17: Refactor the blending function to extract functions
> - PATCH 11/17: Remove useless drm_rotation_simplify
> - PATCH 12/17: Fix typo in comments
> - PATCH 12/17: Remove useless define
> - PATCH 12/17: Fix some comments typo and kernel doc
> - PATCH 12/17: Add a comma at the end of the vkms_formats list
> - PATCH 12/17: Use copy of matrix instead of pointers
> - PATCH 12/17: Use 16 bit range for yuv conversion
> - PATCH 17/17: Add a comma at the end of the vkms_formats list
> - PATCH 17/17: Add assertions
> - PATCH 17/17: Fix color conversion... Next time I will read the doc
>   twice...
> - Link to v5: https://lore.kernel.org/r/20240313-yuv-v5-0-e610cbd03f52@bootlin.com
> Changes in v5:
> - All patches: fix some formatting issues
> - PATCH 4/16: Use the correct formatter for 4cc code
> - PATCH 7/16: Update the pixel accessors to also return the pixel position
>   inside a block.
> - PATCH 8/16: Fix a temporary bug
> - PATCH 9/16: Update the get_step_1x1 to get_step_next_block and update
>   the documentation
> - PATCH 10/16: Update to use the new pixel accessors
> - PATCH 10/16: Reword some comments
> - PATCH 11/16: Update to use the new pixel accessors
> - PATCH 11/16: Fix a bug in the subsampling offset for inverted reading
>   (right to left/bottom to top). Found by [8].
> - PATCH 11/16: Apply Arthur's modifications (comments, algorithm
>   clarification)
> - PATCH 11/16: Use the correct formatter for 4cc code
> - PATCH 11/16: Update to use the new get_step_next_block
> - PATCH 14/16: Apply Arthur's modification (comments, compilation issue)
> - PATCH 15/16: Add Arthur's patch to explain the kunit tests
> - PATCH 16/16: Introduce DRM_FORMAT_R* support.
> - Link to v4: https://lore.kernel.org/r/20240304-yuv-v4-0-76beac8e9793@bootlin.com
> Changes in v4:
> - PATCH 3/14: Update comments for get_pixel_* functions
> - PATCH 4/14: Add WARN when trying to get unsupported pixel_* functions
> - PATCH 5/14: Create dummy pixel reader/writer to avoid NULL
>   function pointers and kernel OOPS
> - PATCH 6/14: Added the usage of const pointers when needed
> - PATCH 7/14: Extraction of pixel accessors modification
> - PATCH 8/14: Extraction of the blending function modification
> - PATCH 9/14: Extraction of the pixel_read_direction enum
> - PATCH 10/14: Update direction_for_rotation documentation
> - PATCH 10/14: Rename conversion functions to be explicit
> - PATCH 10/14: Replace while(count) by while(out_pixel<end) in read_line
>   callbacks. It avoids a new variable+addition in the composition hot path.
> - PATCH 11/14: Rename conversion functions to be explicit
> - PATCH 11/14: Update the documentation for get_subsampling_offset
> - PATCH 11/14: Add the matrix_conversion structure to remove a test from
>   the hot path.
> - PATCH 11/14: Update matrix values to use 32.32 fixed floats for
>   conversion
> - PATCH 12/14: Update commit message
> - PATCH 14/14: Change kunit expected value
> - Link to v3: https://lore.kernel.org/r/20240226-yuv-v3-0-ff662f0994db@bootlin.com
> Changes in v3:
> - Correction of remaining git-rebase artefacts
> - Added Pekka in copy of this patch
> - Link to v2: https://lore.kernel.org/r/20240223-yuv-v2-0-aa6be2827bb7@bootlin.com
> Changes in v2:
> - Rebased the series on top of drm-misc/drm-misc-net
> - Extract the typedef for pixel_read/pixel_write
> - Introduce the line-by-line algorithm per pixel format
> - Add some documentation for existing and new code
> - Port the series [1] to use line-by-line algorithm
> - Link to v1: https://lore.kernel.org/r/20240201-yuv-v1-0-3ca376f27632@bootlin.com
> ---
> Arthur Grillo (1):
>       drm/vkms: Use drm_frame directly
> 
> Louis Chauvet (8):
>       drm/vkms: Code formatting
>       drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions
>       drm/vkms: Use const for input pointers in pixel_read an pixel_write functions
>       drm/vkms: Update pixels accessor to support packed and multi-plane formats.
>       drm/vkms: Avoid computing blending limits inside pre_mul_alpha_blend
>       drm/vkms: Introduce pixel_read_direction enum
>       drm/vkms: Re-introduce line-per-line composition algorithm
>       drm/vkms: Remove useless drm_rotation_simplify
> 
>  drivers/gpu/drm/vkms/vkms_composer.c  | 312 ++++++++++++++++++++------
>  drivers/gpu/drm/vkms/vkms_crtc.c      |   6 +-
>  drivers/gpu/drm/vkms/vkms_drv.c       |   3 +-
>  drivers/gpu/drm/vkms/vkms_drv.h       |  55 ++++-
>  drivers/gpu/drm/vkms/vkms_formats.c   | 409 ++++++++++++++++++++++++----------
>  drivers/gpu/drm/vkms/vkms_formats.h   |   4 +-
>  drivers/gpu/drm/vkms/vkms_plane.c     |  17 +-
>  drivers/gpu/drm/vkms/vkms_writeback.c |   5 -
>  8 files changed, 588 insertions(+), 223 deletions(-)
> ---
> base-commit: 623b1e4d2eace0958996995f9f88cb659a6f69dd
> change-id: 20240201-yuv-1337d90d9576
> 
> Best regards,
> 


* [PATCH v13 5/9] drm/vkms: Update pixels accessor to support packed and multi-plane formats.
  2024-10-31 17:53 ` [PATCH v13 5/9] drm/vkms: Update pixels accessor to support packed and multi-plane formats Louis Chauvet
@ 2024-11-18 17:10   ` José Expósito
  2024-11-18 17:17     ` Louis Chauvet
  0 siblings, 1 reply; 20+ messages in thread
From: José Expósito @ 2024-11-18 17:10 UTC (permalink / raw)
  To: louis.chauvet
  Cc: airlied, arthurgrillo, corbet, dri-devel, hamohammed.sa,
	helen.koike, jeremie.dautheribes, linux-doc, linux-kernel,
	maarten.lankhorst, mairacanal, marcheu, melissa.srw,
	miquel.raynal, mripard, nicolejadeyee, pekka.paalanen, rdunlap,
	rodrigosiqueiramelo, seanpaul, simona.vetter, simona,
	thomas.petazzoni, tzimmermann

> Introduce the usage of block_h/block_w to compute the offset and the
> pointer of a pixel. The previous implementation was specialized for
> planes with block_h == block_w == 1. This avoids confusion and allows an
> easier implementation of tiled formats. It also removes the usage of the
> deprecated format field `cpp`.
> 
> Introduce the plane_index parameter to get an offset/pointer on a
> different plane.
> 
> Acked-by: Maíra Canal <mairacanal@riseup.net>
> Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> ---
>  drivers/gpu/drm/vkms/vkms_formats.c | 114 ++++++++++++++++++++++++++++--------
>  1 file changed, 91 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index 06aef5162529..7f932d42394d 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -10,22 +10,46 @@
>  #include "vkms_formats.h"
>  
>  /**
> - * pixel_offset() - Get the offset of the pixel at coordinates x/y in the first plane
> + * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
>   *
>   * @frame_info: Buffer metadata
>   * @x: The x coordinate of the wanted pixel in the buffer
>   * @y: The y coordinate of the wanted pixel in the buffer
> + * @plane_index: The index of the plane to use
> + * @offset: The returned offset inside the buffer of the block

The previous function (pixel_offset) returned a size_t for the offset rather
than an int. Do you know if we are safe using an int in this case?

> + * @rem_x: The returned X coordinate of the requested pixel in the block
> + * @rem_y: The returned Y coordinate of the requested pixel in the block
>   *
> - * The caller must ensure that the framebuffer associated with this request uses a pixel format
> - * where block_h == block_w == 1.
> - * If this requirement is not fulfilled, the resulting offset can point to an other pixel or
> - * outside of the buffer.
> + * As some pixel formats store multiple pixels in a block (DRM_FORMAT_R* for example), some
> + * pixels are not individually addressable. This function returns 3 values: the offset of the
> + * whole block, and the coordinate of the requested pixel inside this block.
> + * For example, if the format is DRM_FORMAT_R1 and the requested coordinate is 13,5, the offset
> + * will point to the byte 5*pitches + 13/8 (second byte of the 5th line), and the rem_x/rem_y
> + * coordinates will be (13 % 8, 5 % 1) = (5, 0)
> + *
> + * With this function, the caller only has to extract the correct pixel from the block.
>   */
> -static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> +static void packed_pixels_offset(const struct vkms_frame_info *frame_info, int x, int y,
> +				 int plane_index, int *offset, int *rem_x, int *rem_y)
>  {
>  	struct drm_framebuffer *fb = frame_info->fb;
> +	const struct drm_format_info *format = frame_info->fb->format;
> +	/* Directly using x and y to multiply pitches and format->cpp is not sufficient because
> +	 * in some formats a block can represent multiple pixels.
> +	 *
> +	 * Dividing x and y by the block size allows to extract the correct offset of the block
> +	 * containing the pixel.
> +	 */
>  
> -	return fb->offsets[0] + (y * fb->pitches[0]) + (x * fb->format->cpp[0]);
> +	int block_x = x / drm_format_info_block_width(format, plane_index);
> +	int block_y = y / drm_format_info_block_height(format, plane_index);
> +	int block_pitch = fb->pitches[plane_index] * drm_format_info_block_height(format,
> +										  plane_index);
> +	*rem_x = x % drm_format_info_block_width(format, plane_index);
> +	*rem_y = y % drm_format_info_block_height(format, plane_index);
> +	*offset = fb->offsets[plane_index] +
> +		  block_y * block_pitch +
> +		  block_x * format->char_per_block[plane_index];
>  }
>  
>  /**
> @@ -35,30 +59,71 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
>   * @frame_info: Buffer metadata
>   * @x: The x (width) coordinate inside the plane
>   * @y: The y (height) coordinate inside the plane
> + * @plane_index: The index of the plane
> + * @addr: The returned pointer
> + * @rem_x: The returned X coordinate of the requested pixel in the block
> + * @rem_y: The returned Y coordinate of the requested pixel in the block
>   *
> - * Takes the information stored in the frame_info, a pair of coordinates, and
> - * returns the address of the first color channel.
> - * This function assumes the channels are packed together, i.e. a color channel
> - * comes immediately after another in the memory. And therefore, this function
> - * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
> + * Takes the information stored in the frame_info, a pair of coordinates, and returns the address
> + * of the block containing this pixel and the pixel position inside this block.
>   *
> - * The caller must ensure that the framebuffer associated with this request uses a pixel format
> - * where block_h == block_w == 1, otherwise the returned pointer can be outside the buffer.
> + * See @packed_pixel_offset for details about rem_x/rem_y behavior.

Missing "s" in the name of the function. Should read "@packed_pixels_offset".

>   */
> -static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
> -				int x, int y)
> +static void packed_pixels_addr(const struct vkms_frame_info *frame_info,
> +			       int x, int y, int plane_index, u8 **addr, int *rem_x,
> +			       int *rem_y)
>  {
> -	size_t offset = pixel_offset(frame_info, x, y);
> +	int offset;
>  
> -	return (u8 *)frame_info->map[0].vaddr + offset;
> +	packed_pixels_offset(frame_info, x, y, plane_index, &offset, rem_x, rem_y);
> +	*addr = (u8 *)frame_info->map[0].vaddr + offset;
>  }
>  
> -static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
> +/**
> + * packed_pixels_addr_1x1() - Get the pointer to the block containing the pixel at the given
> + * coordinates
> + *
> + * @frame_info: Buffer metadata
> + * @x: The x (width) coordinate inside the plane
> + * @y: The y (height) coordinate inside the plane
> + * @plane_index: The index of the plane
> + * @addr: The returned pointer
> + *
> + * This function can only be used with formats where block_h == block_w == 1.
> + */
> +static void packed_pixels_addr_1x1(const struct vkms_frame_info *frame_info,
> +				   int x, int y, int plane_index, u8 **addr)
> +{
> +	int offset, rem_x, rem_y;

Nitpick, but it'd be nice if packed_pixels_offset() could take NULLs in
the output values so we avoid declaring unused variables here and when
calling packed_pixels_addr().

> +
> +	WARN_ONCE(drm_format_info_block_width(frame_info->fb->format,
> +					      plane_index) != 1,
> +		"%s() only support formats with block_w == 1", __func__);
> +	WARN_ONCE(drm_format_info_block_height(frame_info->fb->format,
> +					       plane_index) != 1,
> +		"%s() only support formats with block_h == 1", __func__);
> +
> +	packed_pixels_offset(frame_info, x, y, plane_index, &offset, &rem_x,
> +			     &rem_y);
> +	*addr = (u8 *)frame_info->map[0].vaddr + offset;
> +}
> +
> +static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y,
> +				 int plane_index)
>  {
>  	int x_src = frame_info->src.x1 >> 16;
>  	int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16);
> +	u8 *addr;
> +	int rem_x, rem_y;
> +
> +	WARN_ONCE(drm_format_info_block_width(frame_info->fb->format, plane_index) != 1,
> +		  "%s() only support formats with block_w == 1", __func__);
> +	WARN_ONCE(drm_format_info_block_height(frame_info->fb->format, plane_index) != 1,
> +		  "%s() only support formats with block_h == 1", __func__);
>  
> -	return packed_pixels_addr(frame_info, x_src, y_src);
> +	packed_pixels_addr(frame_info, x_src, y_src, plane_index, &addr, &rem_x, &rem_y);
> +
> +	return addr;
>  }
>  
>  static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x)
> @@ -152,14 +217,14 @@ void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state
>  {
>  	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
>  	struct vkms_frame_info *frame_info = plane->frame_info;
> -	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> +	u8 *src_pixels = get_packed_src_addr(frame_info, y, 0);
>  	int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels);
>  
>  	for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) {
>  		int x_pos = get_x_position(frame_info, limit, x);
>  
>  		if (drm_rotation_90_or_270(frame_info->rotation))
> -			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1)
> +			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1, 0)
>  				+ frame_info->fb->format->cpp[0] * y;
>  
>  		plane->pixel_read(src_pixels, &out_pixels[x_pos]);
> @@ -250,7 +315,10 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
>  {
>  	struct vkms_frame_info *frame_info = &wb->wb_frame_info;
>  	int x_dst = frame_info->dst.x1;
> -	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> +	u8 *dst_pixels;
> +	int rem_x, rem_y;
> +
> +	packed_pixels_addr(frame_info, x_dst, y, 0, &dst_pixels, &rem_x, &rem_y);
>  	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
>  	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst), src_buffer->n_pixels);
>  
> 
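The block arithmetic in packed_pixels_offset() can be checked in isolation. Below is a standalone sketch that follows José's suggestion that the remainder outputs accept NULL, and uses size_t for the offset. struct plane_layout_sketch and block_offset_sketch() are hypothetical. They only mirror the fields the patch reads from struct drm_framebuffer and struct drm_format_info.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-plane layout (subset of fb->offsets/pitches and format info). */
struct plane_layout_sketch {
	size_t offset;		/* fb->offsets[plane_index] */
	size_t pitch;		/* fb->pitches[plane_index], bytes per line */
	int block_w, block_h;	/* block size in pixels */
	size_t char_per_block;	/* bytes per block */
};

/*
 * Byte offset of the block containing pixel (x, y), plus the pixel position
 * inside that block. rem_x/rem_y may be NULL when the caller does not need them.
 */
static size_t block_offset_sketch(const struct plane_layout_sketch *l, int x, int y,
				  int *rem_x, int *rem_y)
{
	assert(x >= 0 && y >= 0 && l->block_w > 0 && l->block_h > 0);

	size_t block_x = (size_t)(x / l->block_w);
	size_t block_y = (size_t)(y / l->block_h);
	/* One row of blocks spans block_h lines of the buffer. */
	size_t block_pitch = l->pitch * (size_t)l->block_h;

	if (rem_x)
		*rem_x = x % l->block_w;
	if (rem_y)
		*rem_y = y % l->block_h;
	return l->offset + block_y * block_pitch + block_x * l->char_per_block;
}
```

With an R1-like layout (8x1 blocks of one byte), pixel (13, 5) lands in byte 5 * pitch + 1 with rem_x = 5 and rem_y = 0, as in the kernel-doc example.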


* [PATCH v13 7/9] drm/vkms: Introduce pixel_read_direction enum
  2024-10-31 17:53 ` [PATCH v13 7/9] drm/vkms: Introduce pixel_read_direction enum Louis Chauvet
@ 2024-11-18 17:10   ` José Expósito
  0 siblings, 0 replies; 20+ messages in thread
From: José Expósito @ 2024-11-18 17:10 UTC (permalink / raw)
  To: louis.chauvet
  Cc: airlied, arthurgrillo, corbet, dri-devel, hamohammed.sa,
	helen.koike, jeremie.dautheribes, linux-doc, linux-kernel,
	maarten.lankhorst, mairacanal, marcheu, melissa.srw,
	miquel.raynal, mripard, nicolejadeyee, pekka.paalanen, rdunlap,
	rodrigosiqueiramelo, seanpaul, simona.vetter, simona,
	thomas.petazzoni, tzimmermann

> The pixel_read_direction enum is useful to describe the reading direction
> in a plane. It avoids using the DRM rotation property, which is not
> practical for determining the reading direction.
> This patch also introduces two helpers, one to compute the
> pixel_read_direction from the DRM rotation property, and one to compute
> the step, in bytes, between two successive pixels in a specific direction.
> 
> Acked-by: Maíra Canal <mairacanal@riseup.net>
> Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> ---
>  drivers/gpu/drm/vkms/vkms_composer.c | 44 ++++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/vkms/vkms_drv.h      | 11 +++++++++
>  drivers/gpu/drm/vkms/vkms_formats.c  | 32 ++++++++++++++++++++++++++
>  3 files changed, 87 insertions(+)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index ecac0bc858a0..601e33431b45 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -159,6 +159,50 @@ static void apply_lut(const struct vkms_crtc_state *crtc_state, struct line_buff
>  	}
>  }
>  
> +/**
> + * direction_for_rotation() - Get the correct reading direction for a given rotation
> + *
> + * @rotation: Rotation to analyze. It corresponds to the field @frame_info.rotation.
> + *
> + * This function will use the @rotation setting of a source plane to compute the reading
> + * direction in this plane that corresponds to a "left to right writing" in the CRTC.
> + * For example, if the buffer is reflected on X axis, the pixel must be read from right to left
> + * to be written from left to right on the CRTC.
> + */
> +static enum pixel_read_direction direction_for_rotation(unsigned int rotation)
> +{
> +	struct drm_rect tmp_a, tmp_b;
> +	int x, y;
> +
> +	/*
> +	 * Points A and B are depicted as zero-size rectangles on the CRTC.
> +	 * The CRTC writing direction is from A to B. The plane reading direction
> +	 * is discovered by inverse-transforming A and B.
> +	 * The reading direction is computed by rotating the vector AB (top-left to top-right) in a
> +	 * 1x1 square.
> +	 */
> +
> +	tmp_a = DRM_RECT_INIT(0, 0, 0, 0);
> +	tmp_b = DRM_RECT_INIT(1, 0, 0, 0);
> +	drm_rect_rotate_inv(&tmp_a, 1, 1, rotation);
> +	drm_rect_rotate_inv(&tmp_b, 1, 1, rotation);
> +
> +	x = tmp_b.x1 - tmp_a.x1;
> +	y = tmp_b.y1 - tmp_a.y1;
> +
> +	if (x == 1 && y == 0)
> +		return READ_LEFT_TO_RIGHT;
> +	else if (x == -1 && y == 0)
> +		return READ_RIGHT_TO_LEFT;
> +	else if (y == 1 && x == 0)
> +		return READ_TOP_TO_BOTTOM;
> +	else if (y == -1 && x == 0)
> +		return READ_BOTTOM_TO_TOP;
> +
> +	WARN_ONCE(true, "The inverse of the rotation gives an incorrect direction.");
> +	return READ_LEFT_TO_RIGHT;
> +}
> +
>  /**
>   * blend - blend the pixels from all planes and compute crc
>   * @wb: The writeback frame buffer metadata
> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> index 3f45290a0c5d..777b7bd91f27 100644
> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> @@ -69,6 +69,17 @@ struct vkms_writeback_job {
>  	pixel_write_t pixel_write;
>  };
>  
> +/**
> + * enum pixel_read_direction - Enum used internaly by VKMS to represent a reading direction in a

Minor typo:

s/internaly/internally

Everything else looks great, thanks!

> + * plane.
> + */
> +enum pixel_read_direction {
> +	READ_BOTTOM_TO_TOP,
> +	READ_TOP_TO_BOTTOM,
> +	READ_RIGHT_TO_LEFT,
> +	READ_LEFT_TO_RIGHT
> +};
> +
>  /**
>   * typedef pixel_read_t - These functions are used to read a pixel in the source frame,
>   * convert it to `struct pixel_argb_u16` and write it to @out_pixel.
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index 7f932d42394d..d0e7dfc1f0d3 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -79,6 +79,38 @@ static void packed_pixels_addr(const struct vkms_frame_info *frame_info,
>  	*addr = (u8 *)frame_info->map[0].vaddr + offset;
>  }
>  
> +/**
> + * get_block_step_bytes() - Common helper to compute the correct step value between each pixel block
> + * to read in a certain direction.
> + *
> + * @fb: Framebuffer to iterate on
> + * @direction: Direction of the reading
> + * @plane_index: Plane to get the step from
> + *
> + * As the returned count is the number of bytes between two consecutive blocks in a direction,
> + * the caller may have to read multiple pixels before using the next one (for example, to read from
> + * left to right in a DRM_FORMAT_R1 plane, each block contains 8 pixels, so the step must be used
> + * only every 8 pixels).
> + */
> +static int get_block_step_bytes(struct drm_framebuffer *fb, enum pixel_read_direction direction,
> +				int plane_index)
> +{
> +	switch (direction) {
> +	case READ_LEFT_TO_RIGHT:
> +		return fb->format->char_per_block[plane_index];
> +	case READ_RIGHT_TO_LEFT:
> +		return -fb->format->char_per_block[plane_index];
> +	case READ_TOP_TO_BOTTOM:
> +		return (int)fb->pitches[plane_index] * drm_format_info_block_width(fb->format,
> +										   plane_index);
> +	case READ_BOTTOM_TO_TOP:
> +		return -(int)fb->pitches[plane_index] * drm_format_info_block_width(fb->format,
> +										    plane_index);
> +	}
> +
> +	return 0;
> +}
> +
>  /**
>   * packed_pixels_addr_1x1() - Get the pointer to the block containing the pixel at the given
>   * coordinates
> 
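
To check direction_for_rotation() by hand, here is a small user-space sketch that is not part of the patch. It assumes that drm_rect_rotate_inv() on a zero-size rectangle in a 1x1 square behaves as modeled below: it undoes the rotation first, then the reflection. The flag values are copied from the DRM uAPI DRM_MODE_ROTATE_* / DRM_MODE_REFLECT_* definitions. All other names (rotate_inv, struct point, enum read_dir) are hypothetical.

```c
#include <assert.h>

/* Flag values as in the DRM uAPI (DRM_MODE_ROTATE_* / DRM_MODE_REFLECT_*). */
#define ROTATE_0    (1u << 0)
#define ROTATE_90   (1u << 1)
#define ROTATE_180  (1u << 2)
#define ROTATE_270  (1u << 3)
#define ROTATE_MASK (ROTATE_0 | ROTATE_90 | ROTATE_180 | ROTATE_270)
#define REFLECT_X   (1u << 4)
#define REFLECT_Y   (1u << 5)

enum read_dir { LEFT_TO_RIGHT, RIGHT_TO_LEFT, TOP_TO_BOTTOM, BOTTOM_TO_TOP, INVALID_DIR };

struct point { int x, y; };

/*
 * Inverse-transform a point inside a width x height area. Assumed to match
 * drm_rect_rotate_inv() on a zero-size rectangle: undo the rotation first,
 * then the reflection.
 */
static struct point rotate_inv(struct point p, int width, int height, unsigned int rot)
{
	struct point t = p;

	assert(width > 0 && height > 0);
	switch (rot & ROTATE_MASK) {
	case ROTATE_90:
		t.x = width - p.y;
		t.y = p.x;
		break;
	case ROTATE_180:
		t.x = width - p.x;
		t.y = height - p.y;
		break;
	case ROTATE_270:
		t.x = p.y;
		t.y = height - p.x;
		break;
	default: /* ROTATE_0: nothing to undo */
		break;
	}
	if (rot & REFLECT_X)
		t.x = width - t.x;
	if (rot & REFLECT_Y)
		t.y = height - t.y;
	return t;
}

/* The CRTC writes from A=(0,0) to B=(1,0); map the vector AB back into the plane. */
static enum read_dir direction_for_rotation(unsigned int rot)
{
	struct point a = rotate_inv((struct point){ 0, 0 }, 1, 1, rot);
	struct point b = rotate_inv((struct point){ 1, 0 }, 1, 1, rot);
	int dx = b.x - a.x;
	int dy = b.y - a.y;

	if (dx == 1 && dy == 0)
		return LEFT_TO_RIGHT;
	if (dx == -1 && dy == 0)
		return RIGHT_TO_LEFT;
	if (dx == 0 && dy == 1)
		return TOP_TO_BOTTOM;
	if (dx == 0 && dy == -1)
		return BOTTOM_TO_TOP;
	return INVALID_DIR; /* the kernel version warns and falls back instead */
}
```

Under these assumptions, ROTATE_90 maps to a top-to-bottom read and ROTATE_270 to a bottom-to-top read. This is consistent with DRM rotations being counter-clockwise. A pure REFLECT_Y leaves the horizontal reading direction unchanged.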


* [PATCH v13 8/9] drm/vkms: Re-introduce line-per-line composition algorithm
  2024-10-31 17:53 ` [PATCH v13 8/9] drm/vkms: Re-introduce line-per-line composition algorithm Louis Chauvet
@ 2024-11-18 17:10   ` José Expósito
  2024-11-18 17:19     ` Louis Chauvet
  0 siblings, 1 reply; 20+ messages in thread
From: José Expósito @ 2024-11-18 17:10 UTC (permalink / raw)
  To: louis.chauvet
  Cc: airlied, arthurgrillo, corbet, dri-devel, hamohammed.sa,
	helen.koike, jeremie.dautheribes, linux-doc, linux-kernel,
	maarten.lankhorst, mairacanal, marcheu, melissa.srw,
	miquel.raynal, mripard, nicolejadeyee, pekka.paalanen,
	pekka.paalanen, rdunlap, rodrigosiqueiramelo, seanpaul,
	simona.vetter, simona, thomas.petazzoni, tzimmermann

> Re-introduce a line-by-line composition algorithm for each pixel format.
> This improves performance by removing the indirect call made for every
> pixel read. This patch is focused on readability of the code.
> 
> Line-by-line composition was introduced by [1] but rewritten back to a
> pixel-by-pixel algorithm in [2]. At that time, nobody noticed the impact
> on performance, and it was merged.
> 
> This patch is almost a revert of [2], but in addition efforts have been
> made to increase readability and maintainability of the rotation handling.
> The blend function is now divided in two parts:
> - Transformation of coordinates from the output referential to the source
> referential
> - Line conversion and blending
> 
> Most of the complexity of the rotation management is avoided by using
> drm_rect_* helpers. The remaining complexity is around the clipping, to
> avoid reading/writing outside source/destination buffers.
> 
> The pixel conversion is now done line-by-line, so the pixel_read_t callback
> was replaced with the pixel_read_line_t callback. This way the indirection
> is only required once per line and per plane, instead of once per pixel and
> per plane.
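
The per-line callback can be illustrated with a user-space sketch (not the patch code). It assumes a single-plane XRGB8888 buffer with 4 bytes per pixel and B, G, R, X byte order in memory. The caller resolves the reader once per line. Inside the line, each pixel is reached by adding a signed byte step: plus or minus the pixel size horizontally, plus or minus the pitch vertically. struct fb, step_bytes and read_line_fn are hypothetical names. The sketch also assumes block sizes of 1x1 and host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum read_dir { LEFT_TO_RIGHT, RIGHT_TO_LEFT, TOP_TO_BOTTOM, BOTTOM_TO_TOP };

struct argb16 { uint16_t a, r, g, b; };

/* Hypothetical single-plane XRGB8888 buffer (4 bytes per pixel). */
struct fb {
	const uint8_t *data;
	int width, height;
	int pitch; /* bytes between the starts of two consecutive rows */
};

/* Signed byte distance between two consecutive pixels read in @dir. */
static ptrdiff_t step_bytes(const struct fb *fb, enum read_dir dir)
{
	switch (dir) {
	case LEFT_TO_RIGHT:
		return 4;
	case RIGHT_TO_LEFT:
		return -4;
	case TOP_TO_BOTTOM:
		return fb->pitch;
	case BOTTOM_TO_TOP:
		return -(ptrdiff_t)fb->pitch;
	}
	return 0;
}

/* One indirect call per line; the per-pixel loop only advances by a fixed step. */
typedef void (*read_line_fn)(const struct fb *fb, int x, int y, enum read_dir dir,
			     int count, struct argb16 out[]);

static void xrgb8888_read_line(const struct fb *fb, int x, int y, enum read_dir dir,
			       int count, struct argb16 out[])
{
	/* The caller guarantees that all @count pixels lie inside the buffer. */
	assert(x >= 0 && x < fb->width && y >= 0 && y < fb->height);

	const uint8_t *start = fb->data + (ptrdiff_t)y * fb->pitch + (ptrdiff_t)x * 4;
	ptrdiff_t step = step_bytes(fb, dir);

	for (int i = 0; i < count; i++) {
		const uint8_t *px = start + i * step;

		out[i].a = 0xffff; /* X channel is ignored: fully opaque */
		out[i].r = (uint16_t)(px[2] * 257);
		out[i].g = (uint16_t)(px[1] * 257);
		out[i].b = (uint16_t)(px[0] * 257);
	}
}
```

The sketch computes each address as start + i * step instead of advancing the pointer after the last pixel. This avoids forming a pointer before the start of the buffer when reading right to left or bottom to top.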
> 
> The pixel_read_line_t callbacks are very similar for most pixel formats, but
> this duplication is required to avoid a performance impact. Some helpers for
> color conversion were introduced to avoid code repetition:
> - argb_u16_from_*: perform color conversion. They should be inlined by the
>   compiler, and they are used to avoid repetition between multiple variants
>   of the same format (argb/xrgb, and possibly bgr formats in the future).
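
The conversion helpers widen each channel to 16 bits. A user-space sketch of that arithmetic follows; it is not the patch code, and all names are hypothetical. An 8-bit channel is multiplied by 257 (= 0xffff / 0xff). The 5- and 6-bit RGB565 fields are scaled by 65535 / 31 or 65535 / 63 with round-to-nearest. The patch computes that ratio with the drm_fixp helpers. This integer-only variant is expected to give the same results, but that has not been verified against the kernel helpers. The sketch takes the RGB565 pixel already in host byte order, whereas the patch reads a __le16.

```c
#include <assert.h>
#include <stdint.h>

/*
 * 8-bit -> 16-bit: multiply by 257 = 0xffff / 0xff. This maps 0x00 to 0x0000
 * and 0xff to 0xffff exactly, and equals replicating the byte (v << 8 | v).
 */
static inline uint16_t u16_from_u8(uint8_t v)
{
	return (uint16_t)(v * 257u);
}

/*
 * n-bit -> 16-bit with round-to-nearest: v * 65535 / max, where max is 31
 * for the 5-bit red/blue fields of RGB565 and 63 for the 6-bit green field.
 */
static inline uint16_t u16_from_bits(uint32_t v, uint32_t max)
{
	return (uint16_t)((v * 65535u + max / 2) / max);
}

struct argb16 { uint16_t a, r, g, b; };

static struct argb16 argb16_from_rgb565(uint16_t px)
{
	struct argb16 out = {
		.a = 0xffff, /* RGB565 has no alpha: fully opaque */
		.r = u16_from_bits((px >> 11) & 0x1f, 31),
		.g = u16_from_bits((px >> 5) & 0x3f, 63),
		.b = u16_from_bits(px & 0x1f, 31),
	};
	return out;
}
```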
> 
> This new algorithm was tested with:
> - kms_plane (for color conversions)
> - kms_rotation_crc (for rotations of planes)
> - kms_cursor_crc (for translations of planes)
> - kms_rotation (for all rotations and formats combinations) [3]
> The performance gain was measured with kms_fb_stress [4] with some
> modifications to fix the writeback format.
> 
> The performance improvement is around 5 to 10%.
> 
> [1]: commit 8ba1648567e2 ("drm: vkms: Refactor the plane composer to accept
>      new formats")
>      https://lore.kernel.org/all/20220905190811.25024-7-igormtorrente@gmail.com/
> [2]: commit 322d716a3e8a ("drm/vkms: isolate pixel conversion
>      functionality")
>      https://lore.kernel.org/all/20230418130525.128733-2-mcanal@igalia.com/
> [3]: https://lore.kernel.org/igt-dev/20240313-new_rotation-v2-0-6230fd5cae59@bootlin.com/
> [4]: https://lore.kernel.org/all/20240422-kms_fb_stress-dev-v5-0-0c577163dc88@riseup.net/
> 
> Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com>
> 
> # Conflicts:
> #	drivers/gpu/drm/vkms/vkms_composer.c
> 
> Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com>

checkpatch will complain about this duplicated signature.

> ---
>  drivers/gpu/drm/vkms/vkms_composer.c | 234 ++++++++++++++++++++++++++++-------
>  drivers/gpu/drm/vkms/vkms_drv.h      |  28 +++--
>  drivers/gpu/drm/vkms/vkms_formats.c  | 224 ++++++++++++++++++++-------------
>  drivers/gpu/drm/vkms/vkms_formats.h  |   2 +-
>  drivers/gpu/drm/vkms/vkms_plane.c    |   5 +-
>  5 files changed, 344 insertions(+), 149 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index 601e33431b45..7a3e47b895a7 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -29,8 +29,8 @@ static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
>   * @x_start: The start offset
>   * @pixel_count: The number of pixels to blend
>   *
> - * The pixels [0;@pixel_count) in stage_buffer are blended at [@x_start;@x_start+@pixel_count) in
> - * output_buffer.
> + * The pixels [@x_start;@x_start+@pixel_count) in stage_buffer are blended at
> + * [@x_start;@x_start+@pixel_count) in output_buffer.
>   *
>   * The current DRM assumption is that pixel color values have been already
>   * pre-multiplied with the alpha channel values. See more
> @@ -41,7 +41,7 @@ static void pre_mul_alpha_blend(const struct line_buffer *stage_buffer,
>  				struct line_buffer *output_buffer, int x_start, int pixel_count)
>  {
>  	struct pixel_argb_u16 *out = &output_buffer->pixels[x_start];
> -	const struct pixel_argb_u16 *in = stage_buffer->pixels;
> +	const struct pixel_argb_u16 *in = &stage_buffer->pixels[x_start];
>  
>  	for (int i = 0; i < pixel_count; i++) {
>  		out[i].a = (u16)0xffff;
> @@ -51,33 +51,6 @@ static void pre_mul_alpha_blend(const struct line_buffer *stage_buffer,
>  	}
>  }
>  
> -static int get_y_pos(struct vkms_frame_info *frame_info, int y)
> -{
> -	if (frame_info->rotation & DRM_MODE_REFLECT_Y)
> -		return drm_rect_height(&frame_info->rotated) - y - 1;
> -
> -	switch (frame_info->rotation & DRM_MODE_ROTATE_MASK) {
> -	case DRM_MODE_ROTATE_90:
> -		return frame_info->rotated.x2 - y - 1;
> -	case DRM_MODE_ROTATE_270:
> -		return y + frame_info->rotated.x1;
> -	default:
> -		return y;
> -	}
> -}
> -
> -static bool check_limit(struct vkms_frame_info *frame_info, int pos)
> -{
> -	if (drm_rotation_90_or_270(frame_info->rotation)) {
> -		if (pos >= 0 && pos < drm_rect_width(&frame_info->rotated))
> -			return true;
> -	} else {
> -		if (pos >= frame_info->rotated.y1 && pos < frame_info->rotated.y2)
> -			return true;
> -	}
> -
> -	return false;
> -}
>  
>  static void fill_background(const struct pixel_argb_u16 *background_color,
>  			    struct line_buffer *output_buffer)
> @@ -203,6 +176,182 @@ static enum pixel_read_direction direction_for_rotation(unsigned int rotation)
>  	return READ_LEFT_TO_RIGHT;
>  }
>  
> +/**
> + * clamp_line_coordinates() - Compute and clamp the coordinates to read and write during the
> + * blend process.
> + *
> + * @direction: direction of the reading
> + * @current_plane: current plane blended
> + * @src_line: source line of the reading. Only the top-left coordinate is used. This rectangle
> + * must be rotated and have a shape of 1*pixel_count if @direction is vertical and a shape of
> + * pixel_count*1 if @direction is horizontal.
> + * @src_x_start: x start coordinate for the line reading
> + * @src_y_start: y start coordinate for the line reading
> + * @dst_x_start: x coordinate to blend the read line
> + * @pixel_count: number of pixels to blend
> + *
> + * This function is mainly a safety net to avoid reading outside the source buffer. As the
> + * userspace should never ask to read outside the source plane, all the cases covered here should
> + * be dead code.
> + */
> +static void clamp_line_coordinates(enum pixel_read_direction direction,
> +				   const struct vkms_plane_state *current_plane,
> +				   const struct drm_rect *src_line, int *src_x_start,
> +				   int *src_y_start, int *dst_x_start, int *pixel_count)
> +{
> +	/* By default the start points are correct */
> +	*src_x_start = src_line->x1;
> +	*src_y_start = src_line->y1;
> +	*dst_x_start = current_plane->frame_info->dst.x1;
> +
> +	/* Get the correct number of pixels to blend; it depends on the direction */
> +	switch (direction) {
> +	case READ_LEFT_TO_RIGHT:
> +	case READ_RIGHT_TO_LEFT:
> +		*pixel_count = drm_rect_width(src_line);
> +		break;
> +	case READ_BOTTOM_TO_TOP:
> +	case READ_TOP_TO_BOTTOM:
> +		*pixel_count = drm_rect_height(src_line);
> +		break;
> +	}
> +
> +	/*
> +	 * Clamp the coordinates to avoid reading outside the buffer
> +	 *
> +	 * This is mainly a security check to avoid reading outside the buffer, the userspace
> +	 * should never request to read outside the source buffer.
> +	 */
> +	switch (direction) {
> +	case READ_LEFT_TO_RIGHT:
> +	case READ_RIGHT_TO_LEFT:
> +		if (*src_x_start < 0) {
> +			*pixel_count += *src_x_start;
> +			*dst_x_start -= *src_x_start;
> +			*src_x_start = 0;
> +		}
> +		if (*src_x_start + *pixel_count > current_plane->frame_info->fb->width)
> +			*pixel_count = max(0, (int)current_plane->frame_info->fb->width -
> +				*src_x_start);
> +		break;
> +	case READ_BOTTOM_TO_TOP:
> +	case READ_TOP_TO_BOTTOM:
> +		if (*src_y_start < 0) {
> +			*pixel_count += *src_y_start;
> +			*dst_x_start -= *src_y_start;
> +			*src_y_start = 0;
> +		}
> +		if (*src_y_start + *pixel_count > current_plane->frame_info->fb->height)
> +			*pixel_count = max(0, (int)current_plane->frame_info->fb->height -
> +				*src_y_start);
> +		break;
> +	}
> +}
> +
> +/**
> + * blend_line() - Blend a line from a plane to the output buffer
> + *
> + * @current_plane: current plane to work on
> + * @y: line to write in the output buffer
> + * @crtc_x_limit: width of the output buffer
> + * @stage_buffer: temporary buffer to convert the pixel line from the source buffer
> + * @output_buffer: buffer to blend the read line into.
> + */
> +static void blend_line(struct vkms_plane_state *current_plane, int y,
> +		       int crtc_x_limit, struct line_buffer *stage_buffer,
> +		       struct line_buffer *output_buffer)
> +{
> +	int src_x_start, src_y_start, dst_x_start, pixel_count;
> +	struct drm_rect dst_line, tmp_src, src_line;
> +
> +	/* Avoid rendering useless lines */
> +	if (y < current_plane->frame_info->dst.y1 ||
> +	    y >= current_plane->frame_info->dst.y2)
> +		return;
> +
> +	/*
> +	 * dst_line is the line to copy. The initial coordinates are inside the
> +	 * destination framebuffer, and then drm_rect_* helpers are used to
> +	 * compute the correct position into the source framebuffer.
> +	 */
> +	dst_line = DRM_RECT_INIT(current_plane->frame_info->dst.x1, y,
> +				 drm_rect_width(&current_plane->frame_info->dst),
> +				 1);
> +
> +	drm_rect_fp_to_int(&tmp_src, &current_plane->frame_info->src);
> +
> +	/*
> +	 * [1]: Clamp dst_line to [0, crtc_x_limit) to avoid writing outside of
> +	 * the destination buffer
> +	 */
> +	dst_line.x1 = max_t(int, dst_line.x1, 0);
> +	dst_line.x2 = min_t(int, dst_line.x2, crtc_x_limit);
> +	/* The destination is completely outside of the crtc. */
> +	if (dst_line.x2 <= dst_line.x1)
> +		return;
> +
> +	src_line = dst_line;
> +
> +	/*
> +	 * Transform the x/y coordinates from the CRTC into coordinates in
> +	 * the src buffer.
> +	 *
> +	 * - Cancel the offset of the dst buffer.
> +	 * - Invert the rotation. This assumes that
> +	 *   dst = drm_rect_rotate(src, rotation) (dst and src have the
> +	 *   same size, but can be rotated).
> +	 * - Apply the offset of the source rectangle to the coordinate.
> +	 */
> +	drm_rect_translate(&src_line, -current_plane->frame_info->dst.x1,
> +			   -current_plane->frame_info->dst.y1);
> +	drm_rect_rotate_inv(&src_line, drm_rect_width(&tmp_src),
> +			    drm_rect_height(&tmp_src),
> +			    current_plane->frame_info->rotation);
> +	drm_rect_translate(&src_line, tmp_src.x1, tmp_src.y1);
> +
> +	/* Get the correct reading direction in the source buffer. */
> +
> +	enum pixel_read_direction direction =
> +		direction_for_rotation(current_plane->frame_info->rotation);
> +
> +	/* [2]: Compute and clamp the number of pixels to read */
> +	clamp_line_coordinates(direction, current_plane, &src_line, &src_x_start, &src_y_start,
> +			       &dst_x_start, &pixel_count);
> +
> +	if (pixel_count <= 0) {
> +		/* Nothing to read, so avoid multiple function calls */
> +		return;
> +	}
> +
> +	/*
> +	 * Modify the starting point to take into account the rotation
> +	 *
> +	 * src_line is the top-left corner, so when reading READ_RIGHT_TO_LEFT or
> +	 * READ_BOTTOM_TO_TOP, it must be changed to the top-right/bottom-left
> +	 * corner.
> +	 */
> +	if (direction == READ_RIGHT_TO_LEFT) {
> +		// src_x_start is now the right point
> +		src_x_start += pixel_count - 1;
> +	} else if (direction == READ_BOTTOM_TO_TOP) {
> +		// src_y_start is now the bottom point
> +		src_y_start += pixel_count - 1;
> +	}
> +
> +	/*
> +	 * Perform the conversion and the blending
> +	 *
> +	 * Here we know that the read line (x_start, y_start, pixel_count) is
> +	 * inside the source buffer [2] and we don't write outside the stage
> +	 * buffer [1].
> +	 */
> +	current_plane->pixel_read_line(current_plane, src_x_start, src_y_start, direction,
> +				       pixel_count, &stage_buffer->pixels[dst_x_start]);
> +
> +	pre_mul_alpha_blend(stage_buffer, output_buffer,
> +			    dst_x_start, pixel_count);
> +}
> +
>  /**
>   * blend - blend the pixels from all planes and compute crc
>   * @wb: The writeback frame buffer metadata
> @@ -223,34 +372,25 @@ static void blend(struct vkms_writeback_job *wb,
>  {
>  	struct vkms_plane_state **plane = crtc_state->active_planes;
>  	u32 n_active_planes = crtc_state->num_active_planes;
> -	int y_pos, x_dst, pixel_count;
>  
>  	const struct pixel_argb_u16 background_color = { .a = 0xffff };
>  
> -	size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
> +	int crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
> +	int crtc_x_limit = crtc_state->base.crtc->mode.hdisplay;

I'm sure you are already aware of the tiny conflict with:
https://lore.kernel.org/all/20241003-remove-legacy-v1-1-0b7db1f1a1a6@bootlin.com/

This is now:
-	size_t crtc_y_limit = crtc_state->base.mode.vdisplay;
+	int crtc_y_limit = crtc_state->base.mode.vdisplay;
+	int crtc_x_limit = crtc_state->base.mode.hdisplay;
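
The following user-space sketch traces the horizontal branch of clamp_line_coordinates() quoted above. It is a reading aid only and covers just a left-to-right read; the struct and function names are hypothetical. Pixels that fall left of the source buffer are dropped, and the output start shifts by the same amount. Pixels past the right edge are simply cut. A non-positive count means there is nothing to read, which blend_line() already checks for.

```c
#include <assert.h>

/* Hypothetical result of clamping one horizontal read. */
struct line_span {
	int src_x; /* first source pixel to read */
	int dst_x; /* where that pixel lands in the output line */
	int count; /* pixels to read; <= 0 means nothing to do */
};

/*
 * Clamp a left-to-right read of @count pixels starting at source column
 * @src_x (written at output column @dst_x) to a source @fb_width pixels wide.
 */
static struct line_span clamp_left_to_right(int src_x, int dst_x, int count, int fb_width)
{
	struct line_span s = { src_x, dst_x, count };

	assert(fb_width >= 0);
	if (s.src_x < 0) {
		/* Drop the pixels left of the buffer and shift the output accordingly. */
		s.count += s.src_x;
		s.dst_x -= s.src_x;
		s.src_x = 0;
	}
	if (s.src_x + s.count > fb_width) {
		/* Drop the pixels right of the buffer. */
		int avail = fb_width - s.src_x;

		s.count = avail > 0 ? avail : 0;
	}
	return s;
}
```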

>  
>  	/*
>  	 * The planes are composed line-by-line to avoid heavy memory usage. It is a necessary
>  	 * complexity to avoid poor blending performance.
>  	 *
> -	 * The function vkms_compose_row() is used to read a line, pixel-by-pixel, into the staging
> -	 * buffer.
> +	 * The pixel_read_line callback is used to read a line, using an efficient algorithm
> +	 * for a specific format, into the staging buffer.
>  	 */
> -	for (size_t y = 0; y < crtc_y_limit; y++) {
> +	for (int y = 0; y < crtc_y_limit; y++) {
>  		fill_background(&background_color, output_buffer);
>  
>  		/* The active planes are composed associatively in z-order. */
>  		for (size_t i = 0; i < n_active_planes; i++) {
> -			x_dst = plane[i]->frame_info->dst.x1;
> -			pixel_count = min_t(int, drm_rect_width(&plane[i]->frame_info->dst),
> -					    (int)stage_buffer->n_pixels);
> -			y_pos = get_y_pos(plane[i]->frame_info, y);
> -
> -			if (!check_limit(plane[i]->frame_info, y_pos))
> -				continue;
> -
> -			vkms_compose_row(stage_buffer, plane[i], y_pos);
> -			pre_mul_alpha_blend(stage_buffer, output_buffer, x_dst, pixel_count);
> +			blend_line(plane[i], y, crtc_x_limit, stage_buffer, output_buffer);
>  		}
>  
>  		apply_lut(crtc_state, output_buffer);
> @@ -258,7 +398,7 @@ static void blend(struct vkms_writeback_job *wb,
>  		*crc32 = crc32_le(*crc32, (void *)output_buffer->pixels, row_size);
>  
>  		if (wb)
> -			vkms_writeback_row(wb, output_buffer, y_pos);
> +			vkms_writeback_row(wb, output_buffer, y);
>  	}
>  }
>  
> @@ -269,7 +409,7 @@ static int check_format_funcs(struct vkms_crtc_state *crtc_state,
>  	u32 n_active_planes = crtc_state->num_active_planes;
>  
>  	for (size_t i = 0; i < n_active_planes; i++)
> -		if (!planes[i]->pixel_read)
> +		if (!planes[i]->pixel_read_line)
>  			return -1;
>  
>  	if (active_wb && !active_wb->pixel_write)
> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> index 777b7bd91f27..067a4797f7a0 100644
> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> @@ -39,7 +39,6 @@
>  struct vkms_frame_info {
>  	struct drm_framebuffer *fb;
>  	struct drm_rect src, dst;
> -	struct drm_rect rotated;
>  	struct iosys_map map[DRM_FORMAT_MAX_PLANES];
>  	unsigned int rotation;
>  };
> @@ -80,26 +79,38 @@ enum pixel_read_direction {
>  	READ_LEFT_TO_RIGHT
>  };
>  
> +struct vkms_plane_state;
> +
>  /**
> - * typedef pixel_read_t - These functions are used to read a pixel in the source frame,
> + * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
>   * convert it to `struct pixel_argb_u16` and write it to @out_pixel.
>   *
> - * @in_pixel: pointer to the pixel to read
> - * @out_pixel: pointer to write the converted pixel
> + * @plane: plane used as source for the pixel value
> + * @x_start: X (width) coordinate of the first pixel to copy. The caller must ensure that x_start
> + * is non-negative and smaller than @plane->frame_info->fb->width.
> + * @y_start: Y (height) coordinate of the first pixel to copy. The caller must ensure that y_start
> + * is non-negative and smaller than @plane->frame_info->fb->height.
> + * @direction: direction to use for the copy, starting at @x_start/@y_start
> + * @count: number of pixels to copy
> + * @out_pixel: pointer where to write the pixel values. They will be written from @out_pixel[0]
> + * (included) to @out_pixel[@count] (excluded). The caller must ensure that out_pixel has a
> + * length of at least @count.
>   */
> -typedef void (*pixel_read_t)(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel);
> +typedef void (*pixel_read_line_t)(const struct vkms_plane_state *plane, int x_start,
> +				  int y_start, enum pixel_read_direction direction, int count,
> +				  struct pixel_argb_u16 out_pixel[]);
>  
>  /**
>   * struct vkms_plane_state - Driver specific plane state
>   * @base: base plane state
>   * @frame_info: data required for composing computation
> - * @pixel_read: function to read a pixel in this plane. The creator of a struct vkms_plane_state
> - *	        must ensure that this pointer is valid
> + * @pixel_read_line: function to read a pixel line in this plane. The creator of a
> + *		     struct vkms_plane_state must ensure that this pointer is valid
>   */
>  struct vkms_plane_state {
>  	struct drm_shadow_plane_state base;
>  	struct vkms_frame_info *frame_info;
> -	pixel_read_t pixel_read;
> +	pixel_read_line_t pixel_read_line;
>  };
>  
>  struct vkms_plane {
> @@ -265,7 +276,6 @@ int vkms_verify_crc_source(struct drm_crtc *crtc, const char *source_name,
>  /* Composer Support */
>  void vkms_composer_worker(struct work_struct *work);
>  void vkms_set_composer(struct vkms_output *out, bool enabled);
> -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y);
>  void vkms_writeback_row(struct vkms_writeback_job *wb, const struct line_buffer *src_buffer, int y);
>  
>  /* Writeback */
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> index d0e7dfc1f0d3..0f6678420a11 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.c
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -140,83 +140,51 @@ static void packed_pixels_addr_1x1(const struct vkms_frame_info *frame_info,
>  	*addr = (u8 *)frame_info->map[0].vaddr + offset;
>  }
>  
> -static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y,
> -				 int plane_index)
> -{
> -	int x_src = frame_info->src.x1 >> 16;
> -	int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16);
> -	u8 *addr;
> -	int rem_x, rem_y;
> -
> -	WARN_ONCE(drm_format_info_block_width(frame_info->fb->format, plane_index) != 1,
> -		  "%s() only support formats with block_w == 1", __func__);
> -	WARN_ONCE(drm_format_info_block_height(frame_info->fb->format, plane_index) != 1,
> -		  "%s() only support formats with block_h == 1", __func__);
> -
> -	packed_pixels_addr(frame_info, x_src, y_src, plane_index, &addr, &rem_x, &rem_y);
> -
> -	return addr;
> -}
> -
> -static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x)
> -{
> -	if (frame_info->rotation & (DRM_MODE_REFLECT_X | DRM_MODE_ROTATE_270))
> -		return limit - x - 1;
> -	return x;
> -}
> -
>  /*
> - * The following functions take pixel data from the buffer and convert them to the format
> - * ARGB16161616 in @out_pixel.
> + * The following functions take pixel data (a, r, g, b, pixel, ...) and convert them to
> + * &struct pixel_argb_u16
>   *
> - * They are used in the vkms_compose_row() function to handle multiple formats.
> + * They are used in the read_line functions to avoid duplicate work for some pixel formats.
>   */
>  
> -static void ARGB8888_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
> +static struct pixel_argb_u16 argb_u16_from_u8888(u8 a, u8 r, u8 g, u8 b)
>  {
> +	struct pixel_argb_u16 out_pixel;
>  	/*
>  	 * The 257 is the "conversion ratio". This number is obtained by the
>  	 * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
>  	 * the best color value in a pixel format with more possibilities.
>  	 * A similar idea applies to others RGB color conversions.
>  	 */
> -	out_pixel->a = (u16)in_pixel[3] * 257;
> -	out_pixel->r = (u16)in_pixel[2] * 257;
> -	out_pixel->g = (u16)in_pixel[1] * 257;
> -	out_pixel->b = (u16)in_pixel[0] * 257;
> -}
> +	out_pixel.a = (u16)a * 257;
> +	out_pixel.r = (u16)r * 257;
> +	out_pixel.g = (u16)g * 257;
> +	out_pixel.b = (u16)b * 257;
>  
> -static void XRGB8888_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
> -{
> -	out_pixel->a = (u16)0xffff;
> -	out_pixel->r = (u16)in_pixel[2] * 257;
> -	out_pixel->g = (u16)in_pixel[1] * 257;
> -	out_pixel->b = (u16)in_pixel[0] * 257;
> +	return out_pixel;
>  }
>  
> -static void ARGB16161616_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
> +static struct pixel_argb_u16 argb_u16_from_u16161616(u16 a, u16 r, u16 g, u16 b)
>  {
> -	__le16 *pixel = (__le16 *)in_pixel;
> +	struct pixel_argb_u16 out_pixel;
> +
> +	out_pixel.a = a;
> +	out_pixel.r = r;
> +	out_pixel.g = g;
> +	out_pixel.b = b;
>  
> -	out_pixel->a = le16_to_cpu(pixel[3]);
> -	out_pixel->r = le16_to_cpu(pixel[2]);
> -	out_pixel->g = le16_to_cpu(pixel[1]);
> -	out_pixel->b = le16_to_cpu(pixel[0]);
> +	return out_pixel;
>  }
>  
> -static void XRGB16161616_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
> +static struct pixel_argb_u16 argb_u16_from_le16161616(__le16 a, __le16 r, __le16 g, __le16 b)
>  {
> -	__le16 *pixel = (__le16 *)in_pixel;
> -
> -	out_pixel->a = (u16)0xffff;
> -	out_pixel->r = le16_to_cpu(pixel[2]);
> -	out_pixel->g = le16_to_cpu(pixel[1]);
> -	out_pixel->b = le16_to_cpu(pixel[0]);
> +	return argb_u16_from_u16161616(le16_to_cpu(a), le16_to_cpu(r), le16_to_cpu(g),
> +				       le16_to_cpu(b));
>  }
>  
> -static void RGB565_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
> +static struct pixel_argb_u16 argb_u16_from_RGB565(const __le16 *pixel)
>  {
> -	__le16 *pixel = (__le16 *)in_pixel;
> +	struct pixel_argb_u16 out_pixel;
>  
>  	s64 fp_rb_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(31));
>  	s64 fp_g_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(63));
> @@ -226,40 +194,120 @@ static void RGB565_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pi
>  	s64 fp_g = drm_int2fixp((rgb_565 >> 5) & 0x3f);
>  	s64 fp_b = drm_int2fixp(rgb_565 & 0x1f);
>  
> -	out_pixel->a = (u16)0xffff;
> -	out_pixel->r = drm_fixp2int_round(drm_fixp_mul(fp_r, fp_rb_ratio));
> -	out_pixel->g = drm_fixp2int_round(drm_fixp_mul(fp_g, fp_g_ratio));
> -	out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
> +	out_pixel.a = (u16)0xffff;
> +	out_pixel.r = drm_fixp2int_round(drm_fixp_mul(fp_r, fp_rb_ratio));
> +	out_pixel.g = drm_fixp2int_round(drm_fixp_mul(fp_g, fp_g_ratio));
> +	out_pixel.b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
> +
> +	return out_pixel;
>  }
>  
> -/**
> - * vkms_compose_row - compose a single row of a plane
> - * @stage_buffer: output line with the composed pixels
> - * @plane: state of the plane that is being composed
> - * @y: y coordinate of the row
> +/*
> + * The following functions are the read_line functions for each pixel format supported by VKMS.
> + *
> + * They read a line starting at the point @x_start,@y_start following the @direction. The result
> + * is stored in @out_pixel and in the format ARGB16161616.
> + *
> + * These functions are very repetitive, but the innermost pixel loops must be kept inside these
> + * functions for performance reasons. Some benchmarking was done in [1] where having the innermost
> + * loop factored out of these functions showed a slowdown by a factor of three.
>   *
> - * This function composes a single row of a plane. It gets the source pixels
> - * through the y coordinate (see get_packed_src_addr()) and goes linearly
> - * through the source pixel, reading the pixels and converting it to
> - * ARGB16161616 (see the pixel_read() callback). For rotate-90 and rotate-270,
> - * the source pixels are not traversed linearly. The source pixels are queried
> - * on each iteration in order to traverse the pixels vertically.
> + * [1]: https://lore.kernel.org/dri-devel/d258c8dc-78e9-4509-9037-a98f7f33b3a3@riseup.net/
>   */
> -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y)
> +
> +static void ARGB8888_read_line(const struct vkms_plane_state *plane, int x_start, int y_start,
> +			       enum pixel_read_direction direction, int count,
> +			       struct pixel_argb_u16 out_pixel[])
>  {
> -	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> -	struct vkms_frame_info *frame_info = plane->frame_info;
> -	u8 *src_pixels = get_packed_src_addr(frame_info, y, 0);
> -	int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels);
> +	struct pixel_argb_u16 *end = out_pixel + count;
> +	u8 *src_pixels;
> +
> +	packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixels);
> +
> +	int step = get_block_step_bytes(plane->frame_info->fb, direction, 0);
> +
> +	while (out_pixel < end) {
> +		u8 *px = (u8 *)src_pixels;
> +		*out_pixel = argb_u16_from_u8888(px[3], px[2], px[1], px[0]);
> +		out_pixel += 1;
> +		src_pixels += step;
> +	}
> +}
> +
> +static void XRGB8888_read_line(const struct vkms_plane_state *plane, int x_start, int y_start,
> +			       enum pixel_read_direction direction, int count,
> +			       struct pixel_argb_u16 out_pixel[])
> +{
> +	struct pixel_argb_u16 *end = out_pixel + count;
> +	u8 *src_pixels;
> +
> +	packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixels);
> +
> +	int step = get_block_step_bytes(plane->frame_info->fb, direction, 0);
> +
> +	while (out_pixel < end) {
> +		u8 *px = (u8 *)src_pixels;
> +		*out_pixel = argb_u16_from_u8888(255, px[2], px[1], px[0]);
> +		out_pixel += 1;
> +		src_pixels += step;
> +	}
> +}
> +
> +static void ARGB16161616_read_line(const struct vkms_plane_state *plane, int x_start,
> +				   int y_start, enum pixel_read_direction direction, int count,
> +				   struct pixel_argb_u16 out_pixel[])
> +{
> +	struct pixel_argb_u16 *end = out_pixel + count;
> +	u8 *src_pixels;
> +
> +	packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixels);
> +
> +	int step = get_block_step_bytes(plane->frame_info->fb, direction, 0);
> +
> +	while (out_pixel < end) {
> +		u16 *px = (u16 *)src_pixels;
> +		*out_pixel = argb_u16_from_u16161616(px[3], px[2], px[1], px[0]);
> +		out_pixel += 1;
> +		src_pixels += step;
> +	}
> +}
> +
> +static void XRGB16161616_read_line(const struct vkms_plane_state *plane, int x_start,
> +				   int y_start, enum pixel_read_direction direction, int count,
> +				   struct pixel_argb_u16 out_pixel[])
> +{
> +	struct pixel_argb_u16 *end = out_pixel + count;
> +	u8 *src_pixels;
> +
> +	packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixels);
> +
> +	int step = get_block_step_bytes(plane->frame_info->fb, direction, 0);
> +
> +	while (out_pixel < end) {
> +		__le16 *px = (__le16 *)src_pixels;
> +		*out_pixel = argb_u16_from_le16161616(cpu_to_le16(0xFFFF), px[2], px[1], px[0]);
> +		out_pixel += 1;
> +		src_pixels += step;
> +	}
> +}
> +
> +static void RGB565_read_line(const struct vkms_plane_state *plane, int x_start,
> +			     int y_start, enum pixel_read_direction direction, int count,
> +			     struct pixel_argb_u16 out_pixel[])
> +{
> +	struct pixel_argb_u16 *end = out_pixel + count;
> +	u8 *src_pixels;
> +
> +	packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixels);
>  
> -	for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) {
> -		int x_pos = get_x_position(frame_info, limit, x);
> +	int step = get_block_step_bytes(plane->frame_info->fb, direction, 0);
>  
> -		if (drm_rotation_90_or_270(frame_info->rotation))
> -			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1, 0)
> -				+ frame_info->fb->format->cpp[0] * y;
> +	while (out_pixel < end) {
> +		__le16 *px = (__le16 *)src_pixels;
>  
> -		plane->pixel_read(src_pixels, &out_pixels[x_pos]);
> +		*out_pixel = argb_u16_from_RGB565(px);
> +		out_pixel += 1;
> +		src_pixels += step;
>  	}
>  }
>  
> @@ -359,25 +407,25 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
>  }
>  
>  /**
> - * get_pixel_read_function() - Retrieve the correct read_pixel function for a specific
> + * get_pixel_read_line_function() - Retrieve the correct read_line function for a specific
>   * format. The returned pointer is NULL for unsupported pixel formats. The caller must ensure that
>   * the pointer is valid before using it in a vkms_plane_state.
>   *
>   * @format: DRM_FORMAT_* value for which to obtain a conversion function (see [drm_fourcc.h])
>   */
> -pixel_read_t get_pixel_read_function(u32 format)
> +pixel_read_line_t get_pixel_read_line_function(u32 format)
>  {
>  	switch (format) {
>  	case DRM_FORMAT_ARGB8888:
> -		return &ARGB8888_to_argb_u16;
> +		return &ARGB8888_read_line;
>  	case DRM_FORMAT_XRGB8888:
> -		return &XRGB8888_to_argb_u16;
> +		return &XRGB8888_read_line;
>  	case DRM_FORMAT_ARGB16161616:
> -		return &ARGB16161616_to_argb_u16;
> +		return &ARGB16161616_read_line;
>  	case DRM_FORMAT_XRGB16161616:
> -		return &XRGB16161616_to_argb_u16;
> +		return &XRGB16161616_read_line;
>  	case DRM_FORMAT_RGB565:
> -		return &RGB565_to_argb_u16;
> +		return &RGB565_read_line;
>  	default:
>  		/*
>  		 * This is a bug in vkms_plane_atomic_check(). All the supported
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> index 3ecea4563254..8d2bef95ff79 100644
> --- a/drivers/gpu/drm/vkms/vkms_formats.h
> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> @@ -5,7 +5,7 @@
>  
>  #include "vkms_drv.h"
>  
> -pixel_read_t get_pixel_read_function(u32 format);
> +pixel_read_line_t get_pixel_read_line_function(u32 format);
>  
>  pixel_write_t get_pixel_write_function(u32 format);
>  
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index 10e9b23dab28..8875bed76410 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -112,7 +112,6 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>  	frame_info = vkms_plane_state->frame_info;
>  	memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect));
>  	memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect));
> -	memcpy(&frame_info->rotated, &new_state->dst, sizeof(struct drm_rect));
>  	frame_info->fb = fb;
>  	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
>  	drm_framebuffer_get(frame_info->fb);
> @@ -122,10 +121,8 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>  									  DRM_MODE_REFLECT_X |
>  									  DRM_MODE_REFLECT_Y);
>  
> -	drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated),
> -			drm_rect_height(&frame_info->rotated), frame_info->rotation);
>  
> -	vkms_plane_state->pixel_read = get_pixel_read_function(fmt);
> +	vkms_plane_state->pixel_read_line = get_pixel_read_line_function(fmt);
>  }
>  
>  static int vkms_plane_atomic_check(struct drm_plane *plane,
> 
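All of the read_line callbacks quoted above follow the same pattern. They compute a signed byte step once from the read direction and then walk the line. The sketch below shows that pattern as a standalone program. struct toy_fb, enum toy_read_direction, toy_step_bytes() and toy_read_line() are hypothetical simplifications: one plane, 8 bits per pixel, block_w == block_h == 1. They are not the vkms API. The step values are an assumption, modelled on how get_block_step_bytes() is used in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only, not the vkms definitions. */
enum toy_read_direction {
	TOY_READ_LEFT_TO_RIGHT,
	TOY_READ_RIGHT_TO_LEFT,
	TOY_READ_TOP_TO_BOTTOM,
	TOY_READ_BOTTOM_TO_TOP,
};

struct toy_fb {
	const uint8_t *data; /* single plane, one byte per pixel */
	int pitch;           /* bytes per row */
	int cpp;             /* bytes per pixel (block_w == block_h == 1) */
};

/* Signed distance in bytes between two consecutive pixels read in @dir. */
static int toy_step_bytes(const struct toy_fb *fb, enum toy_read_direction dir)
{
	switch (dir) {
	case TOY_READ_LEFT_TO_RIGHT:
		return fb->cpp;
	case TOY_READ_RIGHT_TO_LEFT:
		return -fb->cpp;
	case TOY_READ_TOP_TO_BOTTOM:
		return fb->pitch;
	case TOY_READ_BOTTOM_TO_TOP:
		return -fb->pitch;
	}
	return 0;
}

/*
 * Read @count pixels starting at (@x, @y), walking in @dir, and widen each
 * 8-bit value to 16 bits. The step is computed once per line; only the loop
 * body runs per pixel. An integer offset is used so that no pointer outside
 * the buffer is ever formed after the last pixel.
 */
static void toy_read_line(const struct toy_fb *fb, int x, int y,
			  enum toy_read_direction dir, int count, uint16_t out[])
{
	long offset = (long)y * fb->pitch + (long)x * fb->cpp;
	int step = toy_step_bytes(fb, dir);

	assert(count >= 0);
	for (int i = 0; i < count; i++, offset += step)
		out[i] = (uint16_t)(fb->data[offset] * 257); /* 0xFF -> 0xFFFF */
}
```

Take a 4x3 buffer that holds the values 0..11 in row-major order. Reading 3 pixels bottom-to-top from (1, 2) yields the source bytes 9, 5, 1. Rotation support then comes down to choosing the start point and the direction.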


* Re: [PATCH v13 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading
  2024-11-18 17:10 ` [PATCH v13 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading José Expósito
@ 2024-11-18 17:13   ` Louis Chauvet
  0 siblings, 0 replies; 20+ messages in thread
From: Louis Chauvet @ 2024-11-18 17:13 UTC (permalink / raw)
  To: José Expósito
  Cc: airlied, arthurgrillo, corbet, dri-devel, hamohammed.sa,
	helen.koike, jeremie.dautheribes, linux-doc, linux-kernel,
	maarten.lankhorst, mairacanal, marcheu, mcanal, melissa.srw,
	miquel.raynal, mripard, nicolejadeyee, pekka.paalanen,
	pekka.paalanen, rdunlap, rodrigosiqueiramelo, seanpaul,
	simona.vetter, simona, thomas.petazzoni, tzimmermann

On 18/11/24 - 18:10, José Expósito wrote:
> Hi Louis,
> 
> > This patchset is the second version of [1]. It is almost a complete
> > rewrite to use a line-by-line algorithm for the composition.
> > 
> > It can be divided in multiple parts:
> > - PATCH 1 to 3: no functional change is intended, only some formatting and
> >   documenting (PATCH 2 is taken from [2])
> > - PATCH 4 to 7: Some preparation work not directly related to the
> >   line-by-line algorithm
> > - PATCH 8: main patch for this series, it reintroduces the
> >   line-by-line algorithm
> > - PATCH 9: Remove useless drm_simplify_rotation
> > - Rest of the series: moved to a new series so that this one can be
> >   merged, see the new series "Add YUV ad R1..8 formats support to VKMS"
> > 
> > PATCH 8 aims to restore the line-by-line pixel reading algorithm. It
> > was introduced in 8ba1648567e2 ("drm: vkms: Refactor the plane composer to
> > accept new formats") but removed in 322d716a3e8a ("drm/vkms: isolate pixel
> > conversion functionality") in an over-simplification effort.
> > At that time, nobody noticed the performance impact of this commit. After
> > the first iteration of my series, people noticed a performance impact, and
> > it was confirmed. Pekka suggested reimplementing the line-by-line algorithm.
> > 
> > Experiments on my side showed a great improvement for the line-by-line
> > algorithm, and the performance is the same as with the original
> > line-by-line algorithm. I focused my effort on making the code work for
> > all the rotations and translations. Using the drm_rect_* helpers avoids
> > reimplementing existing logic.
> > 
> > The only "complex" part remaining is the clipping of the coordinates to
> > avoid reading/writing outside of src/dst. Thus I added a lot of comments
> > to help anyone who wants to add features later (framebuffer resizing
> > for example).
> > 
> > I did not change any expected test results, as VKMS seems to have some
> > existing issues:
> > https://gitlab.freedesktop.org/jim.cromie/kernel-drm-next-dd/-/jobs/61484201
> > https://gitlab.freedesktop.org/jim.cromie/kernel-drm-next-dd/-/jobs/61803193
> > https://gitlab.freedesktop.org/louischauvet/kernel/-/jobs/65944002
> > 
> > To properly test the rotation algorithm, I had to implement a new IGT
> > test [8]. This helped to find an issue in the YUV rotation algorithm.
> > 
> > My series was mainly tested with:
> > - kms_plane (for color conversions)
> > - kms_rotation_crc (for a subset of rotation and formats)
> > - kms_rotation (to test all rotation and formats combinations) [8]
> > - kms_cursor_crc (for translations)
> > The benchmark used to measure the improvement was done with
> > kms_fb_stress [10] with some modifications:
> > - Fixing the writeback format to XRGB8888
> > - Using a primary plane with odd dimension to avoid failures due to YUV
> >   alignment
> > The KMS structure was:
> > 	CRTC:
> > 		rectangle: 4096x2160+0+0
> > 	primary:
> > 		format: ABGR16161616
> > 		rectangle: 3640x2160+101+0
> > 	writeback:
> > 		format: XRGB8888
> > 		rectangle: 4096x2160+0+0
> > Results (on my computer):
> > 
> > 8356b9790650: drm/test: Add test cases for drm_rect_rotate_inv() (before any regression)
> > 322d716a3e8a: drm/vkms: isolate pixel conversion functionality (first regression)
> > cc4fd2934d41: drm/vkms: Isolate writeback pixel conversion functions (second regression)
> > 2c3d1bd284c5: drm/panel: simple: Add Microtips Technology MF-103HIEB0GA0 panel (current drm-misc-next)
> > 
> >  Used format  | This series | 2c3d1bd284c5 | cc4fd2934d41 | 322d716a3e8a | 8356b9790650 |
> > --------------+-------------+--------------+--------------+--------------+--------------+
> >  XRGB8888     |  13.261666s |   14.289582s |   10.731272s |    9.480001s |    9.277507s |
> >  XRGB16161616 |  13.282479s |   13.918926s |   10.712616s |    9.776903s |    9.291766s |
> >  RGB565       | 136.154163s |  141.646489s |  101.744050s |  103.712164s |   87.860923s |
> > 
> > This is a 5-10% performance improvement. More work needs to be done
> > on the writeback to gain more.
> > 
> > [1]: https://lore.kernel.org/all/20240201-yuv-v1-0-3ca376f27632@bootlin.com
> > [2]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-0-952fcaa5a193@riseup.net/
> > [3]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-3-952fcaa5a193@riseup.net/
> > [4]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-5-952fcaa5a193@riseup.net/
> > [5]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-6-952fcaa5a193@riseup.net/
> > [6]: https://lore.kernel.org/all/20240110-vkms-yuv-v2-7-952fcaa5a193@riseup.net/
> > [8]: https://lore.kernel.org/r/20240313-new_rotation-v2-0-6230fd5cae59@bootlin.com
> > [9]: https://lore.kernel.org/dri-devel/20240306-louis-vkms-conv-v1-1-5bfe7d129fdd@riseup.net/
> > [10]: https://lore.kernel.org/all/20240422-kms_fb_stress-dev-v5-0-0c577163dc88@riseup.net/
> > 
> > To: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
> > To: Melissa Wen <melissa.srw@gmail.com>
> > To: Maíra Canal <mairacanal@riseup.net>
> > To: Haneen Mohammed <hamohammed.sa@gmail.com>
> > To: Daniel Vetter <daniel@ffwll.ch>
> > To: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > To: Maxime Ripard <mripard@kernel.org>
> > To: Thomas Zimmermann <tzimmermann@suse.de>
> > To: David Airlie <airlied@gmail.com>
> > To: rdunlap@infradead.org
> > To: arthurgrillo@riseup.net
> > To: Jonathan Corbet <corbet@lwn.net>
> > To: pekka.paalanen@haloniitty.fi
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: jeremie.dautheribes@bootlin.com
> > Cc: miquel.raynal@bootlin.com
> > Cc: thomas.petazzoni@bootlin.com
> > Cc: seanpaul@google.com
> > Cc: marcheu@google.com
> > Cc: nicolejadeyee@google.com
> > Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> 
> Thanks for working on this. I reviewed the series and, with the exception of
> a couple of *very* minor comments, it looks good to me.

I already have to send a v14 so don't worry, I sent the v13 before 
committing on vkms...

> Feel free to add to the entire series:
> Reviewed-by: José Expósito <jose.exposito89@gmail.com>

Thanks a lot! I will add it.
 
> > Changes in v13:
> > - Removed the YUV part to prepare the merge
> > - Add Acked-by from Maíra
> > - Link to v12: https://lore.kernel.org/r/20241007-yuv-v12-0-01c1ada6fec8@bootlin.com
> > Changes in v12:
> > - Fix documentation issues as suggested by Randy
> > - Link to v11: https://lore.kernel.org/r/20240930-yuv-v11-0-4b1a26bcfc96@bootlin.com
> > Changes in v11:
> > - Remove documentation patch (already merged)
> > - Fix sparse warning about documentation
> > - Link to v10: https://lore.kernel.org/r/20240809-yuv-v10-0-1a7c764166f7@bootlin.com
> > Changes in v10:
> > - Properly remove the patch introducing dummy read/write functions
> > - PATCH 8/16: Format fixups
> > - PATCH 9/16: Format fixups
> > - PATCH 11/16: Format fixups
> > - PATCH 14/16: Fix test compilation, add module description
> > - Link to v9: https://lore.kernel.org/r/20240802-yuv-v9-0-08a706669e16@bootlin.com
> > Changes in v9:
> > - PATCH 3/17: Fix docs as Maíra suggested
> > - PATCH 4,6,10,12,15,17/17: Fix sparse warning about __le16 casting
> > - Link to v8: https://lore.kernel.org/all/20240516-yuv-v8-0-cf8d6f86430e@bootlin.com/
> > Changes in v8:
> > - PATCH 7/17: Update pitch access to use the proper value for block
> >   formats
> > - PATCH 9/17: Update pitch access to use the proper value for block
> >   formats
> > - Link to v7: https://lore.kernel.org/r/20240513-yuv-v7-0-380e9ffec502@bootlin.com
> > Changes in v7:
> > - Some typos and indent fixes
> > - Add Review-By, Acked-By
> > - PATCH 3/17: Clarify src/dst unit
> > - PATCH 9/17: Clarify documentation
> > - PATCH 9/17: Restrict conditions for direction
> > - PATCH 9/17: Rename get_block_step_byte to get_block_step_bytes
> > - PATCH 10/17: Clarify kernel doc for clamp_line_coordinates, blend_line,
> >   pixel_read_line_t
> > - PATCH 10/17: Fix the case when src_*_start >= fb->width/height
> > - PATCH 10/17: Change y in blend to be an int
> > - PATCH 10/17: Clarify documentation for read functions
> > - PATCH 12/17: Fix the type of rgb variables in argb_u16_from_yuv888
> > - PATCH 12/17: Move comments at the right place, remove useless ones
> > - PATCH 12/17: Add missing const
> > - PATCH 17/17: Use drm_format_info_bpp and computation to avoid hard-coded
> >   values
> > - Link to v6: https://lore.kernel.org/r/20240409-yuv-v6-0-de1c5728fd70@bootlin.com
> > Changes in v6:
> > - Add Randy
> > - Add Review-By and Acked-By
> > - PATCH 2/17: Remove useless newline
> > - PATCH 3/17: Fix kernel doc
> > - PATCH 4/17: Fix typo in git commit
> > - PATCH 4/17: Fix kernel doc and simplify brief description of typedef
> > - PATCH 5/17: Change black default color to Magenta
> > - PATCH 5/17: Fix wording in comment
> > - PATCH 7/17: Fix typo in packed_pixel_offset
> > - PATCH 7/17: Add WARN_ON for currently not supported formats
> > - PATCH 8/17: Rename x_limit to pixel_count
> > - PATCH 8/17: Clarify kernel doc for pre_mul_alpha_blend
> > - PATCH 9/17: Rename get_step_next_block to get_block_step_bytes
> > - PATCH 9/17: Change kernel doc order
> > - PATCH 9/17: Rework the direction_for_rotation function to use drm
> >   helpers
> > - PATCH 9/17: Add a warn in direction_for_rotation if the result is not
> >   expected
> > - PATCH 10/17: Reword the comment of pixel color conversion functions
> > - PATCH 10/17: Refactor the blending function to extract functions
> > - PATCH 11/17: Remove useless drm_rotation_simplify
> > - PATCH 12/17: Fix typo in comments
> > - PATCH 12/17: Remove useless define
> > - PATCH 12/17: Fix some comments typo and kernel doc
> > - PATCH 12/17: Add a comma at the end of the vkms_formats list
> > - PATCH 12/17: Use copy of matrix instead of pointers
> > - PATCH 12/17: Use 16 bit range for yuv conversion
> > - PATCH 17/17: Add a comma at the end of the vkms_formats list
> > - PATCH 17/17: Add assertions
> > - PATCH 17/17: Fix color conversion... Next time I will read the doc
> >   twice...
> > - Link to v5: https://lore.kernel.org/r/20240313-yuv-v5-0-e610cbd03f52@bootlin.com
> > Changes in v5:
> > - All patches: fix some formatting issues
> > - PATCH 4/16: Use the correct formatter for 4cc code
> > - PATCH 7/16: Update the pixel accessors to also return the pixel position
> >   inside a block.
> > - PATCH 8/16: Fix a temporary bug
> > - PATCH 9/16: Update the get_step_1x1 to get_step_next_block and update
> >   the documentation
> > - PATCH 10/16: Update to use the new pixel accessors
> > - PATCH 10/16: Reword some comments
> > - PATCH 11/16: Update to use the new pixel accessors
> > - PATCH 11/16: Fix a bug in the subsampling offset for inverted reading
> >   (right to left/bottom to top). Found by [8].
> > - PATCH 11/16: Apply Arthur's modifications (comments, algorithm
> >   clarification)
> > - PATCH 11/16: Use the correct formatter for 4cc code
> > - PATCH 11/16: Update to use the new get_step_next_block
> > - PATCH 14/16: Apply Arthur's modification (comments, compilation issue)
> > - PATCH 15/16: Add Arthur's patch to explain the kunit tests
> > - PATCH 16/16: Introduce DRM_FORMAT_R* support.
> > - Link to v4: https://lore.kernel.org/r/20240304-yuv-v4-0-76beac8e9793@bootlin.com
> > Changes in v4:
> > - PATCH 3/14: Update comments for get_pixel_* functions
> > - PATCH 4/14: Add WARN when trying to get unsupported pixel_* functions
> > - PATCH 5/14: Create dummy pixel reader/writer to avoid NULL
> >   function pointers and kernel OOPS
> > - PATCH 6/14: Added the usage of const pointers when needed
> > - PATCH 7/14: Extraction of pixel accessors modification
> > - PATCH 8/14: Extraction of the blending function modification
> > - PATCH 9/14: Extraction of the pixel_read_direction enum
> > - PATCH 10/14: Update direction_for_rotation documentation
> > - PATCH 10/14: Rename conversion functions to be explicit
> > - PATCH 10/14: Replace while(count) by while(out_pixel<end) in read_line
> >   callbacks. It avoids a new variable+addition in the composition hot path.
> > - PATCH 11/14: Rename conversion functions to be explicit
> > - PATCH 11/14: Update the documentation for get_subsampling_offset
> > - PATCH 11/14: Add the matrix_conversion structure to remove a test from
> >   the hot path.
> > - PATCH 11/14: Update matrix values to use 32.32 fixed-point values for
> >   conversion
> > - PATCH 12/14: Update commit message
> > - PATCH 14/14: Change kunit expected value
> > - Link to v3: https://lore.kernel.org/r/20240226-yuv-v3-0-ff662f0994db@bootlin.com
> > Changes in v3:
> > - Correction of remaining git-rebase artefacts
> > - Added Pekka in copy of this patch
> > - Link to v2: https://lore.kernel.org/r/20240223-yuv-v2-0-aa6be2827bb7@bootlin.com
> > Changes in v2:
> > - Rebased the series on top of drm-misc/drm-misc-net
> > - Extract the typedef for pixel_read/pixel_write
> > - Introduce the line-by-line algorithm per pixel format
> > - Add some documentation for existing and new code
> > - Port the series [1] to use line-by-line algorithm
> > - Link to v1: https://lore.kernel.org/r/20240201-yuv-v1-0-3ca376f27632@bootlin.com
> > ---
> > Arthur Grillo (1):
> >       drm/vkms: Use drm_frame directly
> > 
> > Louis Chauvet (8):
> >       drm/vkms: Code formatting
> >       drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions
> >       drm/vkms: Use const for input pointers in pixel_read an pixel_write functions
> >       drm/vkms: Update pixels accessor to support packed and multi-plane formats.
> >       drm/vkms: Avoid computing blending limits inside pre_mul_alpha_blend
> >       drm/vkms: Introduce pixel_read_direction enum
> >       drm/vkms: Re-introduce line-per-line composition algorithm
> >       drm/vkms: Remove useless drm_rotation_simplify
> > 
> >  drivers/gpu/drm/vkms/vkms_composer.c  | 312 ++++++++++++++++++++------
> >  drivers/gpu/drm/vkms/vkms_crtc.c      |   6 +-
> >  drivers/gpu/drm/vkms/vkms_drv.c       |   3 +-
> >  drivers/gpu/drm/vkms/vkms_drv.h       |  55 ++++-
> >  drivers/gpu/drm/vkms/vkms_formats.c   | 409 ++++++++++++++++++++++++----------
> >  drivers/gpu/drm/vkms/vkms_formats.h   |   4 +-
> >  drivers/gpu/drm/vkms/vkms_plane.c     |  17 +-
> >  drivers/gpu/drm/vkms/vkms_writeback.c |   5 -
> >  8 files changed, 588 insertions(+), 223 deletions(-)
> > ---
> > base-commit: 623b1e4d2eace0958996995f9f88cb659a6f69dd
> > change-id: 20240201-yuv-1337d90d9576
> > 
> > Best regards,
> > 


* Re: [PATCH v13 5/9] drm/vkms: Update pixels accessor to support packed and multi-plane formats.
  2024-11-18 17:10   ` José Expósito
@ 2024-11-18 17:17     ` Louis Chauvet
  2024-11-18 17:24       ` José Expósito
  0 siblings, 1 reply; 20+ messages in thread
From: Louis Chauvet @ 2024-11-18 17:17 UTC (permalink / raw)
  To: José Expósito
  Cc: airlied, arthurgrillo, corbet, dri-devel, hamohammed.sa,
	helen.koike, jeremie.dautheribes, linux-doc, linux-kernel,
	maarten.lankhorst, mairacanal, marcheu, melissa.srw,
	miquel.raynal, mripard, nicolejadeyee, pekka.paalanen, rdunlap,
	rodrigosiqueiramelo, seanpaul, simona.vetter, simona,
	thomas.petazzoni, tzimmermann

On 18/11/24 - 18:10, José Expósito wrote:
> > Introduce the usage of block_h/block_w to compute the offset and the
> > pointer of a pixel. The previous implementation was specialized for
> > planes with block_h == block_w == 1. This avoids confusion and allows
> > easier implementation of tiled formats. It also removes the usage of the
> > deprecated format field `cpp`.
> > 
> > Introduce the plane_index parameter to get an offset/pointer on a
> > different plane.
> > 
> > Acked-by: Maíra Canal <mairacanal@riseup.net>
> > Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> > ---
> >  drivers/gpu/drm/vkms/vkms_formats.c | 114 ++++++++++++++++++++++++++++--------
> >  1 file changed, 91 insertions(+), 23 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> > index 06aef5162529..7f932d42394d 100644
> > --- a/drivers/gpu/drm/vkms/vkms_formats.c
> > +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > @@ -10,22 +10,46 @@
> >  #include "vkms_formats.h"
> >  
> >  /**
> > - * pixel_offset() - Get the offset of the pixel at coordinates x/y in the first plane
> > + * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
> >   *
> >   * @frame_info: Buffer metadata
> >   * @x: The x coordinate of the wanted pixel in the buffer
> >   * @y: The y coordinate of the wanted pixel in the buffer
> > + * @plane_index: The index of the plane to use
> > + * @offset: The returned offset inside the buffer of the block
> 
> The previous function (pixel_offset) returned a size_t for the offset rather
> than an int. Do you know if we are safe using an int in this case?

I think I used int everywhere because it may avoid strange issues with 
implicit casting and negative numbers. I don't remember exactly where, but 
Pekka suggested it.
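The following is a generic C illustration of the implicit-conversion surprise that signed offsets avoid. It is not taken from vkms. The helper names are hypothetical, and the thread does not say which code path Pekka had in mind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not vkms code: the same byte offset computed with an
 * unsigned size_t factor and with plain int.
 */
static size_t offset_with_size_t(int x, size_t cpp)
{
	/* x is converted to size_t first, so a negative x wraps around */
	return x * cpp;
}

static int offset_with_int(int x, int cpp)
{
	assert(cpp > 0);
	return x * cpp; /* stays negative when x is negative */
}

/* Bounds test written against an unsigned width. */
static int before_edge_size_t(int x, size_t width)
{
	return x < width; /* -1 is converted to SIZE_MAX, so this is false */
}
```

With x == -1 and cpp == 4, offset_with_int() returns -4. offset_with_size_t() instead returns SIZE_MAX - 3, and before_edge_size_t(-1, 10) claims that -1 is not left of the edge. Keeping the whole computation in int sidesteps both surprises, as long as the values stay far from INT_MAX.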
 
> > + * @rem_x: The returned X coordinate of the requested pixel in the block
> > + * @rem_y: The returned Y coordinate of the requested pixel in the block
> >   *
> > - * The caller must ensure that the framebuffer associated with this request uses a pixel format
> > - * where block_h == block_w == 1.
> > - * If this requirement is not fulfilled, the resulting offset can point to an other pixel or
> > - * outside of the buffer.
> > + * As some pixel formats store multiple pixels in a block (DRM_FORMAT_R* for example), some
> > + * pixels are not individually addressable. This function return 3 values: the offset of the
> > + * whole block, and the coordinate of the requested pixel inside this block.
> > + * For example, if the format is DRM_FORMAT_R1 and the requested coordinate is 13,5, the offset
> > + * will point to the byte 5*pitches + 13/8 (second byte of the 5th line), and the rem_x/rem_y
> > + * coordinates will be (13 % 8, 5 % 1) = (5, 0)
> > + *
> > + * With this function, the caller just have to extract the correct pixel from the block.
> >   */
> > -static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> > +static void packed_pixels_offset(const struct vkms_frame_info *frame_info, int x, int y,
> > +				 int plane_index, int *offset, int *rem_x, int *rem_y)
> >  {
> >  	struct drm_framebuffer *fb = frame_info->fb;
> > +	const struct drm_format_info *format = frame_info->fb->format;
> > +	/* Directly using x and y to multiply pitches and format->ccp is not sufficient because
> > +	 * in some formats a block can represent multiple pixels.
> > +	 *
> > +	 * Dividing x and y by the block size allows to extract the correct offset of the block
> > +	 * containing the pixel.
> > +	 */
> >  
> > -	return fb->offsets[0] + (y * fb->pitches[0]) + (x * fb->format->cpp[0]);
> > +	int block_x = x / drm_format_info_block_width(format, plane_index);
> > +	int block_y = y / drm_format_info_block_height(format, plane_index);
> > +	int block_pitch = fb->pitches[plane_index] * drm_format_info_block_height(format,
> > +										  plane_index);
> > +	*rem_x = x % drm_format_info_block_width(format, plane_index);
> > +	*rem_y = y % drm_format_info_block_height(format, plane_index);
> > +	*offset = fb->offsets[plane_index] +
> > +		  block_y * block_pitch +
> > +		  block_x * format->char_per_block[plane_index];
> >  }
> >  
> >  /**
> > @@ -35,30 +59,71 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
> >   * @frame_info: Buffer metadata
> >   * @x: The x (width) coordinate inside the plane
> >   * @y: The y (height) coordinate inside the plane
> > + * @plane_index: The index of the plane
> > + * @addr: The returned pointer
> > + * @rem_x: The returned X coordinate of the requested pixel in the block
> > + * @rem_y: The returned Y coordinate of the requested pixel in the block
> >   *
> > - * Takes the information stored in the frame_info, a pair of coordinates, and
> > - * returns the address of the first color channel.
> > - * This function assumes the channels are packed together, i.e. a color channel
> > - * comes immediately after another in the memory. And therefore, this function
> > - * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
> > + * Takes the information stored in the frame_info, a pair of coordinates, and returns the address
> > + * of the block containing this pixel and the pixel position inside this block.
> >   *
> > - * The caller must ensure that the framebuffer associated with this request uses a pixel format
> > - * where block_h == block_w == 1, otherwise the returned pointer can be outside the buffer.
> > + * See @packed_pixel_offset for details about rem_x/rem_y behavior.
> 
> Missing "s" in the name of the function. Should read "@packed_pixels_offset".

Thanks!

> >   */
> > -static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
> > -				int x, int y)
> > +static void packed_pixels_addr(const struct vkms_frame_info *frame_info,
> > +			       int x, int y, int plane_index, u8 **addr, int *rem_x,
> > +			       int *rem_y)
> >  {
> > -	size_t offset = pixel_offset(frame_info, x, y);
> > +	int offset;
> >  
> > -	return (u8 *)frame_info->map[0].vaddr + offset;
> > +	packed_pixels_offset(frame_info, x, y, plane_index, &offset, rem_x, rem_y);
> > +	*addr = (u8 *)frame_info->map[0].vaddr + offset;
> >  }
> >  
> > -static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
> > +/**
> > + * packed_pixels_addr_1x1() - Get the pointer to the block containing the pixel at the given
> > + * coordinates
> > + *
> > + * @frame_info: Buffer metadata
> > + * @x: The x (width) coordinate inside the plane
> > + * @y: The y (height) coordinate inside the plane
> > + * @plane_index: The index of the plane
> > + * @addr: The returned pointer
> > + *
> > + * This function can only be used with format where block_h == block_w == 1.
> > + */
> > +static void packed_pixels_addr_1x1(const struct vkms_frame_info *frame_info,
> > +				   int x, int y, int plane_index, u8 **addr)
> > +{
> > +	int offset, rem_x, rem_y;
> 
> Nitpick, but it'd be nice if packed_pixels_offset() could take NULLs in
> the output values so we avoid declaring unused variables here and when
> calling packed_pixels_addr().

It is not a trivial change, and as I want this series to be merged, I will 
send v14 without it. But if I have the time, I will send a new 
patch/series with this cleanup. Thanks for the suggestion.
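José's suggestion could look roughly like the standalone sketch below. It mirrors the arithmetic of the quoted packed_pixels_offset() and returns the offset directly. rem_x and rem_y may be NULL when the caller does not need them. struct toy_plane_format and toy_packed_pixels_offset() are hypothetical simplifications of struct drm_format_info and struct drm_framebuffer, not the kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified description of one plane of a framebuffer. */
struct toy_plane_format {
	int block_w;        /* pixels per block, horizontally */
	int block_h;        /* pixels per block, vertically */
	int char_per_block; /* bytes per block */
	int pitch;          /* bytes per pixel row */
	int offset;         /* byte offset of the plane in the buffer */
};

/*
 * Return the byte offset of the block holding pixel (x, y). The position of
 * the pixel inside that block is stored in *rem_x and *rem_y, unless the
 * caller passed NULL for them.
 */
static int toy_packed_pixels_offset(const struct toy_plane_format *f, int x, int y,
				    int *rem_x, int *rem_y)
{
	int block_x = x / f->block_w;
	int block_y = y / f->block_h;
	int block_pitch = f->pitch * f->block_h; /* bytes per row of blocks */

	assert(x >= 0 && y >= 0);

	if (rem_x)
		*rem_x = x % f->block_w;
	if (rem_y)
		*rem_y = y % f->block_h;

	return f->offset + block_y * block_pitch + block_x * f->char_per_block;
}
```

This reproduces the DRM_FORMAT_R1 example from the kernel-doc. Assume 8x1 blocks of 1 byte and a 16-byte pitch. For pixel (13, 5) the function returns 5 * 16 + 13 / 8 = 81, with (rem_x, rem_y) = (5, 0). A 1x1 caller such as packed_pixels_addr_1x1() could then simply pass NULL, NULL.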

> > +
> > +	WARN_ONCE(drm_format_info_block_width(frame_info->fb->format,
> > +					      plane_index) != 1,
> > +		"%s() only support formats with block_w == 1", __func__);
> > +	WARN_ONCE(drm_format_info_block_height(frame_info->fb->format,
> > +					       plane_index) != 1,
> > +		"%s() only support formats with block_h == 1", __func__);
> > +
> > +	packed_pixels_offset(frame_info, x, y, plane_index, &offset, &rem_x,
> > +			     &rem_y);
> > +	*addr = (u8 *)frame_info->map[0].vaddr + offset;
> > +}
> > +
> > +static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y,
> > +				 int plane_index)
> >  {
> >  	int x_src = frame_info->src.x1 >> 16;
> >  	int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16);
> > +	u8 *addr;
> > +	int rem_x, rem_y;
> > +
> > +	WARN_ONCE(drm_format_info_block_width(frame_info->fb->format, plane_index) != 1,
> > +		  "%s() only support formats with block_w == 1", __func__);
> > +	WARN_ONCE(drm_format_info_block_height(frame_info->fb->format, plane_index) != 1,
> > +		  "%s() only support formats with block_h == 1", __func__);
> >  
> > -	return packed_pixels_addr(frame_info, x_src, y_src);
> > +	packed_pixels_addr(frame_info, x_src, y_src, plane_index, &addr, &rem_x, &rem_y);
> > +
> > +	return addr;
> >  }
> >  
> >  static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x)
> > @@ -152,14 +217,14 @@ void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state
> >  {
> >  	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> >  	struct vkms_frame_info *frame_info = plane->frame_info;
> > -	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> > +	u8 *src_pixels = get_packed_src_addr(frame_info, y, 0);
> >  	int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels);
> >  
> >  	for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) {
> >  		int x_pos = get_x_position(frame_info, limit, x);
> >  
> >  		if (drm_rotation_90_or_270(frame_info->rotation))
> > -			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1)
> > +			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1, 0)
> >  				+ frame_info->fb->format->cpp[0] * y;
> >  
> >  		plane->pixel_read(src_pixels, &out_pixels[x_pos]);
> > @@ -250,7 +315,10 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
> >  {
> >  	struct vkms_frame_info *frame_info = &wb->wb_frame_info;
> >  	int x_dst = frame_info->dst.x1;
> > -	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> > +	u8 *dst_pixels;
> > +	int rem_x, rem_y;
> > +
> > +	packed_pixels_addr(frame_info, x_dst, y, 0, &dst_pixels, &rem_x, &rem_y);
> >  	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> >  	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst), src_buffer->n_pixels);
> >  
> > 


* Re: [PATCH v13 8/9] drm/vkms: Re-introduce line-per-line composition algorithm
  2024-11-18 17:10   ` José Expósito
@ 2024-11-18 17:19     ` Louis Chauvet
  0 siblings, 0 replies; 20+ messages in thread
From: Louis Chauvet @ 2024-11-18 17:19 UTC (permalink / raw)
  To: José Expósito
  Cc: airlied, arthurgrillo, corbet, dri-devel, hamohammed.sa,
	helen.koike, jeremie.dautheribes, linux-doc, linux-kernel,
	maarten.lankhorst, mairacanal, marcheu, melissa.srw,
	miquel.raynal, mripard, nicolejadeyee, pekka.paalanen,
	pekka.paalanen, rdunlap, rodrigosiqueiramelo, seanpaul,
	simona.vetter, simona, thomas.petazzoni, tzimmermann

On 18/11/24 - 18:10, José Expósito wrote:
> > Re-introduce a line-by-line composition algorithm for each pixel format.
> > This improves performance by not requiring an indirection per pixel
> > read. This patch is focused on readability of the code.
> > 
> > Line-by-line composition was introduced by [1] but rewritten back to a
> > pixel-by-pixel algorithm in [2]. At that time, nobody noticed the impact
> > on performance, and it was merged.
> > 
> > This patch is almost a revert of [2], but in addition, efforts have been
> > made to increase readability and maintainability of the rotation handling.
> > The blend function is now divided into two parts:
> > - Transformation of coordinates from the output reference frame to the
> >   source reference frame
> > - Line conversion and blending
> > 
> > Most of the complexity of the rotation management is avoided by using
> > drm_rect_* helpers. The remaining complexity is around the clipping, to
> > avoid reading/writing outside source/destination buffers.
> > 
> > The pixel conversion is now done line-by-line, so the pixel_read_t
> > callback was replaced with the pixel_read_line_t callback. This way the
> > indirection is only required once per line and per plane, instead of once
> > per pixel and per plane.
> > 
> > The read_line callbacks are very similar for most pixel formats, but this
> > duplication is required to avoid a performance impact. Some helpers for
> > color conversion were introduced to avoid code repetition:
> > - argb_u16_from_*: perform color conversion. They should be inlined by
> >   the compiler, and they are used to avoid repetition between multiple
> >   variants of the same format (argb/xrgb now, and possibly bgr formats in
> >   the future).
> > 
> > This new algorithm was tested with:
> > - kms_plane (for color conversions)
> > - kms_rotation_crc (for rotations of planes)
> > - kms_cursor_crc (for translations of planes)
> > - kms_rotation (for all rotations and formats combinations) [3]
> > The performance gain was measured with kms_fb_stress [4] with some
> > modifications to fix the writeback format.
> > 
> > The performance improvement is around 5 to 10%.
> > 
> > [1]: commit 8ba1648567e2 ("drm: vkms: Refactor the plane composer to accept
> >      new formats")
> >      https://lore.kernel.org/all/20220905190811.25024-7-igormtorrente@gmail.com/
> > [2]: commit 322d716a3e8a ("drm/vkms: isolate pixel conversion
> >      functionality")
> >      https://lore.kernel.org/all/20230418130525.128733-2-mcanal@igalia.com/
> > [3]: https://lore.kernel.org/igt-dev/20240313-new_rotation-v2-0-6230fd5cae59@bootlin.com/
> > [4]: https://lore.kernel.org/all/20240422-kms_fb_stress-dev-v5-0-0c577163dc88@riseup.net/
> > 
> > Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> > Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com>
> > 
> > # Conflicts:
> > #	drivers/gpu/drm/vkms/vkms_composer.c
> > 
> > Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com>
> 
> checkpatch will complain about this duplicated signature.

Another reason for a v14 :-)
I saw it when I tried to commit this series, thanks.
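The commit message quoted above contrasts one indirect call per pixel (the old pixel_read_t) with one indirect call per line (pixel_read_line_t). The standalone C sketch below only illustrates where the function-pointer call sits in each shape. The names struct argb16, read_pixel_fn, read_line_fn and the compose_* helpers are hypothetical stand-ins, not the vkms API, and the sketch makes no performance claim of its own.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pixel_argb_u16 */
struct argb16 {
	uint16_t a, r, g, b;
};

/* Old shape: the composer calls this through a pointer once per pixel */
typedef void (*read_pixel_fn)(const uint8_t *in, struct argb16 *out);

/* New shape: the composer calls this through a pointer once per line */
typedef void (*read_line_fn)(const uint8_t *in, int count, struct argb16 *out);

/* XRGB8888 is stored as B, G, R, X bytes; each channel is widened with * 257 */
static void xrgb8888_pixel(const uint8_t *in, struct argb16 *out)
{
	out->a = 0xffff;
	out->r = (uint16_t)(in[2] * 257);
	out->g = (uint16_t)(in[1] * 257);
	out->b = (uint16_t)(in[0] * 257);
}

/* The loop lives next to the conversion, so this call is direct and inlinable */
static void xrgb8888_line(const uint8_t *in, int count, struct argb16 *out)
{
	assert(count >= 0);
	for (int i = 0; i < count; i++)
		xrgb8888_pixel(in + 4 * i, &out[i]);
}

static void compose_per_pixel(read_pixel_fn read, int cpp, const uint8_t *line,
			      int count, struct argb16 *out)
{
	for (int i = 0; i < count; i++)
		read(line + cpp * i, &out[i]);	/* one indirect call per pixel */
}

static void compose_per_line(read_line_fn read, const uint8_t *line, int count,
			     struct argb16 *out)
{
	read(line, count, out);			/* one indirect call per line */
}
```

Both paths produce the same pixels; the difference is only how many times the composer goes through a function pointer for a line of `count` pixels.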
 
> > ---
> >  drivers/gpu/drm/vkms/vkms_composer.c | 234 ++++++++++++++++++++++++++++-------
> >  drivers/gpu/drm/vkms/vkms_drv.h      |  28 +++--
> >  drivers/gpu/drm/vkms/vkms_formats.c  | 224 ++++++++++++++++++++-------------
> >  drivers/gpu/drm/vkms/vkms_formats.h  |   2 +-
> >  drivers/gpu/drm/vkms/vkms_plane.c    |   5 +-
> >  5 files changed, 344 insertions(+), 149 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> > index 601e33431b45..7a3e47b895a7 100644
> > --- a/drivers/gpu/drm/vkms/vkms_composer.c
> > +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> > @@ -29,8 +29,8 @@ static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
> >   * @x_start: The start offset
> >   * @pixel_count: The number of pixels to blend
> >   *
> > - * The pixels [0;@pixel_count) in stage_buffer are blended at [@x_start;@x_start+@pixel_count) in
> > - * output_buffer.
> > + * The pixels [@x_start;@x_start+@pixel_count) in stage_buffer are blended at
> > + * [@x_start;@x_start+@pixel_count) in output_buffer.
> >   *
> >   * The current DRM assumption is that pixel color values have been already
> >   * pre-multiplied with the alpha channel values. See more
> > @@ -41,7 +41,7 @@ static void pre_mul_alpha_blend(const struct line_buffer *stage_buffer,
> >  				struct line_buffer *output_buffer, int x_start, int pixel_count)
> >  {
> >  	struct pixel_argb_u16 *out = &output_buffer->pixels[x_start];
> > -	const struct pixel_argb_u16 *in = stage_buffer->pixels;
> > +	const struct pixel_argb_u16 *in = &stage_buffer->pixels[x_start];
> >  
> >  	for (int i = 0; i < pixel_count; i++) {
> >  		out[i].a = (u16)0xffff;
> > @@ -51,33 +51,6 @@ static void pre_mul_alpha_blend(const struct line_buffer *stage_buffer,
> >  	}
> >  }
> >  
> > -static int get_y_pos(struct vkms_frame_info *frame_info, int y)
> > -{
> > -	if (frame_info->rotation & DRM_MODE_REFLECT_Y)
> > -		return drm_rect_height(&frame_info->rotated) - y - 1;
> > -
> > -	switch (frame_info->rotation & DRM_MODE_ROTATE_MASK) {
> > -	case DRM_MODE_ROTATE_90:
> > -		return frame_info->rotated.x2 - y - 1;
> > -	case DRM_MODE_ROTATE_270:
> > -		return y + frame_info->rotated.x1;
> > -	default:
> > -		return y;
> > -	}
> > -}
> > -
> > -static bool check_limit(struct vkms_frame_info *frame_info, int pos)
> > -{
> > -	if (drm_rotation_90_or_270(frame_info->rotation)) {
> > -		if (pos >= 0 && pos < drm_rect_width(&frame_info->rotated))
> > -			return true;
> > -	} else {
> > -		if (pos >= frame_info->rotated.y1 && pos < frame_info->rotated.y2)
> > -			return true;
> > -	}
> > -
> > -	return false;
> > -}
> >  
> >  static void fill_background(const struct pixel_argb_u16 *background_color,
> >  			    struct line_buffer *output_buffer)
> > @@ -203,6 +176,182 @@ static enum pixel_read_direction direction_for_rotation(unsigned int rotation)
> >  	return READ_LEFT_TO_RIGHT;
> >  }
> >  
> > +/**
> > + * clamp_line_coordinates() - Compute and clamp the coordinates to read and write during the blend
> > + * process.
> > + *
> > + * @direction: direction of the reading
> > + * @current_plane: current plane blended
> > + * @src_line: source line of the reading. Only the top-left coordinate is used. This rectangle
> > + * must be rotated and have a shape of 1*pixel_count if @direction is vertical and a shape of
> > + * pixel_count*1 if @direction is horizontal.
> > + * @src_x_start: x start coordinate for the line reading
> > + * @src_y_start: y start coordinate for the line reading
> > + * @dst_x_start: x coordinate to blend the read line
> > + * @pixel_count: number of pixels to blend
> > + *
> > + * This function is mainly a safety net to avoid reading outside the source buffer. As the
> > + * userspace should never ask to read outside the source plane, all the cases covered here should
> > + * be dead code.
> > + */
> > +static void clamp_line_coordinates(enum pixel_read_direction direction,
> > +				   const struct vkms_plane_state *current_plane,
> > +				   const struct drm_rect *src_line, int *src_x_start,
> > +				   int *src_y_start, int *dst_x_start, int *pixel_count)
> > +{
> > +	/* By default the start points are correct */
> > +	*src_x_start = src_line->x1;
> > +	*src_y_start = src_line->y1;
> > +	*dst_x_start = current_plane->frame_info->dst.x1;
> > +
> > +	/* Get the correct number of pixels to blend; it depends on the direction */
> > +	switch (direction) {
> > +	case READ_LEFT_TO_RIGHT:
> > +	case READ_RIGHT_TO_LEFT:
> > +		*pixel_count = drm_rect_width(src_line);
> > +		break;
> > +	case READ_BOTTOM_TO_TOP:
> > +	case READ_TOP_TO_BOTTOM:
> > +		*pixel_count = drm_rect_height(src_line);
> > +		break;
> > +	}
> > +
> > +	/*
> > +	 * Clamp the coordinates to avoid reading outside the buffer
> > +	 *
> > +	 * This is mainly a safety check to avoid reading outside the buffer; userspace
> > +	 * should never request to read outside the source buffer.
> > +	 */
> > +	switch (direction) {
> > +	case READ_LEFT_TO_RIGHT:
> > +	case READ_RIGHT_TO_LEFT:
> > +		if (*src_x_start < 0) {
> > +			*pixel_count += *src_x_start;
> > +			*dst_x_start -= *src_x_start;
> > +			*src_x_start = 0;
> > +		}
> > +		if (*src_x_start + *pixel_count > current_plane->frame_info->fb->width)
> > +			*pixel_count = max(0, (int)current_plane->frame_info->fb->width -
> > +				*src_x_start);
> > +		break;
> > +	case READ_BOTTOM_TO_TOP:
> > +	case READ_TOP_TO_BOTTOM:
> > +		if (*src_y_start < 0) {
> > +			*pixel_count += *src_y_start;
> > +			*dst_x_start -= *src_y_start;
> > +			*src_y_start = 0;
> > +		}
> > +		if (*src_y_start + *pixel_count > current_plane->frame_info->fb->height)
> > +			*pixel_count = max(0, (int)current_plane->frame_info->fb->height -
> > +				*src_y_start);
> > +		break;
> > +	}
> > +}
> > +
> > +/**
> > + * blend_line() - Blend a line from a plane to the output buffer
> > + *
> > + * @current_plane: current plane to work on
> > + * @y: line to write in the output buffer
> > + * @crtc_x_limit: width of the output buffer
> > + * @stage_buffer: temporary buffer to convert the pixel line from the source buffer
> > + * @output_buffer: buffer to blend the read line into.
> > + */
> > +static void blend_line(struct vkms_plane_state *current_plane, int y,
> > +		       int crtc_x_limit, struct line_buffer *stage_buffer,
> > +		       struct line_buffer *output_buffer)
> > +{
> > +	int src_x_start, src_y_start, dst_x_start, pixel_count;
> > +	struct drm_rect dst_line, tmp_src, src_line;
> > +
> > +	/* Avoid rendering useless lines */
> > +	if (y < current_plane->frame_info->dst.y1 ||
> > +	    y >= current_plane->frame_info->dst.y2)
> > +		return;
> > +
> > +	/*
> > +	 * dst_line is the line to copy. The initial coordinates are inside the
> > +	 * destination framebuffer, and then drm_rect_* helpers are used to
> > +	 * compute the correct position into the source framebuffer.
> > +	 */
> > +	dst_line = DRM_RECT_INIT(current_plane->frame_info->dst.x1, y,
> > +				 drm_rect_width(&current_plane->frame_info->dst),
> > +				 1);
> > +
> > +	drm_rect_fp_to_int(&tmp_src, &current_plane->frame_info->src);
> > +
> > +	/*
> > +	 * [1]: Clamp dst_line to [0; crtc_x_limit) to avoid writing outside of
> > +	 * the destination buffer
> > +	 */
> > +	dst_line.x1 = max_t(int, dst_line.x1, 0);
> > +	dst_line.x2 = min_t(int, dst_line.x2, crtc_x_limit);
> > +	/* The destination is completely outside of the crtc. */
> > +	if (dst_line.x2 <= dst_line.x1)
> > +		return;
> > +
> > +	src_line = dst_line;
> > +
> > +	/*
> > +	 * Transform the x/y coordinates from the CRTC into coordinates in the
> > +	 * src buffer.
> > +	 *
> > +	 * - Cancel the offset of the dst buffer.
> > +	 * - Invert the rotation. This assumes that
> > +	 *   dst = drm_rect_rotate(src, rotation) (dst and src have the
> > +	 *   same size, but can be rotated).
> > +	 * - Apply the offset of the source rectangle to the coordinate.
> > +	 */
> > +	drm_rect_translate(&src_line, -current_plane->frame_info->dst.x1,
> > +			   -current_plane->frame_info->dst.y1);
> > +	drm_rect_rotate_inv(&src_line, drm_rect_width(&tmp_src),
> > +			    drm_rect_height(&tmp_src),
> > +			    current_plane->frame_info->rotation);
> > +	drm_rect_translate(&src_line, tmp_src.x1, tmp_src.y1);
> > +
> > +	/* Get the correct reading direction in the source buffer. */
> > +
> > +	enum pixel_read_direction direction =
> > +		direction_for_rotation(current_plane->frame_info->rotation);
> > +
> > +	/* [2]: Compute and clamp the number of pixels to read */
> > +	clamp_line_coordinates(direction, current_plane, &src_line, &src_x_start, &src_y_start,
> > +			       &dst_x_start, &pixel_count);
> > +
> > +	if (pixel_count <= 0) {
> > +		/* Nothing to read, so avoid multiple function calls */
> > +		return;
> > +	}
> > +
> > +	/*
> > +	 * Modify the starting point to take into account the rotation
> > +	 *
> > +	 * src_line is the top-left corner, so when reading READ_RIGHT_TO_LEFT or
> > +	 * READ_BOTTOM_TO_TOP, it must be changed to the top-right/bottom-left
> > +	 * corner.
> > +	 */
> > +	if (direction == READ_RIGHT_TO_LEFT) {
> > +		// src_x_start is now the right point
> > +		src_x_start += pixel_count - 1;
> > +	} else if (direction == READ_BOTTOM_TO_TOP) {
> > +		// src_y_start is now the bottom point
> > +		src_y_start += pixel_count - 1;
> > +	}
> > +
> > +	/*
> > +	 * Perform the conversion and the blending
> > +	 *
> > +	 * Here we know that the read line (x_start, y_start, pixel_count) is
> > +	 * inside the source buffer [2] and we don't write outside the stage
> > +	 * buffer [1].
> > +	 */
> > +	current_plane->pixel_read_line(current_plane, src_x_start, src_y_start, direction,
> > +				       pixel_count, &stage_buffer->pixels[dst_x_start]);
> > +
> > +	pre_mul_alpha_blend(stage_buffer, output_buffer,
> > +			    dst_x_start, pixel_count);
> > +}
> > +
> >  /**
> >   * blend - blend the pixels from all planes and compute crc
> >   * @wb: The writeback frame buffer metadata
> > @@ -223,34 +372,25 @@ static void blend(struct vkms_writeback_job *wb,
> >  {
> >  	struct vkms_plane_state **plane = crtc_state->active_planes;
> >  	u32 n_active_planes = crtc_state->num_active_planes;
> > -	int y_pos, x_dst, pixel_count;
> >  
> >  	const struct pixel_argb_u16 background_color = { .a = 0xffff };
> >  
> > -	size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
> > +	int crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
> > +	int crtc_x_limit = crtc_state->base.crtc->mode.hdisplay;
> 
> I'm sure you are already aware of the tiny conflict with:
> https://lore.kernel.org/all/20241003-remove-legacy-v1-1-0b7db1f1a1a6@bootlin.com/
> 
> This is now:
> -	size_t crtc_y_limit = crtc_state->base.mode.vdisplay;
> +	int crtc_y_limit = crtc_state->base.mode.vdisplay;
> +	int crtc_x_limit = crtc_state->base.mode.hdisplay;
> 

Yes, this is the main reason for v14... I was dumb and sent the v13 before
committing the remove-legacy series... Next time I will commit and then
rebase, not the opposite :-)
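For reference, the left-to-right branch of clamp_line_coordinates() quoted above can be exercised in isolation. The sketch below is a hypothetical standalone helper (clamp_left_to_right() is not a vkms function) that mirrors that branch's arithmetic on plain ints; it only models READ_LEFT_TO_RIGHT.

```c
#include <assert.h>

/*
 * Hypothetical standalone model of the READ_LEFT_TO_RIGHT branch of
 * clamp_line_coordinates(): keep a read of *count pixels starting at
 * column *src_x inside a source line of fb_width pixels, and move the
 * destination start by the number of pixels dropped on the left.
 * *count can end up <= 0; the caller must then skip the line.
 */
static void clamp_left_to_right(int fb_width, int *src_x, int *dst_x, int *count)
{
	assert(fb_width >= 0);

	if (*src_x < 0) {
		*count += *src_x;	/* drop the pixels left of column 0 */
		*dst_x -= *src_x;	/* and skip as many destination pixels */
		*src_x = 0;
	}
	if (*src_x + *count > fb_width) {
		int remaining = fb_width - *src_x;

		*count = remaining > 0 ? remaining : 0;	/* max(0, width - start) */
	}
}
```

With a 10-pixel-wide source, a read of 8 pixels starting at column -3 becomes a read of 5 pixels starting at column 0, written 3 pixels further right in the destination. A start at column 12 yields a count of 0, which blend_line() treats as "nothing to read".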

> >  
> >  	/*
> >  	 * The planes are composed line-by-line to avoid heavy memory usage. It is a necessary
> >  	 * complexity to avoid poor blending performance.
> >  	 *
> > -	 * The function vkms_compose_row() is used to read a line, pixel-by-pixel, into the staging
> > -	 * buffer.
> > +	 * The pixel_read_line callback is used to read a line into the staging buffer,
> > +	 * using an efficient algorithm specific to each format.
> >  	 */
> > -	for (size_t y = 0; y < crtc_y_limit; y++) {
> > +	for (int y = 0; y < crtc_y_limit; y++) {
> >  		fill_background(&background_color, output_buffer);
> >  
> >  		/* The active planes are composed associatively in z-order. */
> >  		for (size_t i = 0; i < n_active_planes; i++) {
> > -			x_dst = plane[i]->frame_info->dst.x1;
> > -			pixel_count = min_t(int, drm_rect_width(&plane[i]->frame_info->dst),
> > -					    (int)stage_buffer->n_pixels);
> > -			y_pos = get_y_pos(plane[i]->frame_info, y);
> > -
> > -			if (!check_limit(plane[i]->frame_info, y_pos))
> > -				continue;
> > -
> > -			vkms_compose_row(stage_buffer, plane[i], y_pos);
> > -			pre_mul_alpha_blend(stage_buffer, output_buffer, x_dst, pixel_count);
> > +			blend_line(plane[i], y, crtc_x_limit, stage_buffer, output_buffer);
> >  		}
> >  
> >  		apply_lut(crtc_state, output_buffer);
> > @@ -258,7 +398,7 @@ static void blend(struct vkms_writeback_job *wb,
> >  		*crc32 = crc32_le(*crc32, (void *)output_buffer->pixels, row_size);
> >  
> >  		if (wb)
> > -			vkms_writeback_row(wb, output_buffer, y_pos);
> > +			vkms_writeback_row(wb, output_buffer, y);
> >  	}
> >  }
> >  
> > @@ -269,7 +409,7 @@ static int check_format_funcs(struct vkms_crtc_state *crtc_state,
> >  	u32 n_active_planes = crtc_state->num_active_planes;
> >  
> >  	for (size_t i = 0; i < n_active_planes; i++)
> > -		if (!planes[i]->pixel_read)
> > +		if (!planes[i]->pixel_read_line)
> >  			return -1;
> >  
> >  	if (active_wb && !active_wb->pixel_write)
> > diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> > index 777b7bd91f27..067a4797f7a0 100644
> > --- a/drivers/gpu/drm/vkms/vkms_drv.h
> > +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> > @@ -39,7 +39,6 @@
> >  struct vkms_frame_info {
> >  	struct drm_framebuffer *fb;
> >  	struct drm_rect src, dst;
> > -	struct drm_rect rotated;
> >  	struct iosys_map map[DRM_FORMAT_MAX_PLANES];
> >  	unsigned int rotation;
> >  };
> > @@ -80,26 +79,38 @@ enum pixel_read_direction {
> >  	READ_LEFT_TO_RIGHT
> >  };
> >  
> > +struct vkms_plane_state;
> > +
> >  /**
> > - * typedef pixel_read_t - These functions are used to read a pixel in the source frame,
> > + * typedef pixel_read_line_t - These functions are used to read a pixel line in the source frame,
> >   * convert it to `struct pixel_argb_u16` and write it to @out_pixel.
> >   *
> > - * @in_pixel: pointer to the pixel to read
> > - * @out_pixel: pointer to write the converted pixel
> > + * @plane: plane used as source for the pixel value
> > + * @x_start: X (width) coordinate of the first pixel to copy. The caller must ensure that x_start
> > + * is non-negative and smaller than @plane->frame_info->fb->width.
> > + * @y_start: Y (height) coordinate of the first pixel to copy. The caller must ensure that y_start
> > + * is non-negative and smaller than @plane->frame_info->fb->height.
> > + * @direction: direction to use for the copy, starting at @x_start/@y_start
> > + * @count: number of pixels to copy
> > + * @out_pixel: pointer where to write the pixel values. They will be written from @out_pixel[0]
> > + * (included) to @out_pixel[@count] (excluded). The caller must ensure that out_pixel has a
> > + * length of at least @count.
> >   */
> > -typedef void (*pixel_read_t)(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel);
> > +typedef void (*pixel_read_line_t)(const struct vkms_plane_state *plane, int x_start,
> > +				  int y_start, enum pixel_read_direction direction, int count,
> > +				  struct pixel_argb_u16 out_pixel[]);
> >  
> >  /**
> >   * struct vkms_plane_state - Driver specific plane state
> >   * @base: base plane state
> >   * @frame_info: data required for composing computation
> > - * @pixel_read: function to read a pixel in this plane. The creator of a struct vkms_plane_state
> > - *	        must ensure that this pointer is valid
> > + * @pixel_read_line: function to read a pixel line in this plane. The creator of a
> > + *		     struct vkms_plane_state must ensure that this pointer is valid
> >   */
> >  struct vkms_plane_state {
> >  	struct drm_shadow_plane_state base;
> >  	struct vkms_frame_info *frame_info;
> > -	pixel_read_t pixel_read;
> > +	pixel_read_line_t pixel_read_line;
> >  };
> >  
> >  struct vkms_plane {
> > @@ -265,7 +276,6 @@ int vkms_verify_crc_source(struct drm_crtc *crtc, const char *source_name,
> >  /* Composer Support */
> >  void vkms_composer_worker(struct work_struct *work);
> >  void vkms_set_composer(struct vkms_output *out, bool enabled);
> > -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y);
> >  void vkms_writeback_row(struct vkms_writeback_job *wb, const struct line_buffer *src_buffer, int y);
> >  
> >  /* Writeback */
> > diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> > index d0e7dfc1f0d3..0f6678420a11 100644
> > --- a/drivers/gpu/drm/vkms/vkms_formats.c
> > +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > @@ -140,83 +140,51 @@ static void packed_pixels_addr_1x1(const struct vkms_frame_info *frame_info,
> >  	*addr = (u8 *)frame_info->map[0].vaddr + offset;
> >  }
> >  
> > -static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y,
> > -				 int plane_index)
> > -{
> > -	int x_src = frame_info->src.x1 >> 16;
> > -	int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16);
> > -	u8 *addr;
> > -	int rem_x, rem_y;
> > -
> > -	WARN_ONCE(drm_format_info_block_width(frame_info->fb->format, plane_index) != 1,
> > -		  "%s() only support formats with block_w == 1", __func__);
> > -	WARN_ONCE(drm_format_info_block_height(frame_info->fb->format, plane_index) != 1,
> > -		  "%s() only support formats with block_h == 1", __func__);
> > -
> > -	packed_pixels_addr(frame_info, x_src, y_src, plane_index, &addr, &rem_x, &rem_y);
> > -
> > -	return addr;
> > -}
> > -
> > -static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x)
> > -{
> > -	if (frame_info->rotation & (DRM_MODE_REFLECT_X | DRM_MODE_ROTATE_270))
> > -		return limit - x - 1;
> > -	return x;
> > -}
> > -
> >  /*
> > - * The following functions take pixel data from the buffer and convert them to the format
> > - * ARGB16161616 in @out_pixel.
> > + * The following functions take pixel data (a, r, g, b, pixel, ...) and convert them to
> > + * &struct pixel_argb_u16
> >   *
> > - * They are used in the vkms_compose_row() function to handle multiple formats.
> > + * They are used in the read_line functions to avoid duplicating work for some pixel formats.
> >   */
> >  
> > -static void ARGB8888_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
> > +static struct pixel_argb_u16 argb_u16_from_u8888(u8 a, u8 r, u8 g, u8 b)
> >  {
> > +	struct pixel_argb_u16 out_pixel;
> >  	/*
> >  	 * The 257 is the "conversion ratio". This number is obtained by the
> >  	 * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
> >  	 * the best color value in a pixel format with more possibilities.
> >  	 * A similar idea applies to others RGB color conversions.
> >  	 */
> > -	out_pixel->a = (u16)in_pixel[3] * 257;
> > -	out_pixel->r = (u16)in_pixel[2] * 257;
> > -	out_pixel->g = (u16)in_pixel[1] * 257;
> > -	out_pixel->b = (u16)in_pixel[0] * 257;
> > -}
> > +	out_pixel.a = (u16)a * 257;
> > +	out_pixel.r = (u16)r * 257;
> > +	out_pixel.g = (u16)g * 257;
> > +	out_pixel.b = (u16)b * 257;
> >  
> > -static void XRGB8888_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
> > -{
> > -	out_pixel->a = (u16)0xffff;
> > -	out_pixel->r = (u16)in_pixel[2] * 257;
> > -	out_pixel->g = (u16)in_pixel[1] * 257;
> > -	out_pixel->b = (u16)in_pixel[0] * 257;
> > +	return out_pixel;
> >  }
> >  
> > -static void ARGB16161616_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
> > +static struct pixel_argb_u16 argb_u16_from_u16161616(u16 a, u16 r, u16 g, u16 b)
> >  {
> > -	__le16 *pixel = (__le16 *)in_pixel;
> > +	struct pixel_argb_u16 out_pixel;
> > +
> > +	out_pixel.a = a;
> > +	out_pixel.r = r;
> > +	out_pixel.g = g;
> > +	out_pixel.b = b;
> >  
> > -	out_pixel->a = le16_to_cpu(pixel[3]);
> > -	out_pixel->r = le16_to_cpu(pixel[2]);
> > -	out_pixel->g = le16_to_cpu(pixel[1]);
> > -	out_pixel->b = le16_to_cpu(pixel[0]);
> > +	return out_pixel;
> >  }
> >  
> > -static void XRGB16161616_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
> > +static struct pixel_argb_u16 argb_u16_from_le16161616(__le16 a, __le16 r, __le16 g, __le16 b)
> >  {
> > -	__le16 *pixel = (__le16 *)in_pixel;
> > -
> > -	out_pixel->a = (u16)0xffff;
> > -	out_pixel->r = le16_to_cpu(pixel[2]);
> > -	out_pixel->g = le16_to_cpu(pixel[1]);
> > -	out_pixel->b = le16_to_cpu(pixel[0]);
> > +	return argb_u16_from_u16161616(le16_to_cpu(a), le16_to_cpu(r), le16_to_cpu(g),
> > +				       le16_to_cpu(b));
> >  }
> >  
> > -static void RGB565_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pixel)
> > +static struct pixel_argb_u16 argb_u16_from_RGB565(const __le16 *pixel)
> >  {
> > -	__le16 *pixel = (__le16 *)in_pixel;
> > +	struct pixel_argb_u16 out_pixel;
> >  
> >  	s64 fp_rb_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(31));
> >  	s64 fp_g_ratio = drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(63));
> > @@ -226,40 +194,120 @@ static void RGB565_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *out_pi
> >  	s64 fp_g = drm_int2fixp((rgb_565 >> 5) & 0x3f);
> >  	s64 fp_b = drm_int2fixp(rgb_565 & 0x1f);
> >  
> > -	out_pixel->a = (u16)0xffff;
> > -	out_pixel->r = drm_fixp2int_round(drm_fixp_mul(fp_r, fp_rb_ratio));
> > -	out_pixel->g = drm_fixp2int_round(drm_fixp_mul(fp_g, fp_g_ratio));
> > -	out_pixel->b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
> > +	out_pixel.a = (u16)0xffff;
> > +	out_pixel.r = drm_fixp2int_round(drm_fixp_mul(fp_r, fp_rb_ratio));
> > +	out_pixel.g = drm_fixp2int_round(drm_fixp_mul(fp_g, fp_g_ratio));
> > +	out_pixel.b = drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio));
> > +
> > +	return out_pixel;
> >  }
> >  
> > -/**
> > - * vkms_compose_row - compose a single row of a plane
> > - * @stage_buffer: output line with the composed pixels
> > - * @plane: state of the plane that is being composed
> > - * @y: y coordinate of the row
> > +/*
> > + * The following functions are the read_line functions for each pixel format supported by VKMS.
> > + *
> > + * They read a line starting at the point @x_start,@y_start following the @direction. The result
> > + * is stored in @out_pixel and in the format ARGB16161616.
> > + *
> > + * These functions are very repetitive, but the innermost pixel loops must be kept inside these
> > + * functions for performance reasons. Some benchmarking was done in [1] where having the innermost
> > + * loop factored out of these functions showed a slowdown by a factor of three.
> >   *
> > - * This function composes a single row of a plane. It gets the source pixels
> > - * through the y coordinate (see get_packed_src_addr()) and goes linearly
> > - * through the source pixel, reading the pixels and converting it to
> > - * ARGB16161616 (see the pixel_read() callback). For rotate-90 and rotate-270,
> > - * the source pixels are not traversed linearly. The source pixels are queried
> > - * on each iteration in order to traverse the pixels vertically.
> > + * [1]: https://lore.kernel.org/dri-devel/d258c8dc-78e9-4509-9037-a98f7f33b3a3@riseup.net/
> >   */
> > -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state *plane, int y)
> > +
> > +static void ARGB8888_read_line(const struct vkms_plane_state *plane, int x_start, int y_start,
> > +			       enum pixel_read_direction direction, int count,
> > +			       struct pixel_argb_u16 out_pixel[])
> >  {
> > -	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> > -	struct vkms_frame_info *frame_info = plane->frame_info;
> > -	u8 *src_pixels = get_packed_src_addr(frame_info, y, 0);
> > -	int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels);
> > +	struct pixel_argb_u16 *end = out_pixel + count;
> > +	u8 *src_pixels;
> > +
> > +	packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixels);
> > +
> > +	int step = get_block_step_bytes(plane->frame_info->fb, direction, 0);
> > +
> > +	while (out_pixel < end) {
> > +		u8 *px = (u8 *)src_pixels;
> > +		*out_pixel = argb_u16_from_u8888(px[3], px[2], px[1], px[0]);
> > +		out_pixel += 1;
> > +		src_pixels += step;
> > +	}
> > +}
> > +
> > +static void XRGB8888_read_line(const struct vkms_plane_state *plane, int x_start, int y_start,
> > +			       enum pixel_read_direction direction, int count,
> > +			       struct pixel_argb_u16 out_pixel[])
> > +{
> > +	struct pixel_argb_u16 *end = out_pixel + count;
> > +	u8 *src_pixels;
> > +
> > +	packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixels);
> > +
> > +	int step = get_block_step_bytes(plane->frame_info->fb, direction, 0);
> > +
> > +	while (out_pixel < end) {
> > +		u8 *px = (u8 *)src_pixels;
> > +		*out_pixel = argb_u16_from_u8888(255, px[2], px[1], px[0]);
> > +		out_pixel += 1;
> > +		src_pixels += step;
> > +	}
> > +}
> > +
> > +static void ARGB16161616_read_line(const struct vkms_plane_state *plane, int x_start,
> > +				   int y_start, enum pixel_read_direction direction, int count,
> > +				   struct pixel_argb_u16 out_pixel[])
> > +{
> > +	struct pixel_argb_u16 *end = out_pixel + count;
> > +	u8 *src_pixels;
> > +
> > +	packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixels);
> > +
> > +	int step = get_block_step_bytes(plane->frame_info->fb, direction, 0);
> > +
> > +	while (out_pixel < end) {
> > +		u16 *px = (u16 *)src_pixels;
> > +		*out_pixel = argb_u16_from_u16161616(px[3], px[2], px[1], px[0]);
> > +		out_pixel += 1;
> > +		src_pixels += step;
> > +	}
> > +}
> > +
> > +static void XRGB16161616_read_line(const struct vkms_plane_state *plane, int x_start,
> > +				   int y_start, enum pixel_read_direction direction, int count,
> > +				   struct pixel_argb_u16 out_pixel[])
> > +{
> > +	struct pixel_argb_u16 *end = out_pixel + count;
> > +	u8 *src_pixels;
> > +
> > +	packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixels);
> > +
> > +	int step = get_block_step_bytes(plane->frame_info->fb, direction, 0);
> > +
> > +	while (out_pixel < end) {
> > +		__le16 *px = (__le16 *)src_pixels;
> > +		*out_pixel = argb_u16_from_le16161616(cpu_to_le16(0xFFFF), px[2], px[1], px[0]);
> > +		out_pixel += 1;
> > +		src_pixels += step;
> > +	}
> > +}
> > +
> > +static void RGB565_read_line(const struct vkms_plane_state *plane, int x_start,
> > +			     int y_start, enum pixel_read_direction direction, int count,
> > +			     struct pixel_argb_u16 out_pixel[])
> > +{
> > +	struct pixel_argb_u16 *end = out_pixel + count;
> > +	u8 *src_pixels;
> > +
> > +	packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixels);
> >  
> > -	for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) {
> > -		int x_pos = get_x_position(frame_info, limit, x);
> > +	int step = get_block_step_bytes(plane->frame_info->fb, direction, 0);
> >  
> > -		if (drm_rotation_90_or_270(frame_info->rotation))
> > -			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1, 0)
> > -				+ frame_info->fb->format->cpp[0] * y;
> > +	while (out_pixel < end) {
> > +		__le16 *px = (__le16 *)src_pixels;
> >  
> > -		plane->pixel_read(src_pixels, &out_pixels[x_pos]);
> > +		*out_pixel = argb_u16_from_RGB565(px);
> > +		out_pixel += 1;
> > +		src_pixels += step;
> >  	}
> >  }
> >  
> > @@ -359,25 +407,25 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
> >  }
> >  
> >  /**
> > - * get_pixel_read_function() - Retrieve the correct read_pixel function for a specific
> > + * get_pixel_read_line_function() - Retrieve the correct read_line function for a specific
> >   * format. The returned pointer is NULL for unsupported pixel formats. The caller must ensure that
> >   * the pointer is valid before using it in a vkms_plane_state.
> >   *
> >   * @format: DRM_FORMAT_* value for which to obtain a conversion function (see [drm_fourcc.h])
> >   */
> > -pixel_read_t get_pixel_read_function(u32 format)
> > +pixel_read_line_t get_pixel_read_line_function(u32 format)
> >  {
> >  	switch (format) {
> >  	case DRM_FORMAT_ARGB8888:
> > -		return &ARGB8888_to_argb_u16;
> > +		return &ARGB8888_read_line;
> >  	case DRM_FORMAT_XRGB8888:
> > -		return &XRGB8888_to_argb_u16;
> > +		return &XRGB8888_read_line;
> >  	case DRM_FORMAT_ARGB16161616:
> > -		return &ARGB16161616_to_argb_u16;
> > +		return &ARGB16161616_read_line;
> >  	case DRM_FORMAT_XRGB16161616:
> > -		return &XRGB16161616_to_argb_u16;
> > +		return &XRGB16161616_read_line;
> >  	case DRM_FORMAT_RGB565:
> > -		return &RGB565_to_argb_u16;
> > +		return &RGB565_read_line;
> >  	default:
> >  		/*
> >  		 * This is a bug in vkms_plane_atomic_check(). All the supported
> > diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> > index 3ecea4563254..8d2bef95ff79 100644
> > --- a/drivers/gpu/drm/vkms/vkms_formats.h
> > +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> > @@ -5,7 +5,7 @@
> >  
> >  #include "vkms_drv.h"
> >  
> > -pixel_read_t get_pixel_read_function(u32 format);
> > +pixel_read_line_t get_pixel_read_line_function(u32 format);
> >  
> >  pixel_write_t get_pixel_write_function(u32 format);
> >  
> > diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> > index 10e9b23dab28..8875bed76410 100644
> > --- a/drivers/gpu/drm/vkms/vkms_plane.c
> > +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> > @@ -112,7 +112,6 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
> >  	frame_info = vkms_plane_state->frame_info;
> >  	memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect));
> >  	memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect));
> > -	memcpy(&frame_info->rotated, &new_state->dst, sizeof(struct drm_rect));
> >  	frame_info->fb = fb;
> >  	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
> >  	drm_framebuffer_get(frame_info->fb);
> > @@ -122,10 +121,8 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
> >  									  DRM_MODE_REFLECT_X |
> >  									  DRM_MODE_REFLECT_Y);
> >  
> > -	drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated),
> > -			drm_rect_height(&frame_info->rotated), frame_info->rotation);
> >  
> > -	vkms_plane_state->pixel_read = get_pixel_read_function(fmt);
> > +	vkms_plane_state->pixel_read_line = get_pixel_read_line_function(fmt);
> >  }
> >  
> >  static int vkms_plane_atomic_check(struct drm_plane *plane,
> > 
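The read_line functions in the patch walk the source with a fixed byte step returned by get_block_step_bytes(), whose body is not part of this excerpt. The sketch below assumes that, for a format with 1x1 blocks, the step is +cpp or -cpp for horizontal reads and +pitch or -pitch for vertical reads. step_bytes(), enum read_dir and struct argb16 are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct argb16 {
	uint16_t a, r, g, b;
};

enum read_dir { LEFT_TO_RIGHT, RIGHT_TO_LEFT, TOP_TO_BOTTOM, BOTTOM_TO_TOP };

/* Assumed step for a 1x1-block format with cpp bytes per pixel */
static ptrdiff_t step_bytes(enum read_dir dir, int cpp, int pitch)
{
	switch (dir) {
	case LEFT_TO_RIGHT:
		return cpp;
	case RIGHT_TO_LEFT:
		return -cpp;
	case TOP_TO_BOTTOM:
		return pitch;
	case BOTTOM_TO_TOP:
		return -pitch;
	}
	return 0;
}

/* Read count XRGB8888 pixels starting at (x, y) and moving in direction dir */
static void xrgb8888_read_line(const uint8_t *fb, int pitch, int x, int y,
			       enum read_dir dir, int count, struct argb16 *out)
{
	ptrdiff_t start = (ptrdiff_t)y * pitch + (ptrdiff_t)x * 4;
	ptrdiff_t step = step_bytes(dir, 4, pitch);

	assert(count >= 0);
	for (int i = 0; i < count; i++) {
		/* Offset computed per pixel, so no pointer leaves the buffer */
		const uint8_t *px = fb + (start + (ptrdiff_t)i * step);

		out[i].a = 0xffff;
		out[i].r = (uint16_t)(px[2] * 257);
		out[i].g = (uint16_t)(px[1] * 257);
		out[i].b = (uint16_t)(px[0] * 257);
	}
}
```

In this model a bottom-to-top read must start on the bottom row, which matches the start-point adjustment blend_line() makes for READ_BOTTOM_TO_TOP and READ_RIGHT_TO_LEFT before calling pixel_read_line.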

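The argb_u16_from_* helpers in the patch widen each channel to 16 bits. Below is a minimal integer-only sketch of that scaling: the 8-bit case uses the * 257 factor explained in the patch, and the 5- and 6-bit RGB565 channels use round-to-nearest of v * 65535 / (2^n - 1). The patch computes the RGB565 case with the drm_fixp_* fixed-point helpers instead; this sketch has not been checked to match them bit for bit, and all names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct argb16 {
	uint16_t a, r, g, b;
};

/* 8 -> 16 bits: v * 257 == (v << 8) | v, so 0x00 -> 0x0000 and 0xff -> 0xffff */
static uint16_t expand8(uint8_t v)
{
	return (uint16_t)(v * 257);
}

/* n-bit -> 16 bits, rounded to nearest: v * 65535 / max, with max = 2^n - 1 */
static uint16_t expand_bits(uint32_t v, uint32_t max)
{
	return (uint16_t)((v * 65535u + max / 2) / max);
}

/* RGB565 stored little-endian: bits 15..11 red, 10..5 green, 4..0 blue */
static struct argb16 rgb565_to_argb16(const uint8_t in[2])
{
	/* Assemble the value from bytes so the host endianness does not matter */
	uint16_t p = (uint16_t)(in[0] | (in[1] << 8));
	struct argb16 out = {
		.a = 0xffff,
		.r = expand_bits((p >> 11) & 0x1f, 31),
		.g = expand_bits((p >> 5) & 0x3f, 63),
		.b = expand_bits(p & 0x1f, 31),
	};

	return out;
}
```

For 8-bit input, expand_bits(v, 255) gives the same result as v * 257, since 65535 / 255 is exactly 257.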

* Re: [PATCH v13 5/9] drm/vkms: Update pixels accessor to support packed and multi-plane formats.
  2024-11-18 17:17     ` Louis Chauvet
@ 2024-11-18 17:24       ` José Expósito
  2024-11-18 17:35         ` Louis Chauvet
  0 siblings, 1 reply; 20+ messages in thread
From: José Expósito @ 2024-11-18 17:24 UTC (permalink / raw)
  To: airlied, arthurgrillo, corbet, dri-devel, hamohammed.sa,
	helen.koike, jeremie.dautheribes, linux-doc, linux-kernel,
	maarten.lankhorst, mairacanal, marcheu, melissa.srw,
	miquel.raynal, mripard, nicolejadeyee, pekka.paalanen, rdunlap,
	rodrigosiqueiramelo, seanpaul, simona.vetter, simona,
	thomas.petazzoni, tzimmermann

On Mon, Nov 18, 2024 at 06:17:11PM +0100, Louis Chauvet wrote:
> On 18/11/24 - 18:10, José Expósito wrote:
> > > Introduce the usage of block_h/block_w to compute the offset and the
> > > pointer of a pixel. The previous implementation was specialized for
> > > planes with block_h == block_w == 1. This avoids confusion and allows easier
> > > implementation of tiled formats. It also removes the usage of the
> > > deprecated format field `cpp`.
> > > 
> > > Introduce the plane_index parameter to get an offset/pointer on a
> > > different plane.
> > > 
> > > Acked-by: Maíra Canal <mairacanal@riseup.net>
> > > Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> > > ---
> > >  drivers/gpu/drm/vkms/vkms_formats.c | 114 ++++++++++++++++++++++++++++--------
> > >  1 file changed, 91 insertions(+), 23 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> > > index 06aef5162529..7f932d42394d 100644
> > > --- a/drivers/gpu/drm/vkms/vkms_formats.c
> > > +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > > @@ -10,22 +10,46 @@
> > >  #include "vkms_formats.h"
> > >  
> > >  /**
> > > - * pixel_offset() - Get the offset of the pixel at coordinates x/y in the first plane
> > > + * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
> > >   *
> > >   * @frame_info: Buffer metadata
> > >   * @x: The x coordinate of the wanted pixel in the buffer
> > >   * @y: The y coordinate of the wanted pixel in the buffer
> > > + * @plane_index: The index of the plane to use
> > > + * @offset: The returned offset inside the buffer of the block
> > 
> > The previous function (pixel_offset) returned a size_t for the offset rather
> > than an int. Do you know if we are safe using an int in this case?
> 
> I think I used int everywhere because it may avoid strange issues with 
> implicit casting and negative numbers. I don't remember exactly where, but 
> Pekka suggested it.

Ah! Good to know. For the record, I ran the IGT tests locally and
performed some manual testing, and I found no issues.
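Louis does not say which case Pekka had in mind, so the following is only a generic illustration of the implicit-conversion hazard he mentions, not code from vkms. It is a minimal C sketch with hypothetical helper names: when a signed int meets a size_t in a comparison, the int is converted to unsigned, and a negative value silently becomes a huge one.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pitfall: x is implicitly converted to size_t before the comparison,
 * so a negative x becomes a huge unsigned value and the result is 0.
 */
static int less_than_mixed(int x, size_t limit)
{
	return x < limit;
}

/* With int on both sides the sign is preserved: -1 < 10 is true. */
static int less_than_int(int x, int limit)
{
	return x < limit;
}
```

Here `less_than_mixed(-1, 10)` evaluates to 0 while `less_than_int(-1, 10)` evaluates to 1. Keeping coordinates and offsets as int avoids that class of surprise, at the cost of having to make sure the computed offsets fit in an int.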

> > > + * @rem_x: The returned X coordinate of the requested pixel in the block
> > > + * @rem_y: The returned Y coordinate of the requested pixel in the block
> > >   *
> > > - * The caller must ensure that the framebuffer associated with this request uses a pixel format
> > > - * where block_h == block_w == 1.
> > > - * If this requirement is not fulfilled, the resulting offset can point to an other pixel or
> > > - * outside of the buffer.
> > > + * As some pixel formats store multiple pixels in a block (DRM_FORMAT_R* for example), some
> > > + * pixels are not individually addressable. This function returns 3 values: the offset of the
> > > + * whole block, and the coordinates of the requested pixel inside this block.
> > > + * For example, if the format is DRM_FORMAT_R1 and the requested coordinate is (13, 5), the offset
> > > + * will point to the byte 5*pitches + 13/8 (second byte of line 5, counting from 0), and the
> > > + * rem_x/rem_y coordinates will be (13 % 8, 5 % 1) = (5, 0).
> > > + *
> > > + * With this function, the caller just has to extract the correct pixel from the block.
> > >   */
> > > -static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> > > +static void packed_pixels_offset(const struct vkms_frame_info *frame_info, int x, int y,
> > > +				 int plane_index, int *offset, int *rem_x, int *rem_y)
> > >  {
> > >  	struct drm_framebuffer *fb = frame_info->fb;
> > > +	const struct drm_format_info *format = frame_info->fb->format;
> > > +	/* Directly using x and y to multiply pitches and format->cpp is not sufficient because
> > > +	 * in some formats a block can represent multiple pixels.
> > > +	 *
> > > +	 * Dividing x and y by the block size allows extracting the correct offset of the block
> > > +	 * containing the pixel.
> > > +	 */
> > >  
> > > -	return fb->offsets[0] + (y * fb->pitches[0]) + (x * fb->format->cpp[0]);
> > > +	int block_x = x / drm_format_info_block_width(format, plane_index);
> > > +	int block_y = y / drm_format_info_block_height(format, plane_index);
> > > +	int block_pitch = fb->pitches[plane_index] * drm_format_info_block_height(format,
> > > +										  plane_index);
> > > +	*rem_x = x % drm_format_info_block_width(format, plane_index);
> > > +	*rem_y = y % drm_format_info_block_height(format, plane_index);
> > > +	*offset = fb->offsets[plane_index] +
> > > +		  block_y * block_pitch +
> > > +		  block_x * format->char_per_block[plane_index];
> > >  }
> > >  
> > >  /**
> > > @@ -35,30 +59,71 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
> > >   * @frame_info: Buffer metadata
> > >   * @x: The x (width) coordinate inside the plane
> > >   * @y: The y (height) coordinate inside the plane
> > > + * @plane_index: The index of the plane
> > > + * @addr: The returned pointer
> > > + * @rem_x: The returned X coordinate of the requested pixel in the block
> > > + * @rem_y: The returned Y coordinate of the requested pixel in the block
> > >   *
> > > - * Takes the information stored in the frame_info, a pair of coordinates, and
> > > - * returns the address of the first color channel.
> > > - * This function assumes the channels are packed together, i.e. a color channel
> > > - * comes immediately after another in the memory. And therefore, this function
> > > - * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
> > > + * Takes the information stored in the frame_info, a pair of coordinates, and returns the address
> > > + * of the block containing this pixel and the pixel position inside this block.
> > >   *
> > > - * The caller must ensure that the framebuffer associated with this request uses a pixel format
> > > - * where block_h == block_w == 1, otherwise the returned pointer can be outside the buffer.
> > > + * See @packed_pixel_offset for details about rem_x/rem_y behavior.
> > 
> > Missing "s" in the name of the function. Should read "@packed_pixels_offset".
> 
> Thanks!
> 
> > >   */
> > > -static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
> > > -				int x, int y)
> > > +static void packed_pixels_addr(const struct vkms_frame_info *frame_info,
> > > +			       int x, int y, int plane_index, u8 **addr, int *rem_x,
> > > +			       int *rem_y)
> > >  {
> > > -	size_t offset = pixel_offset(frame_info, x, y);
> > > +	int offset;
> > >  
> > > -	return (u8 *)frame_info->map[0].vaddr + offset;
> > > +	packed_pixels_offset(frame_info, x, y, plane_index, &offset, rem_x, rem_y);
> > > +	*addr = (u8 *)frame_info->map[0].vaddr + offset;
> > >  }
> > >  
> > > -static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
> > > +/**
> > > + * packed_pixels_addr_1x1() - Get the pointer to the block containing the pixel at the given
> > > + * coordinates
> > > + *
> > > + * @frame_info: Buffer metadata
> > > + * @x: The x (width) coordinate inside the plane
> > > + * @y: The y (height) coordinate inside the plane
> > > + * @plane_index: The index of the plane
> > > + * @addr: The returned pointer
> > > + *
> > > + * This function can only be used with formats where block_h == block_w == 1.
> > > + */
> > > +static void packed_pixels_addr_1x1(const struct vkms_frame_info *frame_info,
> > > +				   int x, int y, int plane_index, u8 **addr)
> > > +{
> > > +	int offset, rem_x, rem_y;
> > 
> > Nitpick, but it'd be nice if packed_pixels_offset() could take NULLs in
> > the output values so we avoid declaring unused variables here and when
> > calling packed_pixels_addr().
> 
> It is not a trivial change, and as I want this series to be merged I will 
> send the v14 without it. But if I have the time I will send a new 
> patch/series with this cleanup, thanks for the suggestion.

That works for me, we can always fix it in a follow-up... Especially since
2 other series depend on this one :)
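The nitpick could look roughly like the following: a sketch in which the rem_x/rem_y outputs are optional, so 1x1-only callers would not need throwaway variables. struct fake_plane_layout and block_offset() are hypothetical simplifications, not the vkms types; the arithmetic mirrors the packed_pixels_offset() hunk quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-plane format data used by vkms. */
struct fake_plane_layout {
	int offset;      /* byte offset of the plane in the buffer */
	int pitch;       /* bytes per line of pixels */
	int block_w;     /* pixels per block, horizontally */
	int block_h;     /* pixels per block, vertically */
	int block_bytes; /* bytes per block (char_per_block) */
};

/*
 * Same computation as packed_pixels_offset(), but rem_x and rem_y may be
 * NULL when the caller does not need the position inside the block.
 */
static void block_offset(const struct fake_plane_layout *l, int x, int y,
			 int *offset, int *rem_x, int *rem_y)
{
	int block_x = x / l->block_w;
	int block_y = y / l->block_h;

	*offset = l->offset + block_y * l->pitch * l->block_h +
		  block_x * l->block_bytes;
	if (rem_x)
		*rem_x = x % l->block_w;
	if (rem_y)
		*rem_y = y % l->block_h;
}
```

Passing NULL for both remainder pointers yields just the block offset, which is all a 1x1-block caller needs; the NULL checks are the only extra cost.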

Jose
 
> > > +
> > > +	WARN_ONCE(drm_format_info_block_width(frame_info->fb->format,
> > > +					      plane_index) != 1,
> > > +		"%s() only supports formats with block_w == 1", __func__);
> > > +	WARN_ONCE(drm_format_info_block_height(frame_info->fb->format,
> > > +					       plane_index) != 1,
> > > +		"%s() only supports formats with block_h == 1", __func__);
> > > +
> > > +	packed_pixels_offset(frame_info, x, y, plane_index, &offset, &rem_x,
> > > +			     &rem_y);
> > > +	*addr = (u8 *)frame_info->map[0].vaddr + offset;
> > > +}
> > > +
> > > +static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y,
> > > +				 int plane_index)
> > >  {
> > >  	int x_src = frame_info->src.x1 >> 16;
> > >  	int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16);
> > > +	u8 *addr;
> > > +	int rem_x, rem_y;
> > > +
> > > +	WARN_ONCE(drm_format_info_block_width(frame_info->fb->format, plane_index) != 1,
> > > +		  "%s() only supports formats with block_w == 1", __func__);
> > > +	WARN_ONCE(drm_format_info_block_height(frame_info->fb->format, plane_index) != 1,
> > > +		  "%s() only supports formats with block_h == 1", __func__);
> > >  
> > > -	return packed_pixels_addr(frame_info, x_src, y_src);
> > > +	packed_pixels_addr(frame_info, x_src, y_src, plane_index, &addr, &rem_x, &rem_y);
> > > +
> > > +	return addr;
> > >  }
> > >  
> > >  static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x)
> > > @@ -152,14 +217,14 @@ void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state
> > >  {
> > >  	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> > >  	struct vkms_frame_info *frame_info = plane->frame_info;
> > > -	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> > > +	u8 *src_pixels = get_packed_src_addr(frame_info, y, 0);
> > >  	int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels);
> > >  
> > >  	for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) {
> > >  		int x_pos = get_x_position(frame_info, limit, x);
> > >  
> > >  		if (drm_rotation_90_or_270(frame_info->rotation))
> > > -			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1)
> > > +			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1, 0)
> > >  				+ frame_info->fb->format->cpp[0] * y;
> > >  
> > >  		plane->pixel_read(src_pixels, &out_pixels[x_pos]);
> > > @@ -250,7 +315,10 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
> > >  {
> > >  	struct vkms_frame_info *frame_info = &wb->wb_frame_info;
> > >  	int x_dst = frame_info->dst.x1;
> > > -	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> > > +	u8 *dst_pixels;
> > > +	int rem_x, rem_y;
> > > +
> > > +	packed_pixels_addr(frame_info, x_dst, y, 0, &dst_pixels, &rem_x, &rem_y);
> > >  	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> > >  	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst), src_buffer->n_pixels);
> > >  
> > > 
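The packed_pixels_offset() kernel-doc quoted above stops at the block offset and leaves pixel extraction to the caller. A minimal sketch of that last step for DRM_FORMAT_R1 (eight 1-bit pixels per byte, block 8x1), assuming the first pixel of each byte sits in the most significant bit, following the R0 ... R7 ordering listed for DRM_FORMAT_R1 in drm_fourcc.h. r1_read_pixel() is a hypothetical helper, not a vkms function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Read one DRM_FORMAT_R1 pixel at (x, y) from a plane starting at
 * plane_offset with the given pitch (bytes per line).
 * Assumes pixel 0 of each byte is stored in bit 7.
 */
static int r1_read_pixel(const uint8_t *buf, int plane_offset, int pitch,
			 int x, int y)
{
	const int block_w = 8, block_h = 1, block_bytes = 1;
	/* Block offset, computed the same way as packed_pixels_offset() */
	int offset = plane_offset + (y / block_h) * pitch * block_h +
		     (x / block_w) * block_bytes;
	int rem_x = x % block_w;

	/* Pixel rem_x of the block lives in bit (7 - rem_x) */
	return (buf[offset] >> (7 - rem_x)) & 1;
}
```

For the (13, 5) example, the byte is at 5 * pitch + 1 and rem_x is 5, so the pixel is bit 2 of that byte.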

* Re: [PATCH v13 5/9] drm/vkms: Update pixels accessor to support packed and multi-plane formats.
  2024-11-18 17:24       ` José Expósito
@ 2024-11-18 17:35         ` Louis Chauvet
  2024-11-19 14:58           ` José Expósito
  0 siblings, 1 reply; 20+ messages in thread
From: Louis Chauvet @ 2024-11-18 17:35 UTC (permalink / raw)
  To: José Expósito
  Cc: airlied, arthurgrillo, corbet, dri-devel, hamohammed.sa,
	helen.koike, jeremie.dautheribes, linux-doc, linux-kernel,
	maarten.lankhorst, mairacanal, marcheu, melissa.srw,
	miquel.raynal, mripard, nicolejadeyee, pekka.paalanen, rdunlap,
	rodrigosiqueiramelo, seanpaul, simona.vetter, simona,
	thomas.petazzoni, tzimmermann

On 18/11/24 - 18:24, José Expósito wrote:
> On Mon, Nov 18, 2024 at 06:17:11PM +0100, Louis Chauvet wrote:
> > On 18/11/24 - 18:10, José Expósito wrote:
> > > > Introduce the usage of block_h/block_w to compute the offset and the
> > > > pointer of a pixel. The previous implementation was specialized for
> > > > planes with block_h == block_w == 1. This avoids confusion and allows easier
> > > > implementation of tiled formats. It also removes the usage of the
> > > > deprecated format field `cpp`.
> > > > 
> > > > Introduce the plane_index parameter to get an offset/pointer on a
> > > > different plane.
> > > > 
> > > > Acked-by: Maíra Canal <mairacanal@riseup.net>
> > > > Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> > > > ---
> > > >  drivers/gpu/drm/vkms/vkms_formats.c | 114 ++++++++++++++++++++++++++++--------
> > > >  1 file changed, 91 insertions(+), 23 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> > > > index 06aef5162529..7f932d42394d 100644
> > > > --- a/drivers/gpu/drm/vkms/vkms_formats.c
> > > > +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > > > @@ -10,22 +10,46 @@
> > > >  #include "vkms_formats.h"
> > > >  
> > > >  /**
> > > > - * pixel_offset() - Get the offset of the pixel at coordinates x/y in the first plane
> > > > + * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
> > > >   *
> > > >   * @frame_info: Buffer metadata
> > > >   * @x: The x coordinate of the wanted pixel in the buffer
> > > >   * @y: The y coordinate of the wanted pixel in the buffer
> > > > + * @plane_index: The index of the plane to use
> > > > + * @offset: The returned offset inside the buffer of the block
> > > 
> > > The previous function (pixel_offset) returned a size_t for the offset rather
> > > than an int. Do you know if we are safe using an int in this case?
> > 
> > I think I used int everywhere because it may avoid strange issues with 
> > implicit casting and negative numbers. I don't remember exactly where, but 
> > Pekka suggested it.
> 
> Ah! Good to know. For the record, I ran the IGT tests locally and
> performed some manual testing, and I found no issues.
> 
> > > > + * @rem_x: The returned X coordinate of the requested pixel in the block
> > > > + * @rem_y: The returned Y coordinate of the requested pixel in the block
> > > >   *
> > > > - * The caller must ensure that the framebuffer associated with this request uses a pixel format
> > > > - * where block_h == block_w == 1.
> > > > - * If this requirement is not fulfilled, the resulting offset can point to an other pixel or
> > > > - * outside of the buffer.
> > > > + * As some pixel formats store multiple pixels in a block (DRM_FORMAT_R* for example), some
> > > > + * pixels are not individually addressable. This function returns 3 values: the offset of the
> > > > + * whole block, and the coordinates of the requested pixel inside this block.
> > > > + * For example, if the format is DRM_FORMAT_R1 and the requested coordinate is (13, 5), the offset
> > > > + * will point to the byte 5*pitches + 13/8 (second byte of line 5, counting from 0), and the
> > > > + * rem_x/rem_y coordinates will be (13 % 8, 5 % 1) = (5, 0).
> > > > + *
> > > > + * With this function, the caller just has to extract the correct pixel from the block.
> > > >   */
> > > > -static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> > > > +static void packed_pixels_offset(const struct vkms_frame_info *frame_info, int x, int y,
> > > > +				 int plane_index, int *offset, int *rem_x, int *rem_y)
> > > >  {
> > > >  	struct drm_framebuffer *fb = frame_info->fb;
> > > > +	const struct drm_format_info *format = frame_info->fb->format;
> > > > +	/* Directly using x and y to multiply pitches and format->cpp is not sufficient because
> > > > +	 * in some formats a block can represent multiple pixels.
> > > > +	 *
> > > > +	 * Dividing x and y by the block size allows extracting the correct offset of the block
> > > > +	 * containing the pixel.
> > > > +	 */
> > > >  
> > > > -	return fb->offsets[0] + (y * fb->pitches[0]) + (x * fb->format->cpp[0]);
> > > > +	int block_x = x / drm_format_info_block_width(format, plane_index);
> > > > +	int block_y = y / drm_format_info_block_height(format, plane_index);
> > > > +	int block_pitch = fb->pitches[plane_index] * drm_format_info_block_height(format,
> > > > +										  plane_index);
> > > > +	*rem_x = x % drm_format_info_block_width(format, plane_index);
> > > > +	*rem_y = y % drm_format_info_block_height(format, plane_index);
> > > > +	*offset = fb->offsets[plane_index] +
> > > > +		  block_y * block_pitch +
> > > > +		  block_x * format->char_per_block[plane_index];
> > > >  }
> > > >  
> > > >  /**
> > > > @@ -35,30 +59,71 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
> > > >   * @frame_info: Buffer metadata
> > > >   * @x: The x (width) coordinate inside the plane
> > > >   * @y: The y (height) coordinate inside the plane
> > > > + * @plane_index: The index of the plane
> > > > + * @addr: The returned pointer
> > > > + * @rem_x: The returned X coordinate of the requested pixel in the block
> > > > + * @rem_y: The returned Y coordinate of the requested pixel in the block
> > > >   *
> > > > - * Takes the information stored in the frame_info, a pair of coordinates, and
> > > > - * returns the address of the first color channel.
> > > > - * This function assumes the channels are packed together, i.e. a color channel
> > > > - * comes immediately after another in the memory. And therefore, this function
> > > > - * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
> > > > + * Takes the information stored in the frame_info, a pair of coordinates, and returns the address
> > > > + * of the block containing this pixel and the pixel position inside this block.
> > > >   *
> > > > - * The caller must ensure that the framebuffer associated with this request uses a pixel format
> > > > - * where block_h == block_w == 1, otherwise the returned pointer can be outside the buffer.
> > > > + * See @packed_pixel_offset for details about rem_x/rem_y behavior.
> > > 
> > > Missing "s" in the name of the function. Should read "@packed_pixels_offset".
> > 
> > Thanks!
> > 
> > > >   */
> > > > -static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
> > > > -				int x, int y)
> > > > +static void packed_pixels_addr(const struct vkms_frame_info *frame_info,
> > > > +			       int x, int y, int plane_index, u8 **addr, int *rem_x,
> > > > +			       int *rem_y)
> > > >  {
> > > > -	size_t offset = pixel_offset(frame_info, x, y);
> > > > +	int offset;
> > > >  
> > > > -	return (u8 *)frame_info->map[0].vaddr + offset;
> > > > +	packed_pixels_offset(frame_info, x, y, plane_index, &offset, rem_x, rem_y);
> > > > +	*addr = (u8 *)frame_info->map[0].vaddr + offset;
> > > >  }
> > > >  
> > > > -static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
> > > > +/**
> > > > + * packed_pixels_addr_1x1() - Get the pointer to the block containing the pixel at the given
> > > > + * coordinates
> > > > + *
> > > > + * @frame_info: Buffer metadata
> > > > + * @x: The x (width) coordinate inside the plane
> > > > + * @y: The y (height) coordinate inside the plane
> > > > + * @plane_index: The index of the plane
> > > > + * @addr: The returned pointer
> > > > + *
> > > > + * This function can only be used with formats where block_h == block_w == 1.
> > > > + */
> > > > +static void packed_pixels_addr_1x1(const struct vkms_frame_info *frame_info,
> > > > +				   int x, int y, int plane_index, u8 **addr)
> > > > +{
> > > > +	int offset, rem_x, rem_y;
> > > 
> > > Nitpick, but it'd be nice if packed_pixels_offset() could take NULLs in
> > > the output values so we avoid declaring unused variables here and when
> > > calling packed_pixels_addr().
> > 
> > It is not a trivial change, and as I want this series to be merged I will 
> > send the v14 without it. But if I have the time I will send a new 
> > patch/series with this cleanup, thanks for the suggestion.
> 
> That works for me, we can always fix it in a follow-up... Especially since
> 2 other series depend on this one :)

As the series is completely reviewed/acked, how long do I need to
wait after v14 before committing to drm-misc-next? I plan to send
v14 today/tomorrow morning with your changes.

Thanks,
Louis Chauvet
 
> Jose
>  
> > > > +
> > > > +	WARN_ONCE(drm_format_info_block_width(frame_info->fb->format,
> > > > +					      plane_index) != 1,
> > > > +		"%s() only supports formats with block_w == 1", __func__);
> > > > +	WARN_ONCE(drm_format_info_block_height(frame_info->fb->format,
> > > > +					       plane_index) != 1,
> > > > +		"%s() only supports formats with block_h == 1", __func__);
> > > > +
> > > > +	packed_pixels_offset(frame_info, x, y, plane_index, &offset, &rem_x,
> > > > +			     &rem_y);
> > > > +	*addr = (u8 *)frame_info->map[0].vaddr + offset;
> > > > +}
> > > > +
> > > > +static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y,
> > > > +				 int plane_index)
> > > >  {
> > > >  	int x_src = frame_info->src.x1 >> 16;
> > > >  	int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16);
> > > > +	u8 *addr;
> > > > +	int rem_x, rem_y;
> > > > +
> > > > +	WARN_ONCE(drm_format_info_block_width(frame_info->fb->format, plane_index) != 1,
> > > > +		  "%s() only supports formats with block_w == 1", __func__);
> > > > +	WARN_ONCE(drm_format_info_block_height(frame_info->fb->format, plane_index) != 1,
> > > > +		  "%s() only supports formats with block_h == 1", __func__);
> > > >  
> > > > -	return packed_pixels_addr(frame_info, x_src, y_src);
> > > > +	packed_pixels_addr(frame_info, x_src, y_src, plane_index, &addr, &rem_x, &rem_y);
> > > > +
> > > > +	return addr;
> > > >  }
> > > >  
> > > >  static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x)
> > > > @@ -152,14 +217,14 @@ void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state
> > > >  {
> > > >  	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> > > >  	struct vkms_frame_info *frame_info = plane->frame_info;
> > > > -	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> > > > +	u8 *src_pixels = get_packed_src_addr(frame_info, y, 0);
> > > >  	int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels);
> > > >  
> > > >  	for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) {
> > > >  		int x_pos = get_x_position(frame_info, limit, x);
> > > >  
> > > >  		if (drm_rotation_90_or_270(frame_info->rotation))
> > > > -			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1)
> > > > +			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1, 0)
> > > >  				+ frame_info->fb->format->cpp[0] * y;
> > > >  
> > > >  		plane->pixel_read(src_pixels, &out_pixels[x_pos]);
> > > > @@ -250,7 +315,10 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
> > > >  {
> > > >  	struct vkms_frame_info *frame_info = &wb->wb_frame_info;
> > > >  	int x_dst = frame_info->dst.x1;
> > > > -	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> > > > +	u8 *dst_pixels;
> > > > +	int rem_x, rem_y;
> > > > +
> > > > +	packed_pixels_addr(frame_info, x_dst, y, 0, &dst_pixels, &rem_x, &rem_y);
> > > >  	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> > > >  	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst), src_buffer->n_pixels);
> > > >  
> > > > 
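The plane_index parameter and the 1x1-only accessor quoted above can be illustrated together with a two-plane, NV12-like layout (one byte per luma sample, two bytes per Cb/Cr chroma sample). This is a sketch under stated assumptions: struct fake_fb and pixel_addr_1x1() are hypothetical, the WARN_ONCE() checks are modelled as assert(), and chroma subsampling is assumed to be handled by the caller (plane 1 coordinates are already in chroma-plane units), since the quoted helpers divide only by the block size, not by the subsampling factors.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-framebuffer description, loosely modelled on drm_framebuffer. */
struct fake_fb {
	int offsets[2];        /* start of each plane in the buffer, in bytes */
	int pitches[2];        /* bytes per line in each plane */
	int char_per_block[2]; /* bytes per block in each plane */
	int block_w[2], block_h[2];
};

/* 1x1-block-only accessor: the analogue of the WARN_ONCE() checks is an assert. */
static uint8_t *pixel_addr_1x1(uint8_t *vaddr, const struct fake_fb *fb,
			       int x, int y, int plane_index)
{
	assert(fb->block_w[plane_index] == 1 && fb->block_h[plane_index] == 1);

	/* With 1x1 blocks, block coordinates equal pixel coordinates */
	return vaddr + fb->offsets[plane_index] + y * fb->pitches[plane_index] +
	       x * fb->char_per_block[plane_index];
}
```

For a 4x2 image, the luma plane takes bytes 0 to 7 and the chroma plane (2x1 samples of 2 bytes) starts at offset 8, so chroma sample 1 is at byte 10.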

* Re: [PATCH v13 5/9] drm/vkms: Update pixels accessor to support packed and multi-plane formats.
  2024-11-18 17:35         ` Louis Chauvet
@ 2024-11-19 14:58           ` José Expósito
  0 siblings, 0 replies; 20+ messages in thread
From: José Expósito @ 2024-11-19 14:58 UTC (permalink / raw)
  To: Louis Chauvet
  Cc: airlied, arthurgrillo, corbet, dri-devel, hamohammed.sa,
	helen.koike, jeremie.dautheribes, linux-doc, linux-kernel,
	maarten.lankhorst, mairacanal, marcheu, melissa.srw,
	miquel.raynal, mripard, nicolejadeyee, pekka.paalanen, rdunlap,
	rodrigosiqueiramelo, seanpaul, simona.vetter, simona,
	thomas.petazzoni, tzimmermann

On Mon, Nov 18, 2024 at 06:35:30PM +0100, Louis Chauvet wrote:
> On 18/11/24 - 18:24, José Expósito wrote:
> > On Mon, Nov 18, 2024 at 06:17:11PM +0100, Louis Chauvet wrote:
> > > On 18/11/24 - 18:10, José Expósito wrote:
> > > > > Introduce the usage of block_h/block_w to compute the offset and the
> > > > > pointer of a pixel. The previous implementation was specialized for
> > > > > planes with block_h == block_w == 1. This avoids confusion and allows easier
> > > > > implementation of tiled formats. It also removes the usage of the
> > > > > deprecated format field `cpp`.
> > > > > 
> > > > > Introduce the plane_index parameter to get an offset/pointer on a
> > > > > different plane.
> > > > > 
> > > > > Acked-by: Maíra Canal <mairacanal@riseup.net>
> > > > > Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
> > > > > ---
> > > > >  drivers/gpu/drm/vkms/vkms_formats.c | 114 ++++++++++++++++++++++++++++--------
> > > > >  1 file changed, 91 insertions(+), 23 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> > > > > index 06aef5162529..7f932d42394d 100644
> > > > > --- a/drivers/gpu/drm/vkms/vkms_formats.c
> > > > > +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > > > > @@ -10,22 +10,46 @@
> > > > >  #include "vkms_formats.h"
> > > > >  
> > > > >  /**
> > > > > - * pixel_offset() - Get the offset of the pixel at coordinates x/y in the first plane
> > > > > + * packed_pixels_offset() - Get the offset of the block containing the pixel at coordinates x/y
> > > > >   *
> > > > >   * @frame_info: Buffer metadata
> > > > >   * @x: The x coordinate of the wanted pixel in the buffer
> > > > >   * @y: The y coordinate of the wanted pixel in the buffer
> > > > > + * @plane_index: The index of the plane to use
> > > > > + * @offset: The returned offset inside the buffer of the block
> > > > 
> > > > The previous function (pixel_offset) returned a size_t for the offset rather
> > > > than an int. Do you know if we are safe using an int in this case?
> > > 
> > > I think I used int everywhere because it may avoid strange issues with 
> > > implicit casting and negative numbers. I don't remember exactly where, but 
> > > Pekka suggested it.
> > 
> > Ah! Good to know. For the record, I ran the IGT tests locally and
> > performed some manual testing, and I found no issues.
> > 
> > > > > + * @rem_x: The returned X coordinate of the requested pixel in the block
> > > > > + * @rem_y: The returned Y coordinate of the requested pixel in the block
> > > > >   *
> > > > > - * The caller must ensure that the framebuffer associated with this request uses a pixel format
> > > > > - * where block_h == block_w == 1.
> > > > > - * If this requirement is not fulfilled, the resulting offset can point to an other pixel or
> > > > > - * outside of the buffer.
> > > > > + * As some pixel formats store multiple pixels in a block (DRM_FORMAT_R* for example), some
> > > > > + * pixels are not individually addressable. This function returns 3 values: the offset of the
> > > > > + * whole block, and the coordinates of the requested pixel inside this block.
> > > > > + * For example, if the format is DRM_FORMAT_R1 and the requested coordinate is (13, 5), the offset
> > > > > + * will point to the byte 5*pitches + 13/8 (second byte of line 5, counting from 0), and the
> > > > > + * rem_x/rem_y coordinates will be (13 % 8, 5 % 1) = (5, 0).
> > > > > + *
> > > > > + * With this function, the caller just has to extract the correct pixel from the block.
> > > > >   */
> > > > > -static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> > > > > +static void packed_pixels_offset(const struct vkms_frame_info *frame_info, int x, int y,
> > > > > +				 int plane_index, int *offset, int *rem_x, int *rem_y)
> > > > >  {
> > > > >  	struct drm_framebuffer *fb = frame_info->fb;
> > > > > +	const struct drm_format_info *format = frame_info->fb->format;
> > > > > +	/* Directly using x and y to multiply pitches and format->cpp is not sufficient because
> > > > > +	 * in some formats a block can represent multiple pixels.
> > > > > +	 *
> > > > > +	 * Dividing x and y by the block size allows extracting the correct offset of the block
> > > > > +	 * containing the pixel.
> > > > > +	 */
> > > > >  
> > > > > -	return fb->offsets[0] + (y * fb->pitches[0]) + (x * fb->format->cpp[0]);
> > > > > +	int block_x = x / drm_format_info_block_width(format, plane_index);
> > > > > +	int block_y = y / drm_format_info_block_height(format, plane_index);
> > > > > +	int block_pitch = fb->pitches[plane_index] * drm_format_info_block_height(format,
> > > > > +										  plane_index);
> > > > > +	*rem_x = x % drm_format_info_block_width(format, plane_index);
> > > > > +	*rem_y = y % drm_format_info_block_height(format, plane_index);
> > > > > +	*offset = fb->offsets[plane_index] +
> > > > > +		  block_y * block_pitch +
> > > > > +		  block_x * format->char_per_block[plane_index];
> > > > >  }
> > > > >  
> > > > >  /**
> > > > > @@ -35,30 +59,71 @@ static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int
> > > > >   * @frame_info: Buffer metadata
> > > > >   * @x: The x (width) coordinate inside the plane
> > > > >   * @y: The y (height) coordinate inside the plane
> > > > > + * @plane_index: The index of the plane
> > > > > + * @addr: The returned pointer
> > > > > + * @rem_x: The returned X coordinate of the requested pixel in the block
> > > > > + * @rem_y: The returned Y coordinate of the requested pixel in the block
> > > > >   *
> > > > > - * Takes the information stored in the frame_info, a pair of coordinates, and
> > > > > - * returns the address of the first color channel.
> > > > > - * This function assumes the channels are packed together, i.e. a color channel
> > > > > - * comes immediately after another in the memory. And therefore, this function
> > > > > - * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
> > > > > + * Takes the information stored in the frame_info, a pair of coordinates, and returns the address
> > > > > + * of the block containing this pixel and the pixel position inside this block.
> > > > >   *
> > > > > - * The caller must ensure that the framebuffer associated with this request uses a pixel format
> > > > > - * where block_h == block_w == 1, otherwise the returned pointer can be outside the buffer.
> > > > > + * See @packed_pixel_offset for details about rem_x/rem_y behavior.
> > > > 
> > > > Missing "s" in the name of the function. Should read "@packed_pixels_offset".
> > > 
> > > Thanks!
> > > 
> > > > >   */
> > > > > -static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
> > > > > -				int x, int y)
> > > > > +static void packed_pixels_addr(const struct vkms_frame_info *frame_info,
> > > > > +			       int x, int y, int plane_index, u8 **addr, int *rem_x,
> > > > > +			       int *rem_y)
> > > > >  {
> > > > > -	size_t offset = pixel_offset(frame_info, x, y);
> > > > > +	int offset;
> > > > >  
> > > > > -	return (u8 *)frame_info->map[0].vaddr + offset;
> > > > > +	packed_pixels_offset(frame_info, x, y, plane_index, &offset, rem_x, rem_y);
> > > > > +	*addr = (u8 *)frame_info->map[0].vaddr + offset;
> > > > >  }
> > > > >  
> > > > > -static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
> > > > > +/**
> > > > > + * packed_pixels_addr_1x1() - Get the pointer to the block containing the pixel at the given
> > > > > + * coordinates
> > > > > + *
> > > > > + * @frame_info: Buffer metadata
> > > > > + * @x: The x (width) coordinate inside the plane
> > > > > + * @y: The y (height) coordinate inside the plane
> > > > > + * @plane_index: The index of the plane
> > > > > + * @addr: The returned pointer
> > > > > + *
> > > > > + * This function can only be used with formats where block_h == block_w == 1.
> > > > > + */
> > > > > +static void packed_pixels_addr_1x1(const struct vkms_frame_info *frame_info,
> > > > > +				   int x, int y, int plane_index, u8 **addr)
> > > > > +{
> > > > > +	int offset, rem_x, rem_y;
> > > > 
> > > > Nitpick, but it'd be nice if packed_pixels_offset() could accept NULL for
> > > > its output parameters, so we could avoid declaring unused variables here
> > > > and when calling packed_pixels_addr().
> > > 
> > > It is not a trivial change, and since I want this series to be merged, I will
> > > send v14 without it. If I have time, I will send a new patch/series with
> > > this cleanup. Thanks for the suggestion.
> > 
> > That works for me, we can always fix it in a follow-up... Especially since
> > two other series depend on this one :)
> 
> As the series is completely reviewed/acked, how long do I need to
> wait after v14 before committing to drm-misc-next? I plan to send
> v14 today or tomorrow morning with your changes.

Thanks for sending v14. I quickly checked the patches, and my suggestions
were addressed.

I can't answer this question, but all patches have been reviewed/tested by at
least two people, and you are now the only maintainer of the driver.
I think that if there are no complaints in a few days, this series should be
ready to be merged.

Maybe Maíra, as the previous maintainer, can give you some advice. I have
never maintained a driver, so I'm not sure what the best practices are.

Jose
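
The nitpick above about letting packed_pixels_offset() accept NULL outputs
can be illustrated with a minimal standalone sketch. It is not the vkms code:
struct plane_layout and block_offset() are hypothetical names standing in for
the values the kernel reads from struct drm_framebuffer (pitches[], offsets[])
and struct drm_format_info (block sizes, char_per_block[]), and it assumes the
pitch counts bytes per pixel row. The coordinates are divided by the block size
to find the block, one row of blocks is pitch * block_h bytes, and rem_x/rem_y
are only written when the caller passes non-NULL pointers.

```c
#include <stddef.h>

/*
 * Hypothetical per-plane layout description (illustration only).
 * In the kernel, these values come from struct drm_framebuffer and
 * struct drm_format_info.
 */
struct plane_layout {
	int pitch;          /* bytes per pixel row (assumption) */
	int offset;         /* byte offset of the plane inside the buffer */
	int block_w;        /* block width in pixels */
	int block_h;        /* block height in pixels */
	int char_per_block; /* bytes per block */
};

/*
 * Compute the byte offset of the block containing pixel (x, y) and,
 * optionally, the position of the pixel inside that block.
 * rem_x and rem_y may be NULL when the caller does not need them.
 */
static void block_offset(const struct plane_layout *p, int x, int y,
			 int *offset, int *rem_x, int *rem_y)
{
	int block_x = x / p->block_w;
	int block_y = y / p->block_h;
	/* One row of blocks spans block_h pixel rows. */
	int block_pitch = p->pitch * p->block_h;

	*offset = p->offset + block_y * block_pitch +
		  block_x * p->char_per_block;

	/* Optional outputs: skip them instead of forcing dummy variables. */
	if (rem_x)
		*rem_x = x % p->block_w;
	if (rem_y)
		*rem_y = y % p->block_h;
}
```

With this shape, a 1x1-only caller could simply pass NULL, NULL for the
remainders instead of declaring rem_x/rem_y it never reads.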

> Thanks,
> Louis Chauvet
>  
> > Jose
> >  
> > > > > +
> > > > > +	WARN_ONCE(drm_format_info_block_width(frame_info->fb->format,
> > > > > +					      plane_index) != 1,
> > > > > +		"%s() only support formats with block_w == 1", __func__);
> > > > > +	WARN_ONCE(drm_format_info_block_height(frame_info->fb->format,
> > > > > +					       plane_index) != 1,
> > > > > +		"%s() only support formats with block_h == 1", __func__);
> > > > > +
> > > > > +	packed_pixels_offset(frame_info, x, y, plane_index, &offset, &rem_x,
> > > > > +			     &rem_y);
> > > > > +	*addr = (u8 *)frame_info->map[0].vaddr + offset;
> > > > > +}
> > > > > +
> > > > > +static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y,
> > > > > +				 int plane_index)
> > > > >  {
> > > > >  	int x_src = frame_info->src.x1 >> 16;
> > > > >  	int y_src = y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16);
> > > > > +	u8 *addr;
> > > > > +	int rem_x, rem_y;
> > > > > +
> > > > > +	WARN_ONCE(drm_format_info_block_width(frame_info->fb->format, plane_index) != 1,
> > > > > +		  "%s() only support formats with block_w == 1", __func__);
> > > > > +	WARN_ONCE(drm_format_info_block_height(frame_info->fb->format, plane_index) != 1,
> > > > > +		  "%s() only support formats with block_h == 1", __func__);
> > > > >  
> > > > > -	return packed_pixels_addr(frame_info, x_src, y_src);
> > > > > +	packed_pixels_addr(frame_info, x_src, y_src, plane_index, &addr, &rem_x, &rem_y);
> > > > > +
> > > > > +	return addr;
> > > > >  }
> > > > >  
> > > > >  static int get_x_position(const struct vkms_frame_info *frame_info, int limit, int x)
> > > > > @@ -152,14 +217,14 @@ void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_state
> > > > >  {
> > > > >  	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> > > > >  	struct vkms_frame_info *frame_info = plane->frame_info;
> > > > > -	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> > > > > +	u8 *src_pixels = get_packed_src_addr(frame_info, y, 0);
> > > > >  	int limit = min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffer->n_pixels);
> > > > >  
> > > > >  	for (size_t x = 0; x < limit; x++, src_pixels += frame_info->fb->format->cpp[0]) {
> > > > >  		int x_pos = get_x_position(frame_info, limit, x);
> > > > >  
> > > > >  		if (drm_rotation_90_or_270(frame_info->rotation))
> > > > > -			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1)
> > > > > +			src_pixels = get_packed_src_addr(frame_info, x + frame_info->rotated.y1, 0)
> > > > >  				+ frame_info->fb->format->cpp[0] * y;
> > > > >  
> > > > >  		plane->pixel_read(src_pixels, &out_pixels[x_pos]);
> > > > > @@ -250,7 +315,10 @@ void vkms_writeback_row(struct vkms_writeback_job *wb,
> > > > >  {
> > > > >  	struct vkms_frame_info *frame_info = &wb->wb_frame_info;
> > > > >  	int x_dst = frame_info->dst.x1;
> > > > > -	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> > > > > +	u8 *dst_pixels;
> > > > > +	int rem_x, rem_y;
> > > > > +
> > > > > +	packed_pixels_addr(frame_info, x_dst, y, 0, &dst_pixels, &rem_x, &rem_y);
> > > > >  	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> > > > >  	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst), src_buffer->n_pixels);
> > > > >  
> > > > > 

Thread overview: 20+ messages
2024-10-31 17:53 [PATCH v13 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading Louis Chauvet
2024-10-31 17:53 ` [PATCH v13 1/9] drm/vkms: Code formatting Louis Chauvet
2024-10-31 17:53 ` [PATCH v13 2/9] drm/vkms: Use drm_frame directly Louis Chauvet
2024-10-31 17:53 ` [PATCH v13 3/9] drm/vkms: Add typedef and documentation for pixel_read and pixel_write functions Louis Chauvet
2024-10-31 17:53 ` [PATCH v13 4/9] drm/vkms: Use const for input pointers in pixel_read an " Louis Chauvet
2024-10-31 17:53 ` [PATCH v13 5/9] drm/vkms: Update pixels accessor to support packed and multi-plane formats Louis Chauvet
2024-11-18 17:10   ` José Expósito
2024-11-18 17:17     ` Louis Chauvet
2024-11-18 17:24       ` José Expósito
2024-11-18 17:35         ` Louis Chauvet
2024-11-19 14:58           ` José Expósito
2024-10-31 17:53 ` [PATCH v13 6/9] drm/vkms: Avoid computing blending limits inside pre_mul_alpha_blend Louis Chauvet
2024-10-31 17:53 ` [PATCH v13 7/9] drm/vkms: Introduce pixel_read_direction enum Louis Chauvet
2024-11-18 17:10   ` José Expósito
2024-10-31 17:53 ` [PATCH v13 8/9] drm/vkms: Re-introduce line-per-line composition algorithm Louis Chauvet
2024-11-18 17:10   ` José Expósito
2024-11-18 17:19     ` Louis Chauvet
2024-10-31 17:53 ` [PATCH v13 9/9] drm/vkms: Remove useless drm_rotation_simplify Louis Chauvet
2024-11-18 17:10 ` [PATCH v13 0/9] drm/vkms: Reimplement line-per-line pixel conversion for plane reading José Expósito
2024-11-18 17:13   ` Louis Chauvet
