* [PATCH v2 0/5] better LUT segmentation for EOTFs
From: Melissa Wen @ 2026-05-06 19:11 UTC (permalink / raw)
To: airlied, alexander.deucher, christian.koenig, harry.wentland,
simona, siqueira, sunpeng.li
Cc: Krunoslav Kovac, Dr . David Alan Gilbert, Bhawanpreet Lakha,
Alex Hung, Aurabindo Pillai, Matthew Schwartz, pekka.paalanen,
amd-gfx, dri-devel, kernel-dev
Hi,
With an external HDR monitor, we can see gradient banding around the sun
in the intro of the game Ori and the Will of the Wisps on SteamOS/Gamescope.
Gamescope uses AMD predefined transfer functions for degamma,
shaper/pre-3D-LUT and blend/post-3D-LUT plus CRTC regamma. However, only
the degamma block has hardware curves; shaper, blend and regamma predefined
TFs are software-computed by the AMD color module into PWL LUTs. In
addition, we cannot use hardware curves on PRE_DEGAM with subsampled
formats, so predefined TFs are also translated to LUTs in this situation,
using the GAMCOR block instead. For this translation, the driver originally
used the same helper for EOTFs and inverse EOTFs, even though they differ in
input domain, number of regions and number of TF points per region.
Bearing this in mind, patch 1 maps degamma predefined curves to a LUT using
the GAMCOR block for the AMD driver-specific properties that are still in
use by current gamescope. This was inspired by a similar patch from Harry for
colorop [1]. Patch 2 reverts commit 8b89acc0b2ba ("drm/amd/display:
Remove unused cm3_helper_translate_curve_to_degamma_hw_format") to
reintroduce cm3_helper_translate_curve_to_degamma_hw_format() and patch
3 wires it up for encoded -> linear-light LUTs (degamma/blend). With 16
samples per region across 12 regions for the blend LUT (where hardware
fixed-function curves are not available and predefined TFs are
software-computed into LUTs), banding becomes almost imperceptible.
Patches 4 and 5 increase precision in the brightest half, where PQ/sRGB
EOTFs are steeper, by enabling up to 256 samples per region and halving
the per-region point count from one region to the next across 9 regions
(128 in [0.5, 1], 64 in [0.25, 0.5], …). This better matches the shape of
PQ/sRGB EOTFs. Although patches 4 and 5 seem conceptually correct to me, I
couldn't see a clear difference in the bright end with or without them.
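The point budgets of the two layouts can be double-checked with a small
standalone sketch. This is not driver code: total_hw_points(),
fill_uniform() and fill_halving() are hypothetical names used only for
illustration, and the only assumption carried over from the series is that a
region with exponent seg_distr[k] contributes 2^seg_distr[k] HW points.

```c
#include <stddef.h>

/* Hypothetical helper: sum 2^seg_distr[k] over all regions. */
static unsigned int total_hw_points(const int *seg_distr, size_t regions)
{
	unsigned int total = 0;
	size_t k;

	for (k = 0; k < regions; k++)
		total += 1u << seg_distr[k];
	return total;
}

/* Uniform layout: every region holds 2^4 = 16 points. */
static void fill_uniform(int *seg_distr, size_t regions)
{
	size_t k;

	for (k = 0; k < regions; k++)
		seg_distr[k] = 4;
}

/* Halving layout: 1 point in the darkest region, then 2^(k-1) in region k. */
static void fill_halving(int *seg_distr, size_t regions)
{
	size_t k;

	if (regions == 0)
		return;
	seg_distr[0] = 0;
	for (k = 1; k < regions; k++)
		seg_distr[k] = (int)k - 1;
}
```

With 12 uniform regions this sums to 192 points, and with 9 halving regions
to 256 points, of which 128 land in the top region.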
This series targets DCN3+ hw families. With this series:
- degamma and blend LUTs use
cm3_helper_translate_curve_to_degamma_hw_format(): encoded input,
non-zero end slope, up to 256 points linearly interpolated between
adjacent TF pts, fitting [0,1] encoded input range.
- shaper and regamma LUTs continue using
cm3_helper_translate_curve_to_hw_format(): linear-light input, zero
end slope, 16 points per region across 32 regions.
[1] https://lore.kernel.org/dri-devel/20260330153451.99472-8-harry.wentland@amd.com/
[v1] https://lore.kernel.org/dri-devel/20260414220237.184289-1-mwen@igalia.com/
Changes:
- new patch for GAMCOR usage in case of degamma predefined TF with subsampled formats
- fix misleading information regarding degamma hw curves (Kruno)
- clarify LUT segmentation choice using 8-bit sRGB as a reference (Kruno)
Best Regards,
Melissa
Melissa Wen (5):
drm/amd/display: use GAMCOR for degamma private props in subsampled
format
Revert "drm/amd/display: Remove unused
cm3_helper_translate_curve_to_degamma_hw_format"
drm/amd/display: use a separate helper to translate degamma curves
drm/amd/display: support up to 256 samples per region in degamma/blend
LUT
drm/amd/display: use halving distribution for PQ/sRGB linearizing LUT
.../amd/display/amdgpu_dm/amdgpu_dm_color.c | 16 +-
.../amd/display/dc/dcn30/dcn30_cm_common.c | 184 ++++++++++++++++++
.../display/dc/dwb/dcn30/dcn30_cm_common.h | 4 +
.../amd/display/dc/hwss/dcn32/dcn32_hwseq.c | 10 +-
4 files changed, 204 insertions(+), 10 deletions(-)
--
2.53.0
* [PATCH v2 1/5] drm/amd/display: use GAMCOR for degamma private props in subsampled format
From: Melissa Wen @ 2026-05-06 19:11 UTC (permalink / raw)
To: airlied, alexander.deucher, christian.koenig, harry.wentland,
simona, siqueira, sunpeng.li
Cc: Krunoslav Kovac, Dr . David Alan Gilbert, Bhawanpreet Lakha,
Alex Hung, Aurabindo Pillai, Matthew Schwartz, pekka.paalanen,
amd-gfx, dri-devel, kernel-dev
When setting the plane degamma TF via AMD driver-specific color properties,
the driver uses the PRE_DEGAM color block (ROM). However, this block cannot
be used with subsampled formats as it affects the linearity of the color
space in which the HW scaler operates. For subsampled formats, use the AMD
color module to map the plane degamma predefined curve to a LUT and use the
GAMCOR block (RAM) instead.
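The selection rule described above can be sketched in isolation as follows.
Everything prefixed with toy_/TOY_ is a hypothetical stand-in invented for
this sketch (the real DC enum has different members and values); only the
half-open range check between a "video begin" and a "subsample end" marker
mirrors what the patch does.

```c
#include <stdbool.h>

/* Toy stand-in for the DC surface format enum; members are illustrative. */
enum toy_pixel_format {
	TOY_FORMAT_ARGB8888,
	TOY_FORMAT_VIDEO_BEGIN,		/* first video format */
	TOY_FORMAT_420_YCBCR = TOY_FORMAT_VIDEO_BEGIN,
	TOY_FORMAT_420_10BPC_YCBCR,
	TOY_FORMAT_SUBSAMPLE_END,	/* one past the last subsampled format */
	TOY_FORMAT_ACRYCB2101010 = TOY_FORMAT_SUBSAMPLE_END,
};

/* Which block ends up applying the degamma curve in this sketch. */
enum toy_degamma_block {
	TOY_PRE_DEGAM_ROM,	/* fixed hardware curve */
	TOY_GAMCOR_RAM,		/* software-computed LUT */
};

/* Half-open range check: [VIDEO_BEGIN, SUBSAMPLE_END). */
static bool toy_is_subsampled(enum toy_pixel_format fmt)
{
	return fmt >= TOY_FORMAT_VIDEO_BEGIN && fmt < TOY_FORMAT_SUBSAMPLE_END;
}

/* Subsampled formats take the LUT path, everything else keeps the ROM curve. */
static enum toy_degamma_block toy_pick_degamma_block(enum toy_pixel_format fmt)
{
	return toy_is_subsampled(fmt) ? TOY_GAMCOR_RAM : TOY_PRE_DEGAM_ROM;
}
```

In this toy model an RGB surface keeps the ROM curve while a 4:2:0 surface
is routed to the LUT path.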
This is based on Harry's implementation for Fixed Matrix Colorop.
Link: https://lore.kernel.org/dri-devel/20260330153451.99472-1-harry.wentland@amd.com/
Co-developed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
---
.../drm/amd/display/amdgpu_dm/amdgpu_dm_color.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c
index 20a76d81d532..4e5b664bbec0 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c
@@ -1424,7 +1424,7 @@ __set_dm_plane_degamma(struct drm_plane_state *plane_state,
const struct drm_color_lut *degamma_lut;
enum amdgpu_transfer_function tf = AMDGPU_TRANSFER_FUNCTION_DEFAULT;
uint32_t degamma_size;
- bool has_degamma_lut;
+ bool has_degamma_lut, is_subsampled_format;
int ret;
degamma_lut = __extract_blob_lut(dm_plane_state->degamma_lut,
@@ -1454,12 +1454,20 @@ __set_dm_plane_degamma(struct drm_plane_state *plane_state,
if (ret)
return ret;
} else {
- dc_plane_state->in_transfer_func.type =
- TF_TYPE_PREDEFINED;
+ /* Check if format requires post-scale color processing (subsampled formats) */
+ is_subsampled_format = (dc_plane_state->format >= SURFACE_PIXEL_FORMAT_VIDEO_BEGIN &&
+ dc_plane_state->format < SURFACE_PIXEL_FORMAT_SUBSAMPLE_END);
+
+ dc_plane_state->in_transfer_func.type = TF_TYPE_PREDEFINED;
if (!mod_color_calculate_degamma_params(color_caps,
- &dc_plane_state->in_transfer_func, NULL, false))
+ &dc_plane_state->in_transfer_func,
+ NULL,
+ is_subsampled_format)) {
+ drm_err(plane_state->state->dev,
+ "Failed to calculate degamma params.\n");
return -ENOMEM;
+ }
}
return 0;
}
--
2.53.0
* [PATCH v2 2/5] Revert "drm/amd/display: Remove unused cm3_helper_translate_curve_to_degamma_hw_format"
From: Melissa Wen @ 2026-05-06 19:11 UTC (permalink / raw)
To: airlied, alexander.deucher, christian.koenig, harry.wentland,
simona, siqueira, sunpeng.li
Cc: Krunoslav Kovac, Dr . David Alan Gilbert, Bhawanpreet Lakha,
Alex Hung, Aurabindo Pillai, Matthew Schwartz, pekka.paalanen,
amd-gfx, dri-devel, kernel-dev
This reverts commit 8b89acc0b2baecfe331f5336e7ff1fcc5a44b062.
The revert lets us detach NL->L LUT programming from the L->NL one, i.e., we
can use cm3_helper_translate_curve_to_degamma_hw_format() for plane degamma
and blend (post-3DLUT curve) and cm3_helper_translate_curve_to_hw_format()
for plane shaper (pre-3DLUT curve) and stream regamma.
Signed-off-by: Melissa Wen <mwen@igalia.com>
---
.../amd/display/dc/dcn30/dcn30_cm_common.c | 151 ++++++++++++++++++
.../display/dc/dwb/dcn30/dcn30_cm_common.h | 4 +
2 files changed, 155 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c
index bfd5515c2f4f..0949b1dffc63 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c
@@ -303,6 +303,157 @@ bool cm3_helper_translate_curve_to_hw_format(struct dc_context *ctx,
return true;
}
+#define NUM_DEGAMMA_REGIONS 12
+
+
+bool cm3_helper_translate_curve_to_degamma_hw_format(
+ const struct dc_transfer_func *output_tf,
+ struct pwl_params *lut_params)
+{
+ struct curve_points3 *corner_points;
+ struct pwl_result_data *rgb_resulted;
+ struct pwl_result_data *rgb;
+ struct pwl_result_data *rgb_plus_1;
+
+ int32_t region_start, region_end;
+ int32_t i;
+ uint32_t j, k, seg_distr[MAX_REGIONS_NUMBER], increment, start_index, hw_points;
+
+ if (output_tf == NULL || lut_params == NULL || output_tf->type == TF_TYPE_BYPASS)
+ return false;
+
+ corner_points = lut_params->corner_points;
+ rgb_resulted = lut_params->rgb_resulted;
+ hw_points = 0;
+
+ memset(lut_params, 0, sizeof(struct pwl_params));
+ memset(seg_distr, 0, sizeof(seg_distr));
+
+ region_start = -NUM_DEGAMMA_REGIONS;
+ region_end = 0;
+
+
+ for (i = region_end - region_start; i < MAX_REGIONS_NUMBER ; i++)
+ seg_distr[i] = -1;
+ /* 12 segments
+ * segments are from 2^-12 to 0
+ */
+ for (i = 0; i < NUM_DEGAMMA_REGIONS ; i++)
+ seg_distr[i] = 4;
+
+ for (k = 0; k < MAX_REGIONS_NUMBER; k++) {
+ if (seg_distr[k] != -1)
+ hw_points += (1 << seg_distr[k]);
+ }
+
+ j = 0;
+ for (k = 0; k < (region_end - region_start); k++) {
+ increment = NUMBER_SW_SEGMENTS / (1 << seg_distr[k]);
+ start_index = (region_start + k + MAX_LOW_POINT) *
+ NUMBER_SW_SEGMENTS;
+ for (i = start_index; i < start_index + NUMBER_SW_SEGMENTS;
+ i += increment) {
+ if (j == hw_points - 1)
+ break;
+ if (i >= TRANSFER_FUNC_POINTS)
+ return false;
+ rgb_resulted[j].red = output_tf->tf_pts.red[i];
+ rgb_resulted[j].green = output_tf->tf_pts.green[i];
+ rgb_resulted[j].blue = output_tf->tf_pts.blue[i];
+ j++;
+ }
+ }
+
+ /* last point */
+ start_index = (region_end + MAX_LOW_POINT) * NUMBER_SW_SEGMENTS;
+ rgb_resulted[hw_points - 1].red = output_tf->tf_pts.red[start_index];
+ rgb_resulted[hw_points - 1].green = output_tf->tf_pts.green[start_index];
+ rgb_resulted[hw_points - 1].blue = output_tf->tf_pts.blue[start_index];
+
+ corner_points[0].red.x = dc_fixpt_pow(dc_fixpt_from_int(2),
+ dc_fixpt_from_int(region_start));
+ corner_points[0].green.x = corner_points[0].red.x;
+ corner_points[0].blue.x = corner_points[0].red.x;
+ corner_points[1].red.x = dc_fixpt_pow(dc_fixpt_from_int(2),
+ dc_fixpt_from_int(region_end));
+ corner_points[1].green.x = corner_points[1].red.x;
+ corner_points[1].blue.x = corner_points[1].red.x;
+
+ corner_points[0].red.y = rgb_resulted[0].red;
+ corner_points[0].green.y = rgb_resulted[0].green;
+ corner_points[0].blue.y = rgb_resulted[0].blue;
+
+ /* see comment above, m_arrPoints[1].y should be the Y value for the
+ * region end (m_numOfHwPoints), not last HW point(m_numOfHwPoints - 1)
+ */
+ corner_points[1].red.y = rgb_resulted[hw_points - 1].red;
+ corner_points[1].green.y = rgb_resulted[hw_points - 1].green;
+ corner_points[1].blue.y = rgb_resulted[hw_points - 1].blue;
+ corner_points[1].red.slope = dc_fixpt_zero;
+ corner_points[1].green.slope = dc_fixpt_zero;
+ corner_points[1].blue.slope = dc_fixpt_zero;
+
+ if (output_tf->tf == TRANSFER_FUNCTION_PQ) {
+ /* for PQ, we want to have a straight line from last HW X point,
+ * and the slope to be such that we hit 1.0 at 10000 nits.
+ */
+ const struct fixed31_32 end_value =
+ dc_fixpt_from_int(125);
+
+ corner_points[1].red.slope = dc_fixpt_div(
+ dc_fixpt_sub(dc_fixpt_one, corner_points[1].red.y),
+ dc_fixpt_sub(end_value, corner_points[1].red.x));
+ corner_points[1].green.slope = dc_fixpt_div(
+ dc_fixpt_sub(dc_fixpt_one, corner_points[1].green.y),
+ dc_fixpt_sub(end_value, corner_points[1].green.x));
+ corner_points[1].blue.slope = dc_fixpt_div(
+ dc_fixpt_sub(dc_fixpt_one, corner_points[1].blue.y),
+ dc_fixpt_sub(end_value, corner_points[1].blue.x));
+ }
+
+ lut_params->hw_points_num = hw_points;
+
+ k = 0;
+ for (i = 1; i < MAX_REGIONS_NUMBER; i++) {
+ if (seg_distr[k] != -1) {
+ lut_params->arr_curve_points[k].segments_num =
+ seg_distr[k];
+ lut_params->arr_curve_points[i].offset =
+ lut_params->arr_curve_points[k].offset + (1 << seg_distr[k]);
+ }
+ k++;
+ }
+
+ if (seg_distr[k] != -1)
+ lut_params->arr_curve_points[k].segments_num = seg_distr[k];
+
+ rgb = rgb_resulted;
+ rgb_plus_1 = rgb_resulted + 1;
+
+ i = 1;
+ while (i != hw_points + 1) {
+ if (dc_fixpt_lt(rgb_plus_1->red, rgb->red))
+ rgb_plus_1->red = rgb->red;
+ if (dc_fixpt_lt(rgb_plus_1->green, rgb->green))
+ rgb_plus_1->green = rgb->green;
+ if (dc_fixpt_lt(rgb_plus_1->blue, rgb->blue))
+ rgb_plus_1->blue = rgb->blue;
+
+ rgb->delta_red = dc_fixpt_sub(rgb_plus_1->red, rgb->red);
+ rgb->delta_green = dc_fixpt_sub(rgb_plus_1->green, rgb->green);
+ rgb->delta_blue = dc_fixpt_sub(rgb_plus_1->blue, rgb->blue);
+
+ ++rgb_plus_1;
+ ++rgb;
+ ++i;
+ }
+ cm3_helper_convert_to_custom_float(rgb_resulted,
+ lut_params->corner_points,
+ hw_points, false);
+
+ return true;
+}
+
bool cm3_helper_convert_to_custom_float(
struct pwl_result_data *rgb_resulted,
struct curve_points3 *corner_points,
diff --git a/drivers/gpu/drm/amd/display/dc/dwb/dcn30/dcn30_cm_common.h b/drivers/gpu/drm/amd/display/dc/dwb/dcn30/dcn30_cm_common.h
index 95f9318a54ef..c23dc1bb29bf 100644
--- a/drivers/gpu/drm/amd/display/dc/dwb/dcn30/dcn30_cm_common.h
+++ b/drivers/gpu/drm/amd/display/dc/dwb/dcn30/dcn30_cm_common.h
@@ -63,6 +63,10 @@ bool cm3_helper_translate_curve_to_hw_format(struct dc_context *ctx,
const struct dc_transfer_func *output_tf,
struct pwl_params *lut_params, bool fixpoint);
+bool cm3_helper_translate_curve_to_degamma_hw_format(
+ const struct dc_transfer_func *output_tf,
+ struct pwl_params *lut_params);
+
bool cm3_helper_convert_to_custom_float(
struct pwl_result_data *rgb_resulted,
struct curve_points3 *corner_points,
--
2.53.0
* [PATCH v2 3/5] drm/amd/display: use a separate helper to translate degamma curves
From: Melissa Wen @ 2026-05-06 19:11 UTC (permalink / raw)
To: airlied, alexander.deucher, christian.koenig, harry.wentland,
simona, siqueira, sunpeng.li
Cc: Krunoslav Kovac, Dr . David Alan Gilbert, Bhawanpreet Lakha,
Alex Hung, Aurabindo Pillai, Matthew Schwartz, pekka.paalanen,
amd-gfx, dri-devel, kernel-dev
In newer DCN families, there are no hw predefined curves for shaper,
blend and regamma. When userspace sets predefined curves for these
blocks, the driver uses the AMD color module to program the predefined
curve as a LUT. However, it was using the same LUT segmentation for EOTF and
inverse EOTF by using the same color management helper. This causes
banding on blend when the PQ predefined curve is set. Besides that, the
degamma predefined HW curve cannot be used with subsampled 4:2:0/4:2:2
formats as it affects the linearity of the color space in which the HW
scaler operates.
To mitigate banding when using the blend block and better support
subsampled formats on degamma, use different translation helpers when
linearizing and delinearizing.
Signed-off-by: Melissa Wen <mwen@igalia.com>
---
.../gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c
index fd42f0afc3a9..9dfbb68d503b 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c
@@ -493,9 +493,8 @@ bool dcn32_set_mcm_luts(
if (plane_state->blend_tf.type == TF_TYPE_HWPWL)
lut_params = &plane_state->blend_tf.pwl;
else if (plane_state->blend_tf.type == TF_TYPE_DISTRIBUTED_POINTS) {
- result = cm3_helper_translate_curve_to_hw_format(plane_state->ctx,
- &plane_state->blend_tf,
- &dpp_base->regamma_params, false);
+ result = cm3_helper_translate_curve_to_degamma_hw_format(&plane_state->blend_tf,
+ &dpp_base->regamma_params);
if (!result)
return result;
@@ -551,9 +550,8 @@ bool dcn32_set_input_transfer_func(struct dc *dc,
if (plane_state->in_transfer_func.type == TF_TYPE_HWPWL)
params = &plane_state->in_transfer_func.pwl;
else if (plane_state->in_transfer_func.type == TF_TYPE_DISTRIBUTED_POINTS &&
- cm3_helper_translate_curve_to_hw_format(plane_state->ctx,
- &plane_state->in_transfer_func,
- &dpp_base->degamma_params, false))
+ cm3_helper_translate_curve_to_degamma_hw_format(&plane_state->in_transfer_func,
+ &dpp_base->degamma_params))
params = &dpp_base->degamma_params;
dpp_base->funcs->dpp_program_gamcor_lut(dpp_base, params);
--
2.53.0
* [PATCH v2 4/5] drm/amd/display: support up to 256 samples per region in degamma/blend LUT
From: Melissa Wen @ 2026-05-06 19:11 UTC (permalink / raw)
To: airlied, alexander.deucher, christian.koenig, harry.wentland,
simona, siqueira, sunpeng.li
Cc: Krunoslav Kovac, Dr . David Alan Gilbert, Bhawanpreet Lakha,
Alex Hung, Aurabindo Pillai, Matthew Schwartz, pekka.paalanen,
amd-gfx, dri-devel, kernel-dev
cm3_helper_translate_curve_to_degamma_hw_format() reads one tf_pts entry
per HW LUT point, limiting the number of samples per region to
NUMBER_SW_SEGMENTS (16, at seg_distr[k] = 4): with a higher seg_distr[k],
the integer division truncates the increment to 0. However, the next patch
introduces a halving distribution for PQ/sRGB EOTFs that requires up to 128
samples in its upper region (seg_distr[k] = 7).
As preparation, extend the loop index by 4 bits and linearly interpolate
adjacent tf_pts entries with the new interp_tf_pts() helper, where the 4
least significant bits are the weight in 1/16 increments. This raises the
cap to 256 samples per region (seg_distr[k] = 8). Paths with
seg_distr[k] <= 4 remain unchanged: the 4 least significant bits stay zero
and interp_tf_pts() reduces to a direct lookup.
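The index arithmetic described above can be tried out with the following
standalone sketch. It is an illustration under assumptions, not the driver
helper: plain int64_t values stand in for struct fixed31_32, SW_SEGMENTS is
assumed to equal NUMBER_SW_SEGMENTS (16), and the function names are
hypothetical.

```c
#include <stdint.h>

#define SW_SEGMENTS 16	/* assumed to match NUMBER_SW_SEGMENTS */

/* Old step in tf_pts entries; integer division yields 0 once seg > 4. */
static uint32_t increment_plain(uint32_t seg)
{
	return SW_SEGMENTS / (1u << seg);
}

/* New step in 1/16ths of a tf_pts entry; stays non-zero up to seg = 8. */
static uint32_t increment_q4(uint32_t seg)
{
	return (SW_SEGMENTS << 4) / (1u << seg);
}

/*
 * Linear interpolation between pts[i >> 4] and pts[(i >> 4) + 1],
 * weighted by (i & 0xf) / 16. The caller must guarantee that
 * (i >> 4) + 1 is a valid index. Division by 16 is used here instead
 * of a shift, which is equivalent for non-decreasing curves.
 */
static int64_t interp_q4(const int64_t *pts, uint32_t i)
{
	int64_t lo = pts[i >> 4];
	int64_t hi = pts[(i >> 4) + 1];
	int64_t t = i & 0xf;

	return lo + ((hi - lo) * t) / 16;
}
```

For seg = 7 the extended step is 2, i.e. one HW point every 1/8 of a tf_pts
interval, whereas the plain step would already be 0 at seg = 5.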
Co-developed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
---
.../amd/display/dc/dcn30/dcn30_cm_common.c | 32 +++++++++++++++----
1 file changed, 25 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c
index 0949b1dffc63..70b7bc3494a2 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c
@@ -305,6 +305,22 @@ bool cm3_helper_translate_curve_to_hw_format(struct dc_context *ctx,
#define NUM_DEGAMMA_REGIONS 12
+/* Linear interpolation of tf_pts entries, where (i >> 4) is the integer tf_pts
+ * index, (i & 0xf) is the 1/16 sub-position.
+ */
+static struct fixed31_32 interp_tf_pts(const struct fixed31_32 *output_tf_channel, int i)
+{
+ struct fixed31_32 in_plus_one, in, value;
+ uint32_t t = i & 0xf;
+
+ in_plus_one = output_tf_channel[(i >> 4) + 1];
+ in = output_tf_channel[i >> 4];
+ value = dc_fixpt_sub(in_plus_one, in);
+ value = dc_fixpt_shr(dc_fixpt_mul_int(value, t), 4);
+ value = dc_fixpt_add(in, value);
+
+ return value;
+}
bool cm3_helper_translate_curve_to_degamma_hw_format(
const struct dc_transfer_func *output_tf,
@@ -348,18 +364,20 @@ bool cm3_helper_translate_curve_to_degamma_hw_format(
j = 0;
for (k = 0; k < (region_end - region_start); k++) {
- increment = NUMBER_SW_SEGMENTS / (1 << seg_distr[k]);
+ increment = (NUMBER_SW_SEGMENTS << 4) / (1 << seg_distr[k]);
start_index = (region_start + k + MAX_LOW_POINT) *
NUMBER_SW_SEGMENTS;
- for (i = start_index; i < start_index + NUMBER_SW_SEGMENTS;
- i += increment) {
+ for (i = (start_index << 4);
+ i < (start_index << 4) + (NUMBER_SW_SEGMENTS << 4);
+ i += increment) {
if (j == hw_points - 1)
break;
- if (i >= TRANSFER_FUNC_POINTS)
+ if ((i >> 4) + 1 >= TRANSFER_FUNC_POINTS)
return false;
- rgb_resulted[j].red = output_tf->tf_pts.red[i];
- rgb_resulted[j].green = output_tf->tf_pts.green[i];
- rgb_resulted[j].blue = output_tf->tf_pts.blue[i];
+
+ rgb_resulted[j].red = interp_tf_pts(output_tf->tf_pts.red, i);
+ rgb_resulted[j].green = interp_tf_pts(output_tf->tf_pts.green, i);
+ rgb_resulted[j].blue = interp_tf_pts(output_tf->tf_pts.blue, i);
j++;
}
}
--
2.53.0
* [PATCH v2 5/5] drm/amd/display: use halving distribution for PQ/sRGB linearizing LUT
From: Melissa Wen @ 2026-05-06 19:11 UTC (permalink / raw)
To: airlied, alexander.deucher, christian.koenig, harry.wentland,
simona, siqueira, sunpeng.li
Cc: Krunoslav Kovac, Dr . David Alan Gilbert, Bhawanpreet Lakha,
Alex Hung, Aurabindo Pillai, Matthew Schwartz, pekka.paalanen,
amd-gfx, dri-devel, kernel-dev
When linearizing, the input is an encoded signal bounded to [0,1] and
PQ/sRGB EOTFs are steepest near 1, requiring more precision near the
bright end.
Take the 8-bit sRGB case as a reference: 256 possible inputs and 256 HW
LUT points line up, so the LUT acts as plain indexing. Float
representations don't land perfectly, but LERP-ing between two HW
entries, when input is within a small epsilon of one of them, doesn't
materially change the result.
Replace the uniform 12-region distribution (16 points each,
192 total, range [2^-12, 1]) with a 9-region halving distribution for
the PQ/sRGB predefined EOTFs: 128 points in the top region [0.5, 1], 64
in the next, 32 in the next, and so on, down to 1 point in each of the
two darkest regions. Total samples grow from 192 to 256, with uniform
1/256 spacing across [0, 1]. The dark tail below 2^-9 is no longer
sampled separately, which is acceptable for PQ/sRGB.
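To see where the spacing comes from, here is a small sketch that computes
the log2 of the sample spacing per region. It assumes, as in the description
above, that region k of an N-region layout starts at 2^(k-N) and spans up to
2^(k-N+1), and that it holds 2^seg points; spacing_log2() and halving_seg()
are hypothetical names, not driver functions.

```c
/*
 * log2 of the sample spacing in region k of a layout with `regions`
 * power-of-two regions ending at 2^0, with 2^seg points in region k.
 * The region starts at 2^(k - regions) and has that same width.
 */
static int spacing_log2(int regions, int k, int seg)
{
	int region_start_log2 = k - regions;

	return region_start_log2 - seg;
}

/* Exponent used by the halving layout: 0 for k = 0, else k - 1. */
static int halving_seg(int k)
{
	return k == 0 ? 0 : k - 1;
}
```

Under these assumptions, regions 1 to 8 of the halving layout all come out
at 2^-8 (1/256), and only the single point of the darkest region sits 2^-9
away from its neighbour. For comparison, the top region of the uniform
12-region layout (seg = 4) has a spacing of 2^-5.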
Suggested-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
---
.../amd/display/dc/dcn30/dcn30_cm_common.c | 33 ++++++++++++++-----
1 file changed, 24 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c
index 70b7bc3494a2..66fe7f313ea3 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c
@@ -303,8 +303,6 @@ bool cm3_helper_translate_curve_to_hw_format(struct dc_context *ctx,
return true;
}
-#define NUM_DEGAMMA_REGIONS 12
-
/* Linear interpolation of tf_pts entries, where (i >> 4) is the integer tf_pts
* index, (i & 0xf) is the 1/16 sub-position.
*/
@@ -345,17 +343,34 @@ bool cm3_helper_translate_curve_to_degamma_hw_format(
memset(lut_params, 0, sizeof(struct pwl_params));
memset(seg_distr, 0, sizeof(seg_distr));
- region_start = -NUM_DEGAMMA_REGIONS;
- region_end = 0;
+ if (output_tf->tf == TRANSFER_FUNCTION_PQ ||
+ output_tf->tf == TRANSFER_FUNCTION_SRGB) {
+ /* 9 segments
+ * segments are from 2^-9 to 0
+ */
+ const uint8_t SEG_COUNT = 9;
+ seg_distr[0] = 0; // Since we only have one point in darkest region
+ for (k = 1; k < SEG_COUNT; k++)
+ seg_distr[k] = k - 1; // 2^(k-1) points per region; halves as k decreases
+ region_start = -SEG_COUNT;
+ region_end = 0;
+ } else {
+ /* 12 segments
+ * segments are from 2^-12 to 2^0
+ * There are less than 256 points, for optimization
+ */
+ const uint8_t SEG_COUNT = 12;
+
+ for (i = 0; i < SEG_COUNT; i++)
+ seg_distr[i] = 4;
+
+ region_start = -SEG_COUNT;
+ region_end = 0;
+ }
for (i = region_end - region_start; i < MAX_REGIONS_NUMBER ; i++)
seg_distr[i] = -1;
- /* 12 segments
- * segments are from 2^-12 to 0
- */
- for (i = 0; i < NUM_DEGAMMA_REGIONS ; i++)
- seg_distr[i] = 4;
for (k = 0; k < MAX_REGIONS_NUMBER; k++) {
if (seg_distr[k] != -1)
--
2.53.0
* Re: [PATCH v2 0/5] better LUT segmentation for EOTFs
From: Matthew Schwartz @ 2026-05-08 17:45 UTC (permalink / raw)
To: Melissa Wen, airlied, alexander.deucher, christian.koenig,
harry.wentland, simona, siqueira, sunpeng.li
Cc: Krunoslav Kovac, Dr . David Alan Gilbert, Bhawanpreet Lakha,
Alex Hung, Aurabindo Pillai, pekka.paalanen, amd-gfx, dri-devel,
kernel-dev
On 5/6/26 12:11 PM, Melissa Wen wrote:
> Hi,
>
> With an external HDR monitor, we can see gradient banding around the sun
> in the intro of Ori and the Will of the Wisps game on steamOS/Gamescope.
> Gamescope uses AMD predefined transfer functions for degamma,
> shaper/pre-3D-LUT and blend/post-3D-LUT plus CRTC regamma, however, only
> degamma block has hardware curves. Shaper, blend, regamma predefined TFs
> are software-computed by AMD color module into PWL LUTs. In addition, we
> cannot use hardware curves on PRE_DEGAM with subsampled format, so that,
> predefined TFs are also translated to LUTs in this situation, using
> GAMCOR block instead. For this translation, the driver originally used
> the same helper for EOTFs and inverse EOTFs, even though they differ in
> input domain, number of regions and number of TF points per region.
>
> Baring this in mind, patch 1 maps degamma predefined curves as LUT using
> GAMCOR block for AMD driver-specific property that are still in use by
> current gamescope. This was inspired by a similar patch from Harry for
> colorop [1]. Patch 2 reverts commit 8b89acc0b2ba ("drm/amd/display:
> Remove unused cm3_helper_translate_curve_to_degamma_hw_format") to
> reintroduce cm3_helper_translate_curve_to_degamma_hw_format() and patch
> 3 wire it up for encoded -> linear-light LUTs (degamma/blend). With 16
> samples per region across 12 regions for blend LUT (where hardware
> fixed-function curves are not available and predefined TFs are
> software-computed into LUTs), banding becomes almost imperceptible.
>
> Patch 4 and 5 increase precision in the brightest half, where PQ/SRGB
> EOTFs are steeper, by enabling up to 256 samples per region and halving
> the per-region point count across 9 regions (128 in [0.5, 1], 64 in
> [0.25, 0.5], …). This better matches the shape of PQ/SRGB EOTFs.
> Although patches 4 and 5 seem conceptually correct to me, I couldn't see
> clear improvement in the bright end with or without them.
>
> This series targets DCN3+ hw families. With this series:
> - degamma and blend LUTs use
> cm3_helper_translate_curve_to_degamma_hw_format(): encoded input,
> non-zero end slope, up to 256 points linearly interpolated between
> adjacent TF pts, fitting [0,1] encoded input range.
> - shaper and regamma LUTs continue using
> cm3_helper_translate_curve_to_hw_format(): linear-light input, zero
> end slope, 16 points per region across 32 regions.
>
> [1] https://lore.kernel.org/dri-devel/20260330153451.99472-8-harry.wentland@amd.com/
>
> [v1] https://lore.kernel.org/dri-devel/20260414220237.184289-1-mwen@igalia.com/
> Changes:
> - new patch for GAMCOR usage in case of degamma predefined TF with subsampled formats
> - fix misleading information regarding degamma hw curves (Kruno)
> - clarify LUT segmentation choice using 8-bit sRGB as a reference (Kruno)
>
> Best Regards,
>
> Melissa
I tested this on a DCN35 device with an internal HDR panel that was affected by the gradient issue. I cannot see any banding present with this series applied on top of amd-staging-drm-next with AMD private color properties enabled.
Tested-by: Matthew Schwartz <matthew.schwartz@linux.dev>
for the series.
Thanks,
Matt
>
> Melissa Wen (5):
> drm/amd/display: use GAMCOR for degamma private props in subsampled
> format
> Revert "drm/amd/display: Remove unused
> cm3_helper_translate_curve_to_degamma_hw_format"
> drm/amd/display: use a separate helper to translate degamma curves
> drm/amd/display: support up to 256 samples per region in degamma/blend
> LUT
> drm/amd/display: use halving distribution for PQ/sRGB linearizing LUT
>
> .../amd/display/amdgpu_dm/amdgpu_dm_color.c | 16 +-
> .../amd/display/dc/dcn30/dcn30_cm_common.c | 184 ++++++++++++++++++
> .../display/dc/dwb/dcn30/dcn30_cm_common.h | 4 +
> .../amd/display/dc/hwss/dcn32/dcn32_hwseq.c | 10 +-
> 4 files changed, 204 insertions(+), 10 deletions(-)
>