public inbox for linux-media@vger.kernel.org
* [PATCH 1/2] [RESEND] media: rkvdec: reduce excessive stack usage in assemble_hw_pps()
@ 2026-03-05 15:26 Arnd Bergmann
  2026-03-05 15:26 ` [PATCH 2/2] [v2] media: rkvdec: reduce stack usage in rkvdec_init_v4l2_vp9_count_tbl() Arnd Bergmann
  2026-03-05 16:37 ` [PATCH 1/2] [RESEND] media: rkvdec: reduce excessive stack usage in assemble_hw_pps() Nicolas Dufresne
  0 siblings, 2 replies; 5+ messages in thread
From: Arnd Bergmann @ 2026-03-05 15:26 UTC (permalink / raw)
  To: Detlev Casanova, Ezequiel Garcia, Mauro Carvalho Chehab,
	Heiko Stuebner, Nathan Chancellor, Nicolas Dufresne, Hans Verkuil
  Cc: Arnd Bergmann, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Kees Cook, linux-media, linux-rockchip, linux-arm-kernel,
	linux-kernel, llvm

From: Arnd Bergmann <arnd@arndb.de>

The rkvdec_pps structure has a large set of bitfields, all of which
are misaligned. This causes clang-21 and likely other versions to
produce absolutely awful object code and a warning about very
large stack usage on targets without unaligned access:

drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c:966:12: error: stack frame size (1472) exceeds limit (1280) in 'rkvdec_vp9_start' [-Werror,-Wframe-larger-than]

Part of the problem here is how all the bitfield accesses are
inlined into a function that already has large structures on
the stack.

Mark set_field_order_cnt() as noinline_for_stack, and split out
the following accesses in assemble_hw_pps() into another noinline
function. Both functions now use around 800 bytes of stack in the
same configuration.

There is clearly still something wrong with clang here, but
splitting it into multiple functions reduces the risk of stack
overflow.

Fixes: fde24907570d ("media: rkvdec: Add H264 support for the VDPU383 variant")
Link: https://godbolt.org/z/acP1eKeq9
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
Resending this along with the other patch mainly to point out that
this one is still pending as well.
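
For readers who want to see the pattern outside the driver, here is a
minimal userspace sketch of it (not part of the patch). The struct
layout, the flag values and all function names are made up for
illustration, and noinline_for_stack is defined locally as a stand-in
for the kernel macro of the same name:

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the kernel's noinline_for_stack annotation. */
#define noinline_for_stack __attribute__((__noinline__))

/* Hypothetical values, not the real V4L2 definitions. */
#define DPB_SIZE		16
#define ENTRY_FLAG_ACTIVE	0x2u
#define ENTRY_FLAG_LONG_TERM	0x4u

struct dpb_entry {
	uint32_t flags;
};

/* Hypothetical packed layout: both 16-bit fields start at odd bit offsets. */
struct hw_pps {
	uint32_t pad : 7;
	uint32_t is_longterm : 16;
	uint32_t ref_colmv_use_flag : 16;
} __attribute__((packed));

/*
 * All bitfield read-modify-write sequences live in this helper, so any
 * temporaries the compiler creates for them stay in its own stack frame.
 */
static noinline_for_stack void set_ref_flags(struct hw_pps *pps,
					     const struct dpb_entry *dpb)
{
	for (size_t i = 0; i < DPB_SIZE; i++) {
		if (dpb[i].flags & ENTRY_FLAG_LONG_TERM)
			pps->is_longterm |= 1u << i;
		pps->ref_colmv_use_flag |=
			(uint32_t)!!(dpb[i].flags & ENTRY_FLAG_ACTIVE) << i;
	}
}

/* The caller only pays for a function call, not for the inlined accesses. */
void assemble_hw_pps_sketch(struct hw_pps *pps, const struct dpb_entry *dpb)
{
	set_ref_flags(pps, dpb);
}
```

Calling assemble_hw_pps_sketch() on a zeroed struct with entries 0 and 3
marked active, and entry 3 also marked long-term, leaves
ref_colmv_use_flag at 0x9 and is_longterm at 0x8.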
---
 .../rockchip/rkvdec/rkvdec-vdpu383-h264.c     | 50 ++++++++++---------
 1 file changed, 27 insertions(+), 23 deletions(-)

diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c
index 97f1efde2e47..fb4f849d7366 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c
@@ -130,7 +130,7 @@ struct rkvdec_h264_ctx {
 	struct vdpu383_regs_h26x regs;
 };
 
-static void set_field_order_cnt(struct rkvdec_pps *pps, const struct v4l2_h264_dpb_entry *dpb)
+static noinline_for_stack void set_field_order_cnt(struct rkvdec_pps *pps, const struct v4l2_h264_dpb_entry *dpb)
 {
 	pps->top_field_order_cnt0 = dpb[0].top_field_order_cnt;
 	pps->bot_field_order_cnt0 = dpb[0].bottom_field_order_cnt;
@@ -166,6 +166,31 @@ static void set_field_order_cnt(struct rkvdec_pps *pps, const struct v4l2_h264_d
 	pps->bot_field_order_cnt15 = dpb[15].bottom_field_order_cnt;
 }
 
+static noinline_for_stack void set_dec_params(struct rkvdec_pps *pps, const struct v4l2_ctrl_h264_decode_params *dec_params)
+{
+	const struct v4l2_h264_dpb_entry *dpb = dec_params->dpb;
+
+	for (int i = 0; i < ARRAY_SIZE(dec_params->dpb); i++) {
+		if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM)
+			pps->is_longterm |= (1 << i);
+		pps->ref_field_flags |=
+		 (!!(dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_FIELD)) << i;
+		pps->ref_colmv_use_flag |=
+		 (!!(dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE)) << i;
+		pps->ref_topfield_used |=
+		 (!!(dpb[i].fields & V4L2_H264_TOP_FIELD_REF)) << i;
+		pps->ref_botfield_used |=
+			(!!(dpb[i].fields & V4L2_H264_BOTTOM_FIELD_REF)) << i;
+	}
+	pps->pic_field_flag =
+		!!(dec_params->flags & V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC);
+	pps->pic_associated_flag =
+		!!(dec_params->flags & V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD);
+
+	pps->cur_top_field = dec_params->top_field_order_cnt;
+	pps->cur_bot_field = dec_params->bottom_field_order_cnt;
+}
+
 static void assemble_hw_pps(struct rkvdec_ctx *ctx,
 			    struct rkvdec_h264_run *run)
 {
@@ -177,7 +202,6 @@ static void assemble_hw_pps(struct rkvdec_ctx *ctx,
 	struct rkvdec_h264_priv_tbl *priv_tbl = h264_ctx->priv_tbl.cpu;
 	struct rkvdec_sps_pps *hw_ps;
 	u32 pic_width, pic_height;
-	u32 i;
 
 	/*
 	 * HW read the SPS/PPS information from PPS packet index by PPS id.
@@ -261,28 +285,8 @@ static void assemble_hw_pps(struct rkvdec_ctx *ctx,
 		!!(pps->flags & V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT);
 
 	set_field_order_cnt(&hw_ps->pps, dpb);
+	set_dec_params(&hw_ps->pps, dec_params);
 
-	for (i = 0; i < ARRAY_SIZE(dec_params->dpb); i++) {
-		if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM)
-			hw_ps->pps.is_longterm |= (1 << i);
-
-		hw_ps->pps.ref_field_flags |=
-			(!!(dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_FIELD)) << i;
-		hw_ps->pps.ref_colmv_use_flag |=
-			(!!(dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE)) << i;
-		hw_ps->pps.ref_topfield_used |=
-			(!!(dpb[i].fields & V4L2_H264_TOP_FIELD_REF)) << i;
-		hw_ps->pps.ref_botfield_used |=
-			(!!(dpb[i].fields & V4L2_H264_BOTTOM_FIELD_REF)) << i;
-	}
-
-	hw_ps->pps.pic_field_flag =
-		!!(dec_params->flags & V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC);
-	hw_ps->pps.pic_associated_flag =
-		!!(dec_params->flags & V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD);
-
-	hw_ps->pps.cur_top_field = dec_params->top_field_order_cnt;
-	hw_ps->pps.cur_bot_field = dec_params->bottom_field_order_cnt;
 }
 
 static void rkvdec_write_regs(struct rkvdec_ctx *ctx)
-- 
2.39.5



* [PATCH 2/2] [v2] media: rkvdec: reduce stack usage in rkvdec_init_v4l2_vp9_count_tbl()
  2026-03-05 15:26 [PATCH 1/2] [RESEND] media: rkvdec: reduce excessive stack usage in assemble_hw_pps() Arnd Bergmann
@ 2026-03-05 15:26 ` Arnd Bergmann
  2026-03-05 16:37 ` [PATCH 1/2] [RESEND] media: rkvdec: reduce excessive stack usage in assemble_hw_pps() Nicolas Dufresne
  1 sibling, 0 replies; 5+ messages in thread
From: Arnd Bergmann @ 2026-03-05 15:26 UTC (permalink / raw)
  To: Detlev Casanova, Ezequiel Garcia, Mauro Carvalho Chehab,
	Heiko Stuebner, Nathan Chancellor
  Cc: Arnd Bergmann, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Nicolas Dufresne, Hans Verkuil, Alex Bee, Jonas Karlman,
	Kees Cook, linux-media, linux-rockchip, linux-arm-kernel,
	linux-kernel, llvm

From: Arnd Bergmann <arnd@arndb.de>

The deeply nested loop in rkvdec_init_v4l2_vp9_count_tbl() needs a lot
of registers, so when the clang register allocator runs out, it ends up
spilling countless temporaries to the stack:

drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c:966:12: error: stack frame size (1472) exceeds limit (1280) in 'rkvdec_vp9_start' [-Werror,-Wframe-larger-than]

Split out the innermost loop into a separate function that is marked
noinline. I tried out all combinations of having some of
the inner loops inside of the separate function, but this was the only
variant that creates reasonable code with clang-22 on arm64.

Link: https://lore.kernel.org/linux-media/20260202094804.1231706-1-arnd@kernel.org/T/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
v2: rework after seeing more of the same warning with v1 applied.

My earlier version was much simpler but still exceeded 1280 bytes of
stack space in some configurations for unnecessary variable spills.
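
As an illustration only (not part of the patch), the shape of the change
can be sketched in userspace with much smaller, made-up tables. The
struct names, array dimensions and function names below are
hypothetical, and the attribute is spelled out directly instead of
using the kernel's noinline macro:

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical hardware table, indexed [j][i][m]. */
struct hw_counts {
	unsigned int coeff[3][2][4];
};

/* Hypothetical driver-side view: pointers indexed [i][j][m]. */
struct sw_counts {
	unsigned int *coeff[2][3][4];
};

/*
 * Innermost loop as a separate, never-inlined function: the outer loop
 * counters arrive as plain arguments, so the compiler does not have to
 * keep every level of the nest live in registers at once.
 */
static __attribute__((__noinline__)) void
init_count_tbl_loop(struct sw_counts *sw, struct hw_counts *hw, int i, int j)
{
	for (size_t m = 0; m < ARRAY_SIZE(sw->coeff[0][0]); ++m)
		sw->coeff[i][j][m] = &hw->coeff[j][i][m];
}

void init_count_tbl(struct sw_counts *sw, struct hw_counts *hw)
{
	for (size_t i = 0; i < ARRAY_SIZE(sw->coeff); ++i)
		for (size_t j = 0; j < ARRAY_SIZE(sw->coeff[0]); ++j)
			init_count_tbl_loop(sw, hw, (int)i, (int)j);
}
```

After init_count_tbl() runs, sw->coeff[i][j][m] points at
hw->coeff[j][i][m] for every valid index, which is the same kind of
index swap the driver does between coeff[i][j][k][l][m] and
ref_cnt[k][i][j][l][m].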
---
 .../platform/rockchip/rkvdec/rkvdec-vp9.c     | 48 ++++++++++---------
 1 file changed, 26 insertions(+), 22 deletions(-)

diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c
index e4cdd2122873..ecb2819bd566 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c
@@ -893,12 +893,36 @@ static void rkvdec_vp9_done(struct rkvdec_ctx *ctx,
 	update_ctx_last_info(vp9_ctx);
 }
 
+/* noinline to ensure clang's register allocator doesn't run out of registers */
+static noinline void
+rkvdec_init_v4l2_vp9_count_tbl_loop(struct rkvdec_vp9_ctx *vp9_ctx, int i, int j, int k, int l)
+{
+	struct rkvdec_vp9_intra_frame_symbol_counts *intra_cnts = vp9_ctx->count_tbl.cpu;
+	struct rkvdec_vp9_inter_frame_symbol_counts *inter_cnts = vp9_ctx->count_tbl.cpu;
+
+	for (int m = 0; m < ARRAY_SIZE(vp9_ctx->inter_cnts.coeff[0][0][0][0]); ++m) {
+		vp9_ctx->inter_cnts.coeff[i][j][k][l][m] =
+			&inter_cnts->ref_cnt[k][i][j][l][m].coeff;
+		vp9_ctx->inter_cnts.eob[i][j][k][l][m][0] =
+			&inter_cnts->ref_cnt[k][i][j][l][m].eob[0];
+		vp9_ctx->inter_cnts.eob[i][j][k][l][m][1] =
+			&inter_cnts->ref_cnt[k][i][j][l][m].eob[1];
+
+		vp9_ctx->intra_cnts.coeff[i][j][k][l][m] =
+			&intra_cnts->ref_cnt[k][i][j][l][m].coeff;
+		vp9_ctx->intra_cnts.eob[i][j][k][l][m][0] =
+			&intra_cnts->ref_cnt[k][i][j][l][m].eob[0];
+		vp9_ctx->intra_cnts.eob[i][j][k][l][m][1] =
+			&intra_cnts->ref_cnt[k][i][j][l][m].eob[1];
+	}
+}
+
 static void rkvdec_init_v4l2_vp9_count_tbl(struct rkvdec_ctx *ctx)
 {
 	struct rkvdec_vp9_ctx *vp9_ctx = ctx->priv;
 	struct rkvdec_vp9_intra_frame_symbol_counts *intra_cnts = vp9_ctx->count_tbl.cpu;
 	struct rkvdec_vp9_inter_frame_symbol_counts *inter_cnts = vp9_ctx->count_tbl.cpu;
-	int i, j, k, l, m;
+	int i, j, k, l;
 
 	vp9_ctx->inter_cnts.partition = &inter_cnts->partition;
 	vp9_ctx->inter_cnts.skip = &inter_cnts->skip;
@@ -936,31 +960,11 @@ static void rkvdec_init_v4l2_vp9_count_tbl(struct rkvdec_ctx *ctx)
 	vp9_ctx->inter_cnts.class0_hp = &inter_cnts->class0_hp;
 	vp9_ctx->inter_cnts.hp = &inter_cnts->hp;
 
-#define INNERMOST_LOOP \
-	do {										\
-		for (m = 0; m < ARRAY_SIZE(vp9_ctx->inter_cnts.coeff[0][0][0][0]); ++m) {\
-			vp9_ctx->inter_cnts.coeff[i][j][k][l][m] =			\
-				&inter_cnts->ref_cnt[k][i][j][l][m].coeff;		\
-			vp9_ctx->inter_cnts.eob[i][j][k][l][m][0] =			\
-				&inter_cnts->ref_cnt[k][i][j][l][m].eob[0];		\
-			vp9_ctx->inter_cnts.eob[i][j][k][l][m][1] =			\
-				&inter_cnts->ref_cnt[k][i][j][l][m].eob[1];		\
-											\
-			vp9_ctx->intra_cnts.coeff[i][j][k][l][m] =			\
-				&intra_cnts->ref_cnt[k][i][j][l][m].coeff;		\
-			vp9_ctx->intra_cnts.eob[i][j][k][l][m][0] =			\
-				&intra_cnts->ref_cnt[k][i][j][l][m].eob[0];		\
-			vp9_ctx->intra_cnts.eob[i][j][k][l][m][1] =			\
-				&intra_cnts->ref_cnt[k][i][j][l][m].eob[1];		\
-		}									\
-	} while (0)
-
 	for (i = 0; i < ARRAY_SIZE(vp9_ctx->inter_cnts.coeff); ++i)
 		for (j = 0; j < ARRAY_SIZE(vp9_ctx->inter_cnts.coeff[0]); ++j)
 			for (k = 0; k < ARRAY_SIZE(vp9_ctx->inter_cnts.coeff[0][0]); ++k)
 				for (l = 0; l < ARRAY_SIZE(vp9_ctx->inter_cnts.coeff[0][0][0]); ++l)
-					INNERMOST_LOOP;
-#undef INNERMOST_LOOP
+					rkvdec_init_v4l2_vp9_count_tbl_loop(vp9_ctx, i, j, k, l);
 }
 
 static int rkvdec_vp9_start(struct rkvdec_ctx *ctx)
-- 
2.39.5



* Re: [PATCH 1/2] [RESEND] media: rkvdec: reduce excessive stack usage in assemble_hw_pps()
  2026-03-05 15:26 [PATCH 1/2] [RESEND] media: rkvdec: reduce excessive stack usage in assemble_hw_pps() Arnd Bergmann
  2026-03-05 15:26 ` [PATCH 2/2] [v2] media: rkvdec: reduce stack usage in rkvdec_init_v4l2_vp9_count_tbl() Arnd Bergmann
@ 2026-03-05 16:37 ` Nicolas Dufresne
  2026-03-05 17:10   ` Arnd Bergmann
  1 sibling, 1 reply; 5+ messages in thread
From: Nicolas Dufresne @ 2026-03-05 16:37 UTC (permalink / raw)
  To: Arnd Bergmann, Detlev Casanova, Ezequiel Garcia,
	Mauro Carvalho Chehab, Heiko Stuebner, Nathan Chancellor,
	Hans Verkuil
  Cc: Arnd Bergmann, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Kees Cook, linux-media, linux-rockchip, linux-arm-kernel,
	linux-kernel, llvm


Hi,

On Thursday, 5 March 2026 at 16:26 +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> 
> The rkvdec_pps had a large set of bitfields, all of which
> are misaligned. This causes clang-21 and likely other versions to
> produce absolutely awful object code and a warning about very
> large stack usage, on targets without unaligned access:

I'm a bit surprised you felt the need to resend. Perhaps you can help us
understand what made you think your patch wasn't being processed?

My PR:
https://patchwork.linuxtv.org/project/linux-media/patch/2074ba5a5d05e239f432d176eb051105f7e692f9.camel@collabora.com/

And Hans/Mauro handled the logistics on the #linux-maint IRC channel this
morning. I believe I've moved all the relevant patches on patchwork out of the
"New" state, and you have my Rb. What else would help you?

Nicolas



* Re: [PATCH 1/2] [RESEND] media: rkvdec: reduce excessive stack usage in assemble_hw_pps()
  2026-03-05 16:37 ` [PATCH 1/2] [RESEND] media: rkvdec: reduce excessive stack usage in assemble_hw_pps() Nicolas Dufresne
@ 2026-03-05 17:10   ` Arnd Bergmann
  2026-03-05 18:43     ` Nicolas Dufresne
  0 siblings, 1 reply; 5+ messages in thread
From: Arnd Bergmann @ 2026-03-05 17:10 UTC (permalink / raw)
  To: Nicolas Dufresne, Arnd Bergmann, Detlev Casanova, Ezequiel Garcia,
	Mauro Carvalho Chehab, Heiko Stübner, Nathan Chancellor,
	Hans Verkuil
  Cc: Nick Desaulniers, Bill Wendling, Justin Stitt, Kees Cook,
	linux-media, linux-rockchip, linux-arm-kernel, linux-kernel, llvm

On Thu, Mar 5, 2026, at 17:37, Nicolas Dufresne wrote:
>
> On Thursday, 5 March 2026 at 16:26 +0100, Arnd Bergmann wrote:
>> From: Arnd Bergmann <arnd@arndb.de>
>> 
>> The rkvdec_pps had a large set of bitfields, all of which
>> are misaligned. This causes clang-21 and likely other versions to
>> produce absolutely awful object code and a warning about very
>> large stack usage, on targets without unaligned access:
>
> I'm a bit surprised you felt the need for resend. Perhaps you can help us
> understand what made you think your patch wasn't being processed ?

I updated the second patch today after I found a corner case that
wasn't addressed by the first version. As I had sent both as a series
a month ago, and neither was in linux-next yet, it seemed more helpful
to send an updated series rather than replace only one of the two.

> My PR:
> https://patchwork.linuxtv.org/project/linux-media/patch/2074ba5a5d05e239f432d176eb051105f7e692f9.camel@collabora.com/
>
> And Hans/Mauro did logistic on the #linux-maint IRC channel this morning. I
> believe I've marked all the relevant patches on patchwork out of "New" state and
> you have my Rb. What else would help you ?

That's fine then, I did not mean to seem impatient. I assume
the original patches will be in linux-next then, and [v2 2/2]
will conflict. I'll let you review that one first, but can
send a rebased version if you think we should merge it on top.

      Arnd


* Re: [PATCH 1/2] [RESEND] media: rkvdec: reduce excessive stack usage in assemble_hw_pps()
  2026-03-05 17:10   ` Arnd Bergmann
@ 2026-03-05 18:43     ` Nicolas Dufresne
  0 siblings, 0 replies; 5+ messages in thread
From: Nicolas Dufresne @ 2026-03-05 18:43 UTC (permalink / raw)
  To: Arnd Bergmann, Arnd Bergmann, Detlev Casanova, Ezequiel Garcia,
	Mauro Carvalho Chehab, Heiko Stübner, Nathan Chancellor,
	Hans Verkuil
  Cc: Nick Desaulniers, Bill Wendling, Justin Stitt, Kees Cook,
	linux-media, linux-rockchip, linux-arm-kernel, linux-kernel, llvm


Hi,

On Thursday, 5 March 2026 at 18:10 +0100, Arnd Bergmann wrote:
> On Thu, Mar 5, 2026, at 17:37, Nicolas Dufresne wrote:
> > 
> > On Thursday, 5 March 2026 at 16:26 +0100, Arnd Bergmann wrote:
> > > From: Arnd Bergmann <arnd@arndb.de>
> > > 
> > > The rkvdec_pps had a large set of bitfields, all of which
> > > are misaligned. This causes clang-21 and likely other versions to
> > > produce absolutely awful object code and a warning about very
> > > large stack usage, on targets without unaligned access:
> > 
> > I'm a bit surprised you felt the need for resend. Perhaps you can help us
> > understand what made you think your patch wasn't being processed ?
> 
> I updated the second patch today after I found a corner case that
> wasn't addressed by the first version. As I had sent both as a series
> a month ago, and neither was in linux-next yet, it seemed more helpful
> to send an updated series rather than replace only one of the two.

If you updated the code, I'd prefer that it be sent as a new version, not a
resend. As for the media merge window, it's been about a week since rc1 got
merged into the media tree, with a couple of weeks before that spent waiting
for rc1 to land. I believe that accounts for your month gap. I also took a
small break from review and PRs during that time.

> 
> > My PR:
> > https://patchwork.linuxtv.org/project/linux-media/patch/2074ba5a5d05e239f432d176eb051105f7e692f9.camel@collabora.com/
> > 
> > And Hans/Mauro did logistic on the #linux-maint IRC channel this morning. I
> > believe I've marked all the relevant patches on patchwork out of "New" state
> > and you have my Rb. What else would help you ?
> 
> That's fine then, I did not mean to seem impatient. I assume
> the original patches will be in linux-next then, and [v2 2/2]
> will conflict. I'll let you review that one first, but can
> send a rebased version if you think we should merge it on top.

It will first reach media-fixes (and not media-next, to avoid duplicating
the patches), but I have no idea if someone picks from media-fixes into
linux-next. I recall there was a gap to be fixed in this "pre-integration"
process. That said, patches from fixes reach the RCs quickly, and those are
the base for linux-next.

Meanwhile, Detlev is working on removing the bitfields for the RPS, and
the SPS will come later. You may want to sync with him to make sure the two
efforts don't cancel each other out.

To solve the patch conflict issue, you can work on top of your existing series
(just put a comment in the cover letter). I'll request a merge of rc2 / rc3
into media-next before I pick it up.

regards,
Nicolas

