Re: [PATCH v4 2/2] media: cedrus: Add H264 decoding support

From: Maxime Ripard <maxime.ripard@bootlin.com>
To: "Jernej Škrabec" <jernej.skrabec@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org,
	Thomas Petazzoni <thomas.petazzoni@bootlin.com>,
	acourbot@chromium.org, jonas@kwiboo.se, jenskuske@gmail.com,
	linux-sunxi@googlegroups.com, linux-kernel@vger.kernel.org,
	tfiga@chromium.org,
	Paul Kocialkowski <paul.kocialkowski@bootlin.com>,
	Chen-Yu Tsai <wens@csie.org>,
	hans.verkuil@cisco.com,
	Laurent Pinchart <laurent.pinchart@ideasonboard.com>,
	sakari.ailus@linux.intel.com, nicolas.dufresne@collabora.com,
	ezequiel@collabora.com, posciak@chromium.org,
	linux-media@vger.kernel.org
Subject: Re: [PATCH v4 2/2] media: cedrus: Add H264 decoding support
Date: Tue, 5 Mar 2019 11:17:32 +0100
Message-ID: <20190305101732.3eylxubiiboygjc5@flea>
In-Reply-To: <1717029.ugS2kBEt89@jernej-laptop>

Hi Jernej,

On Wed, Feb 20, 2019 at 06:50:54PM +0100, Jernej Škrabec wrote:
> I really wanted to do another review of the previous series but got distracted
> by analyzing one particularly troublesome H264 sample. It still doesn't work
> correctly, so I would ask you if you can test it with your stack (it might be
> a userspace issue):
> 
> http://jernej.libreelec.tv/videos/problematic/test.mkv
> 
> Please take a look at my comments below.

I'd really prefer to focus on getting this merged at this point, and
then fix whatever odd videos and / or setups we find later on.
Especially since new stacks are going to be developed on top of
this, I'm sure we're going to have plenty of bugs to address :)

> Dne sreda, 20. februar 2019 ob 15:17:34 CET je Maxime Ripard napisal(a):
> > Introduce some basic H264 decoding support in cedrus. So far, only the
> > baseline profile videos have been tested, and some more advanced features
> > used in higher profiles are not even implemented.
> 
> What is not yet implemented? Multi slice frame decoding, interlaced frames and 
> decoding frames with width > 2048. Anything else?

Off the top of my head, nope.

> > +static void cedrus_h264_write_sram(struct cedrus_dev *dev,
> > +				   enum cedrus_h264_sram_off off,
> > +				   const void *data, size_t len)
> > +{
> > +	const u32 *buffer = data;
> > +	size_t count = DIV_ROUND_UP(len, 4);
> > +
> > +	cedrus_write(dev, VE_AVC_SRAM_PORT_OFFSET, off << 2);
> > +
> > +	do {
> > +		cedrus_write(dev, VE_AVC_SRAM_PORT_DATA, *buffer++);
> > +	} while (--count);
> 
> The loop above will still write one word for count = 0. I propose the following:
> 
> while (count--)
> 	cedrus_write(dev, VE_AVC_SRAM_PORT_DATA, *buffer++);

Good catch, thanks!
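
For reference, a minimal userspace sketch of the proposed loop shape. It
assumes a mock data port (mock_port_write, words_written and last_word are
hypothetical stand-ins, not cedrus code) so the number of writes can be
observed. Note that in the do/while form, with count being a size_t, the
decrement after the first write would also wrap around rather than reach
zero.

```c
#include <stddef.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical stand-in for the SRAM data port: records what was written. */
static size_t words_written;
static uint32_t last_word;

static void mock_port_write(uint32_t value)
{
	last_word = value;
	words_written++;
}

/* Writes len bytes as 32-bit words; writes nothing at all when len is 0. */
static void write_sram_words(const void *data, size_t len)
{
	const uint32_t *buffer = data;
	size_t count = DIV_ROUND_UP(len, 4);

	/* The test happens before the first write, so count == 0 is safe. */
	while (count--)
		mock_port_write(*buffer++);
}
```

With this shape, a zero-length call leaves the port untouched, and a length
that is not a multiple of four is rounded up to whole words.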

> > +	position = find_next_zero_bit(&used_dpbs, CEDRUS_H264_FRAME_NUM,
> > +				      output);
> > +	if (position >= CEDRUS_H264_FRAME_NUM)
> > +		position = find_first_zero_bit(&used_dpbs, CEDRUS_H264_FRAME_NUM);
> 
> I guess you didn't try any interlaced videos? Sometimes a buffer is both a
> reference and the output at the same time. In such cases, the code above would
> create two entries, which doesn't work based on Kwiboo's and my experiments.
> 
> I guess decoding interlaced videos is out of scope at this time?

Yep, and that should be pretty easy to fix.
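
As a rough illustration of the slot search quoted above, here is a userspace
sketch of the wrap-around lookup. next_zero_bit is a simplified stand-in for
the kernel bitmap helpers, pick_output_slot is a hypothetical name, and
SKETCH_FRAME_NUM is an assumed value since CEDRUS_H264_FRAME_NUM is not shown
in this hunk.

```c
#include <stdint.h>

/* Assumed slot count, for illustration only. */
#define SKETCH_FRAME_NUM 16

/* Returns the first clear bit at or after start, or nbits if there is none. */
static unsigned int next_zero_bit(uint32_t used, unsigned int nbits,
				  unsigned int start)
{
	unsigned int i;

	for (i = start; i < nbits; i++)
		if (!(used & (UINT32_C(1) << i)))
			return i;

	return nbits;
}

/* Searches from start first, then falls back to a search from bit 0. */
static unsigned int pick_output_slot(uint32_t used, unsigned int start)
{
	unsigned int position = next_zero_bit(used, SKETCH_FRAME_NUM, start);

	if (position >= SKETCH_FRAME_NUM)
		position = next_zero_bit(used, SKETCH_FRAME_NUM, 0);

	return position;
}
```

The sketch only covers finding a free slot. It does not address the case
raised above where the output buffer already has a slot as a reference.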

> > +
> > +	output_buf = vb2_to_cedrus_buffer(&run->dst->vb2_buf);
> > +	output_buf->codec.h264.position = position;
> > +
> > +	if (slice->flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
> > +		output_buf->codec.h264.pic_type = CEDRUS_H264_PIC_TYPE_FIELD;
> > +	else if (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)
> > +		output_buf->codec.h264.pic_type = CEDRUS_H264_PIC_TYPE_MBAFF;
> > +	else
> > +		output_buf->codec.h264.pic_type = CEDRUS_H264_PIC_TYPE_FRAME;
> > +
> > +	cedrus_fill_ref_pic(ctx, output_buf,
> > +			    dec_param->top_field_order_cnt,
> > +			    dec_param->bottom_field_order_cnt,
> > +			    &pic_list[position]);
> > +
> > +	cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_FRAMEBUFFER_LIST,
> > +			       pic_list, sizeof(pic_list));
> > +
> > +	cedrus_write(dev, VE_H264_OUTPUT_FRAME_IDX, position);
> > +}
> > +
> > +#define CEDRUS_MAX_REF_IDX	32
> > +
> > +static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
> > +				   struct cedrus_run *run,
> > +				   const u8 *ref_list, u8 num_ref,
> > +				   enum cedrus_h264_sram_off sram)
> > +{
> > +	const struct v4l2_ctrl_h264_decode_param *decode = run->h264.decode_param;
> > +	struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
> > +	const struct vb2_buffer *dst_buf = &run->dst->vb2_buf;
> > +	struct cedrus_dev *dev = ctx->dev;
> > +	u8 sram_array[CEDRUS_MAX_REF_IDX];
> > +	unsigned int i;
> > +	size_t size;
> > +
> > +	memset(sram_array, 0, sizeof(sram_array));
> > +
> > +	for (i = 0; i < num_ref; i++) {
> > +		const struct v4l2_h264_dpb_entry *dpb;
> > +		const struct cedrus_buffer *cedrus_buf;
> > +		const struct vb2_v4l2_buffer *ref_buf;
> > +		unsigned int position;
> > +		int buf_idx;
> > +		u8 dpb_idx;
> > +
> > +		dpb_idx = ref_list[i];
> > +		dpb = &decode->dpb[dpb_idx];
> > +
> > +		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> > +			continue;
> > +
> > +		buf_idx = vb2_find_timestamp(cap_q, dpb->timestamp, 0);
> > +		if (buf_idx < 0)
> > +			continue;
> > +
> > +		ref_buf = to_vb2_v4l2_buffer(ctx->dst_bufs[buf_idx]);
> > +		cedrus_buf = vb2_v4l2_to_cedrus_buffer(ref_buf);
> > +		position = cedrus_buf->codec.h264.position;
> > +
> > +		sram_array[i] |= position << 1;
> > +		if (ref_buf->field == V4L2_FIELD_BOTTOM)
> 
> I'm still not convinced that checking the buffer field is an appropriate
> solution here. IMO this bit defines top or bottom reference, and the same
> buffer could be used for both.
> 
> But I guess this belongs in a follow-up patch which will fix decoding of
> interlaced videos.

And we can always change the API later on if we find it isn't adequate.

> > +static void cedrus_write_scaling_lists(struct cedrus_ctx *ctx,
> > +				       struct cedrus_run *run)
> > +{
> > +	const struct v4l2_ctrl_h264_scaling_matrix *scaling =
> > +		run->h264.scaling_matrix;
> > +	struct cedrus_dev *dev = ctx->dev;
> > +
> > +	if (!scaling)
> > +		return;
> > +
> > +	cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_SCALING_LIST_8x8_0,
> > +			       scaling->scaling_list_8x8[0],
> > +			       sizeof(scaling->scaling_list_8x8[0]));
> > +
> > +	cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_SCALING_LIST_8x8_1,
> > +			       scaling->scaling_list_8x8[1],
> > +			       sizeof(scaling->scaling_list_8x8[1]));
> 
> The index above should be 3. IIRC 1 and 3 are used by 4:2:0 chroma subsampling,
> but currently I'm unable to find a reference to that in the standard.

Yep, indeed, I'll fix that, thanks!

> > +
> > +	cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_SCALING_LIST_4x4,
> > +			       scaling->scaling_list_4x4,
> > +			       sizeof(scaling->scaling_list_4x4));
> > +}
> > +
> > +static void cedrus_write_pred_weight_table(struct cedrus_ctx *ctx,
> > +					   struct cedrus_run *run)
> > +{
> > +	const struct v4l2_ctrl_h264_slice_param *slice =
> > +		run->h264.slice_param;
> > +	const struct v4l2_h264_pred_weight_table *pred_weight =
> > +		&slice->pred_weight_table;
> > +	struct cedrus_dev *dev = ctx->dev;
> > +	int i, j, k;
> > +
> > +	cedrus_write(dev, VE_H264_SHS_WP,
> > +		     ((pred_weight->chroma_log2_weight_denom & 0xf) << 4) |
> > +		     ((pred_weight->luma_log2_weight_denom & 0xf) << 0));
> 
> Denominators are only in the range 0-7, so the mask should be 0x7. The CedarX
> code also specifies those two fields as 3 bits wide.

Indeed, I'll fix it.
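
A small sketch of the register value with the narrower mask, using the field
positions from the quoted code (luma at bit 0, chroma at bit 4). The helper
name pack_shs_wp is hypothetical.

```c
#include <stdint.h>

/* Packs the two log2 weight denominators as 3-bit fields. */
static uint32_t pack_shs_wp(uint8_t luma_denom, uint8_t chroma_denom)
{
	return ((uint32_t)(chroma_denom & 0x7) << 4) |
	       ((uint32_t)(luma_denom & 0x7) << 0);
}
```

With the 0x7 mask, an out-of-range denominator can no longer spill into
bit 3 or bit 7 of the register value.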

> > +
> > +	cedrus_write(dev, VE_AVC_SRAM_PORT_OFFSET,
> > +		     CEDRUS_SRAM_H264_PRED_WEIGHT_TABLE << 2);
> > +
> > +	for (i = 0; i < ARRAY_SIZE(pred_weight->weight_factors); i++) {
> > +		const struct v4l2_h264_weight_factors *factors =
> > +			&pred_weight->weight_factors[i];
> > +
> > +		for (j = 0; j < ARRAY_SIZE(factors->luma_weight); j++) {
> > +			u32 val;
> > +
> > +			val = ((factors->luma_offset[j] & 0x1ff) << 16) |
> > +				(factors->luma_weight[j] & 0x1ff);
> > +			cedrus_write(dev, VE_AVC_SRAM_PORT_DATA, val);
> 
> You should cast the offset variable to a wider type. Currently some videos
> which use the prediction weight table don't work for me unless the offset is
> cast to u32 first. Shifting 8 bit variable for 16 places gives you 0 every time.

I'll do it.
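
One way to write the packing with the widening made explicit, as a userspace
sketch. It assumes the s8 fields and the 9-bit layout from the quoted code
(offset at bit 16, weight at bit 0); pack_weight_entry is a hypothetical
name.

```c
#include <stdint.h>

/* Packs one weight/offset pair into a 32-bit SRAM word. */
static uint32_t pack_weight_entry(int8_t offset, int8_t weight)
{
	/* Sign-extend to 32 bits first, then keep the low 9 bits. */
	uint32_t off = (uint32_t)(int32_t)offset & 0x1ff;
	uint32_t w = (uint32_t)(int32_t)weight & 0x1ff;

	return (off << 16) | w;
}
```

Negative values end up as 9-bit two's complement fields, for example an
offset of -1 becomes 0x1ff in bits 16 to 24.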

> Luma offset and weight are defined as s8, so having a wider mask doesn't really
> make sense. However, I think weight should be s16 anyway, because the standard
> says that its value could be 2^denominator for the default value or in the
> range -128..127. Worst case would be 2^7 = 128 and -128. To cover both values
> you need at least 9 bits.

But if I understood the spec right, in that case you would just have
the denominator set, and not the offset, while the offset is used if
you don't use the default formula (and therefore remains in the
-128..127 range, which is covered by the s8), right?
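
The range arithmetic in this exchange can be checked with a short sketch.
fits_signed is a hypothetical helper and takes a width between 1 and 31 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if value is representable as a two's complement field of bits width. */
static bool fits_signed(int32_t value, unsigned int bits)
{
	int32_t min = -(INT32_C(1) << (bits - 1));
	int32_t max = (INT32_C(1) << (bits - 1)) - 1;

	return value >= min && value <= max;
}
```

Both -128 and 127 fit in 8 bits, while 128 (2^7) only fits once the field
is 9 bits wide.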

> > +	reg = 0;
> > +	if (!(scaling && (pps->flags & V4L2_H264_PPS_FLAG_PIC_SCALING_MATRIX_PRESENT)))
> > +		reg |= VE_H264_SHS_QP_SCALING_MATRIX_DEFAULT;
> > +	reg |= (pps->second_chroma_qp_index_offset & 0x3f) << 16;
> > +	reg |= (pps->chroma_qp_index_offset & 0x3f) << 8;
> > +	reg |= (pps->pic_init_qp_minus26 + 26 + slice->slice_qp_delta) & 0x3f;
> > +	cedrus_write(dev, VE_H264_SHS_QP, reg);
> > +
> > +	// clear status flags
> > +	cedrus_write(dev, VE_H264_STATUS, cedrus_read(dev, VE_H264_STATUS));
> 
> I'm not sure clearing the status here is needed. Do you have any case where it
> is needed? Maybe if some error happened before and cedrus_h264_irq_clear()
> wasn't called. I'm fine either way.

Yeah, it's just some extra precaution.

> > +
> > +	// enable int
> > +	reg = cedrus_read(dev, VE_H264_CTRL);
> > +	cedrus_write(dev, VE_H264_CTRL, reg |
> > +		     VE_H264_CTRL_SLICE_DECODE_INT |
> > +		     VE_H264_CTRL_DECODE_ERR_INT |
> > +		     VE_H264_CTRL_VLD_DATA_REQ_INT);
> 
> Since this is the only place where you set VE_H264_CTRL, I wouldn't preserve 
> previous content. This mode is also capable of decoding VP8 and AVS. So in 
> theory, if user would want to decode H264 and VP8 videos at the same time, 
> preserving content will probably corrupt your output. I would just set all 
> other bits to 0. What do you think? I tested this without preservation and it 
> works fine.

I'll change it.
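
To make the difference concrete, a sketch against a mock register. The
bit positions below are made up for illustration and are not the real
VE_H264_CTRL layout, and mock_ctrl_reg and both helpers are hypothetical.

```c
#include <stdint.h>

/* Illustrative bit positions only, not taken from the hardware. */
#define SKETCH_CTRL_SLICE_DECODE_INT	(UINT32_C(1) << 0)
#define SKETCH_CTRL_DECODE_ERR_INT	(UINT32_C(1) << 1)
#define SKETCH_CTRL_VLD_DATA_REQ_INT	(UINT32_C(1) << 2)

#define SKETCH_CTRL_H264_INTS (SKETCH_CTRL_SLICE_DECODE_INT | \
			       SKETCH_CTRL_DECODE_ERR_INT | \
			       SKETCH_CTRL_VLD_DATA_REQ_INT)

/* Hypothetical stand-in for the control register. */
static uint32_t mock_ctrl_reg;

/* Read-modify-write: whatever another codec left behind is kept. */
static void enable_ints_preserving(void)
{
	mock_ctrl_reg |= SKETCH_CTRL_H264_INTS;
}

/* Plain write: every bit other than the three interrupt enables is 0. */
static void enable_ints_fresh(void)
{
	mock_ctrl_reg = SKETCH_CTRL_H264_INTS;
}
```

The plain write gives the same register value regardless of what a previous
decode run left in the register.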

> > +	/*
> > +	 * FIXME: This is actually conditional to
> > +	 * V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY not being set, we might
> > +	 * have to rework this if memory efficiency ever is something
> > +	 * we need to work on.
> > +	 */
> > +	field_size = field_size * 2;
> > +	ctx->codec.h264.mv_col_buf_field_size = field_size;
> 
> The CedarX code aligns this buffer to 1024. Should we do it too, just to be on
> the safe side? I don't think it costs us anything due to dma_alloc_coherent()
> alignments.

dma_alloc_coherent will operate on pages, so it doesn't make any
difference there.
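
A quick sketch of why the extra rounding is a no-op once the allocation is
page-granular. It assumes 4096-byte pages (SKETCH_PAGE_SIZE); ALIGN_UP and
page_backed_size are hypothetical userspace helpers, not kernel API.

```c
#include <stddef.h>

/* Rounds x up to the next multiple of a, where a is a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Assumed page size, for illustration only. */
#define SKETCH_PAGE_SIZE 4096

/* Size actually backing a request when memory is handed out in whole pages. */
static size_t page_backed_size(size_t size)
{
	return ALIGN_UP(size, SKETCH_PAGE_SIZE);
}
```

Because 1024 divides the page size, rounding a size up to 1024 first and
then to a page gives the same result as rounding to a page directly.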

> Sorry again for a bit late in-depth review.

Thanks a lot!
Maxime

-- 
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
