From: Maxime Ripard <maxime.ripard@bootlin.com>
To: "Jernej Škrabec" <jernej.skrabec@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org,
Thomas Petazzoni <thomas.petazzoni@bootlin.com>,
acourbot@chromium.org, jonas@kwiboo.se, jenskuske@gmail.com,
linux-sunxi@googlegroups.com, linux-kernel@vger.kernel.org,
tfiga@chromium.org,
Paul Kocialkowski <paul.kocialkowski@bootlin.com>,
Chen-Yu Tsai <wens@csie.org>,
hans.verkuil@cisco.com,
Laurent Pinchart <laurent.pinchart@ideasonboard.com>,
sakari.ailus@linux.intel.com, nicolas.dufresne@collabora.com,
ezequiel@collabora.com, posciak@chromium.org,
linux-media@vger.kernel.org
Subject: Re: [PATCH v4 2/2] media: cedrus: Add H264 decoding support
Date: Tue, 5 Mar 2019 11:17:32 +0100
Message-ID: <20190305101732.3eylxubiiboygjc5@flea>
In-Reply-To: <1717029.ugS2kBEt89@jernej-laptop>
Hi Jernej,
On Wed, Feb 20, 2019 at 06:50:54PM +0100, Jernej Škrabec wrote:
> I really wanted to do another review on the previous series but got distracted
> by analyzing one particularly troublesome H264 sample. It still doesn't work
> correctly, so I would ask you if you can test it with your stack (it might be
> a userspace issue):
>
> http://jernej.libreelec.tv/videos/problematic/test.mkv
>
> Please take a look at my comments below.
I'd really prefer to focus on getting this merged at this point, and
then fix the odd videos and / or setups we find later on. Especially
since new stacks are going to be developed on top of this, I'm sure
we're going to have plenty of bugs to address :)
> Dne sreda, 20. februar 2019 ob 15:17:34 CET je Maxime Ripard napisal(a):
> > Introduce some basic H264 decoding support in cedrus. So far, only the
> > baseline profile videos have been tested, and some more advanced features
> > used in higher profiles are not even implemented.
>
> What is not yet implemented? Multi slice frame decoding, interlaced frames and
> decoding frames with width > 2048. Anything else?
Off the top of my head, nope.
> > +static void cedrus_h264_write_sram(struct cedrus_dev *dev,
> > + enum cedrus_h264_sram_off off,
> > + const void *data, size_t len)
> > +{
> > + const u32 *buffer = data;
> > + size_t count = DIV_ROUND_UP(len, 4);
> > +
> > + cedrus_write(dev, VE_AVC_SRAM_PORT_OFFSET, off << 2);
> > +
> > + do {
> > + cedrus_write(dev, VE_AVC_SRAM_PORT_DATA, *buffer++);
> > + } while (--count);
>
> The loop above will still write one word for count = 0. I propose the following:
>
> while (count--)
> cedrus_write(dev, VE_AVC_SRAM_PORT_DATA, *buffer++);
Good catch, thanks!
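For the record, here is a small userspace sketch of the reworked loop. The fake_port structure and its helper are hypothetical stand-ins for the SRAM data port, not driver code. Since count is a size_t, the original do/while would not only write one stray word for a zero length, the decrement would also wrap around instead of terminating; the while (count--) form avoids both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical stand-in for the SRAM data port: records what was written. */
struct fake_port {
	uint32_t words[16];
	size_t written;
};

static void fake_port_write(struct fake_port *port, uint32_t val)
{
	assert(port->written < 16);
	port->words[port->written++] = val;
}

/* Same shape as cedrus_h264_write_sram(), using the proposed while loop. */
static size_t write_sram_words(struct fake_port *port, const void *data,
			       size_t len)
{
	const uint32_t *buffer = data;
	size_t count = DIV_ROUND_UP(len, 4);

	/* Tests count before the first write, so len == 0 writes nothing. */
	while (count--)
		fake_port_write(port, *buffer++);

	return port->written;
}
```

With a zero length nothing reaches the port, and a 12-byte buffer results in exactly three word writes.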
> > + position = find_next_zero_bit(&used_dpbs, CEDRUS_H264_FRAME_NUM,
> > + output);
> > + if (position >= CEDRUS_H264_FRAME_NUM)
> > + position = find_first_zero_bit(&used_dpbs, CEDRUS_H264_FRAME_NUM);
>
> I guess you didn't try any interlaced videos? Sometimes a buffer is both
> reference and output at the same time. In such cases, the above code would
> make two entries, which doesn't work based on Kwiboo's and my experiments.
>
> I guess decoding interlaced videos is out of scope at this time?
Yep, and that should be pretty easy to fix.
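To make the current behaviour explicit, the slot search boils down to something like the userspace sketch below. The bit helper is a simplified stand-in for find_next_zero_bit(), the FRAME_NUM value and the hint parameter are assumptions made for illustration, and it does not handle the "reference and output at the same time" case raised above.

```c
#include <assert.h>

#define FRAME_NUM 18 /* assumed value, for illustration only */

/* Return the first clear bit at or after 'start', or FRAME_NUM if none. */
static unsigned int next_zero_bit(unsigned long used, unsigned int start)
{
	unsigned int bit;

	for (bit = start; bit < FRAME_NUM; bit++)
		if (!(used & (1UL << bit)))
			return bit;

	return FRAME_NUM;
}

/* Search from the hint first, then wrap around to the start of the map. */
static unsigned int pick_output_slot(unsigned long used, unsigned int hint)
{
	unsigned int position;

	assert(hint <= FRAME_NUM);

	position = next_zero_bit(used, hint);
	if (position >= FRAME_NUM)
		position = next_zero_bit(used, 0);

	return position;
}
```

If every slot is taken, the sketch returns FRAME_NUM, so a caller would have to treat that value as "no free slot".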
> > +
> > + output_buf = vb2_to_cedrus_buffer(&run->dst->vb2_buf);
> > + output_buf->codec.h264.position = position;
> > +
> > + if (slice->flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
> > + output_buf->codec.h264.pic_type = CEDRUS_H264_PIC_TYPE_FIELD;
> > + else if (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)
> > + output_buf->codec.h264.pic_type = CEDRUS_H264_PIC_TYPE_MBAFF;
> > + else
> > + output_buf->codec.h264.pic_type = CEDRUS_H264_PIC_TYPE_FRAME;
> > +
> > + cedrus_fill_ref_pic(ctx, output_buf,
> > + dec_param->top_field_order_cnt,
> > + dec_param->bottom_field_order_cnt,
> > + &pic_list[position]);
> > +
> > + cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_FRAMEBUFFER_LIST,
> > + pic_list, sizeof(pic_list));
> > +
> > + cedrus_write(dev, VE_H264_OUTPUT_FRAME_IDX, position);
> > +}
> > +
> > +#define CEDRUS_MAX_REF_IDX 32
> > +
> > +static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
> > + struct cedrus_run *run,
> > + const u8 *ref_list, u8 num_ref,
> > + enum cedrus_h264_sram_off sram)
> > +{
> > + const struct v4l2_ctrl_h264_decode_param *decode = run->h264.decode_param;
> > + struct vb2_queue *cap_q = &ctx->fh.m2m_ctx->cap_q_ctx.q;
> > + const struct vb2_buffer *dst_buf = &run->dst->vb2_buf;
> > + struct cedrus_dev *dev = ctx->dev;
> > + u8 sram_array[CEDRUS_MAX_REF_IDX];
> > + unsigned int i;
> > + size_t size;
> > +
> > + memset(sram_array, 0, sizeof(sram_array));
> > +
> > + for (i = 0; i < num_ref; i++) {
> > + const struct v4l2_h264_dpb_entry *dpb;
> > + const struct cedrus_buffer *cedrus_buf;
> > + const struct vb2_v4l2_buffer *ref_buf;
> > + unsigned int position;
> > + int buf_idx;
> > + u8 dpb_idx;
> > +
> > + dpb_idx = ref_list[i];
> > + dpb = &decode->dpb[dpb_idx];
> > +
> > + if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> > + continue;
> > +
> > + buf_idx = vb2_find_timestamp(cap_q, dpb->timestamp, 0);
> > + if (buf_idx < 0)
> > + continue;
> > +
> > + ref_buf = to_vb2_v4l2_buffer(ctx->dst_bufs[buf_idx]);
> > + cedrus_buf = vb2_v4l2_to_cedrus_buffer(ref_buf);
> > + position = cedrus_buf->codec.h264.position;
> > +
> > + sram_array[i] |= position << 1;
> > + if (ref_buf->field == V4L2_FIELD_BOTTOM)
>
> I'm still not convinced that checking the buffer field is an appropriate
> solution here. IMO this bit defines top or bottom reference and the same
> buffer could be used for both.
>
> But I guess this belongs in a follow-up patch which will fix decoding of
> interlaced videos.
And we can always change the API later on if we find it inadequate.
> > +static void cedrus_write_scaling_lists(struct cedrus_ctx *ctx,
> > + struct cedrus_run *run)
> > +{
> > + const struct v4l2_ctrl_h264_scaling_matrix *scaling =
> > + run->h264.scaling_matrix;
> > + struct cedrus_dev *dev = ctx->dev;
> > +
> > + if (!scaling)
> > + return;
> > +
> > + cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_SCALING_LIST_8x8_0,
> > + scaling->scaling_list_8x8[0],
> > + sizeof(scaling->scaling_list_8x8[0]));
> > +
> > + cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_SCALING_LIST_8x8_1,
> > + scaling->scaling_list_8x8[1],
> > + sizeof(scaling->scaling_list_8x8[1]));
>
> The index above should be 3. IIRC 1 and 3 are used by 4:2:0 chroma subsampling,
> but currently I'm unable to find a reference to that in the standard.
Yep, indeed, I'll fix that, thanks!
> > +
> > + cedrus_h264_write_sram(dev, CEDRUS_SRAM_H264_SCALING_LIST_4x4,
> > + scaling->scaling_list_4x4,
> > + sizeof(scaling->scaling_list_4x4));
> > +}
> > +
> > +static void cedrus_write_pred_weight_table(struct cedrus_ctx *ctx,
> > + struct cedrus_run *run)
> > +{
> > + const struct v4l2_ctrl_h264_slice_param *slice =
> > + run->h264.slice_param;
> > + const struct v4l2_h264_pred_weight_table *pred_weight =
> > + &slice->pred_weight_table;
> > + struct cedrus_dev *dev = ctx->dev;
> > + int i, j, k;
> > +
> > + cedrus_write(dev, VE_H264_SHS_WP,
> > + ((pred_weight->chroma_log2_weight_denom & 0xf) << 4) |
> > + ((pred_weight->luma_log2_weight_denom & 0xf) << 0));
>
> Denominators are only in the range 0-7, so the mask should be 0x7. The CedarX
> code also specifies those two fields as 3 bits wide.
Indeed, I'll fix it.
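As a sketch of what the fixed register value would look like (the helper name is made up for illustration; the shifts are the ones from the patch, with the narrower 0x7 mask):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack both log2 weight denominators into one word: chroma in bits 6:4,
 * luma in bits 2:0. Hypothetical helper, not part of the driver.
 */
static uint32_t pack_wp_denoms(uint8_t luma_denom, uint8_t chroma_denom)
{
	/* Both denominators are limited to 0..7. */
	assert(luma_denom <= 7 && chroma_denom <= 7);

	return ((uint32_t)(chroma_denom & 0x7) << 4) |
	       ((uint32_t)(luma_denom & 0x7) << 0);
}
```

For instance, a luma denominator of 5 with a chroma denominator of 7 packs to 0x75.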
> > +
> > + cedrus_write(dev, VE_AVC_SRAM_PORT_OFFSET,
> > + CEDRUS_SRAM_H264_PRED_WEIGHT_TABLE << 2);
> > +
> > + for (i = 0; i < ARRAY_SIZE(pred_weight->weight_factors); i++) {
> > + const struct v4l2_h264_weight_factors *factors =
> > + &pred_weight->weight_factors[i];
> > +
> > + for (j = 0; j < ARRAY_SIZE(factors->luma_weight); j++) {
> > + u32 val;
> > +
> > + val = ((factors->luma_offset[j] & 0x1ff) << 16) |
> > + (factors->luma_weight[j] & 0x1ff);
> > + cedrus_write(dev, VE_AVC_SRAM_PORT_DATA, val);
>
> You should cast the offset variable to a wider type. Currently some videos which
> use the prediction weight table don't work for me, unless the offset is cast to
> u32 first. Shifting an 8-bit variable by 16 places gives you 0 every time.
I'll do it.
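Roughly along these lines, as a userspace sketch with explicit widening before the mask and the shift. The helper name is hypothetical, and taking the weight as a 16-bit value is only an assumption following the suggestion below, not something the uAPI provides today.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack one prediction weight table entry: 9-bit offset in bits 24:16,
 * 9-bit weight in bits 8:0. Both values are widened to 32 bits first.
 */
static uint32_t pack_weight_entry(int8_t offset, int16_t weight)
{
	/* Explicit weights are -128..127; 128 covers the 2^7 default. */
	assert(weight >= -128 && weight <= 128);

	return (((uint32_t)offset & 0x1ff) << 16) |
	       ((uint32_t)weight & 0x1ff);
}
```

Negative values end up as 9-bit two's complement fields, so an offset of -1 becomes 0x1ff in the upper half of the word.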
> Luma offset and weight are defined as s8, so having a wider mask doesn't really
> make sense. However, I think the weight should be s16 anyway, because the
> standard says that its value could be 2^denominator for the default value or in
> the range -128..127. The worst case would be 2^7 = 128 and -128. To cover both
> values you need at least 9 bits.
But if I understood the spec right, in that case you would just have
the denominator set, and not the offset, while the offset is used if
you don't use the default formula (and therefore remains in the
-128..127 range which is covered by the s8), right?
> > + reg = 0;
> > + if (!(scaling && (pps->flags & V4L2_H264_PPS_FLAG_PIC_SCALING_MATRIX_PRESENT)))
> > + reg |= VE_H264_SHS_QP_SCALING_MATRIX_DEFAULT;
> > + reg |= (pps->second_chroma_qp_index_offset & 0x3f) << 16;
> > + reg |= (pps->chroma_qp_index_offset & 0x3f) << 8;
> > + reg |= (pps->pic_init_qp_minus26 + 26 + slice->slice_qp_delta) & 0x3f;
> > + cedrus_write(dev, VE_H264_SHS_QP, reg);
> > +
> > + // clear status flags
> > + cedrus_write(dev, VE_H264_STATUS, cedrus_read(dev, VE_H264_STATUS));
>
> I'm not sure clearing the status here is needed. Do you have any case where it
> is needed? Maybe if some error happened before and cedrus_h264_irq_clear()
> wasn't called. I'm fine either way.
Yeah, it's just some extra precaution.
> > +
> > + // enable int
> > + reg = cedrus_read(dev, VE_H264_CTRL);
> > + cedrus_write(dev, VE_H264_CTRL, reg |
> > + VE_H264_CTRL_SLICE_DECODE_INT |
> > + VE_H264_CTRL_DECODE_ERR_INT |
> > + VE_H264_CTRL_VLD_DATA_REQ_INT);
>
> Since this is the only place where you set VE_H264_CTRL, I wouldn't preserve
> previous content. This mode is also capable of decoding VP8 and AVS. So in
> theory, if user would want to decode H264 and VP8 videos at the same time,
> preserving content will probably corrupt your output. I would just set all
> other bits to 0. What do you think? I tested this without preservation and it
> works fine.
I'll change it.
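To illustrate the difference, here is a userspace sketch with a fake register. The bit positions and the fake_regs structure are hypothetical and only serve the example; they are not the real VE_H264_CTRL layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions, picked only for this sketch. */
#define CTRL_SLICE_DECODE_INT	(1u << 0)
#define CTRL_DECODE_ERR_INT	(1u << 1)
#define CTRL_VLD_DATA_REQ_INT	(1u << 2)
#define CTRL_H264_INTS		(CTRL_SLICE_DECODE_INT | \
				 CTRL_DECODE_ERR_INT | \
				 CTRL_VLD_DATA_REQ_INT)

/* Stand-in for the control register. */
struct fake_regs {
	uint32_t ctrl;
};

/* Read-modify-write, as in v4: bits left by an earlier job survive. */
static void enable_ints_preserving(struct fake_regs *regs)
{
	assert(regs != NULL);
	regs->ctrl |= CTRL_H264_INTS;
}

/* Plain write: the register holds exactly the H264 interrupt bits. */
static void enable_ints_fresh(struct fake_regs *regs)
{
	assert(regs != NULL);
	regs->ctrl = CTRL_H264_INTS;
}
```

With the plain write, whatever a previous job using a different codec left in the register can no longer leak into the H264 run.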
> > + /*
> > + * FIXME: This is actually conditional to
> > + * V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY not being set, we might
> > + * have to rework this if memory efficiency ever is something
> > + * we need to work on.
> > + */
> > + field_size = field_size * 2;
> > + ctx->codec.h264.mv_col_buf_field_size = field_size;
>
> The CedarX code aligns this buffer to 1024. Should we do it too just to be on
> the safe side? I don't think it costs us anything due to dma_alloc_coherent()
> alignments.
dma_alloc_coherent will operate on pages, so it doesn't make any
difference there.
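A quick userspace sketch of that arithmetic, assuming a 4096-byte page (the actual page size depends on the configuration, and the helper is hypothetical): since 1024 divides the page size, rounding up to 1024 first never changes the page-rounded result.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of a, where a is a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Bytes actually backing an allocation when it is rounded to whole pages. */
static size_t page_backed_size(size_t size, size_t page_size)
{
	/* The page size must be a non-zero power of two. */
	assert(page_size && !(page_size & (page_size - 1)));

	return ALIGN_UP(size, page_size);
}
```

For a 5000-byte request, both the raw size and the 1024-aligned size (5120) end up backed by 8192 bytes.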
> Sorry again for the somewhat late in-depth review.
Thanks a lot!
Maxime
--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com