Hi Nicolas,

On Thu 09 Apr 26, 10:00, Nicolas Dufresne wrote:
> On Thursday 09 April 2026 at 15:33 +0200, Paul Kocialkowski wrote:
> > Hi,
> >
> > On Tue 24 Mar 26, 16:08, Pengpeng Hou wrote:
> > > Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the
> > > stateless slice control and later uses their indices to look up
> > > decode->dpb[] in _cedrus_write_ref_list().
> > >
> > > Rejecting such controls in cedrus_try_ctrl() would break existing
> > > userspace, since stateless H.264 reference lists may legitimately carry
> > > out-of-range indices for missing references. Instead, guard the actual
> > > DPB lookup in Cedrus and skip entries whose indices do not fit the fixed
> > > V4L2_H264_NUM_DPB_ENTRIES array.
> >
> > Could you explain why it is legitimate that userspace would pass indices that
> > are not in the dpb list? As far as I remember from the H.264 spec, the L0/L1
> > lists are constructed from active references only and the number of items
> > there should be given by
> > num_ref_idx_l0_active_minus1/num_ref_idx_l1_active_minus1.
> > We can tolerate invalid data beyond these indices, but certainly not as part
> > of the indices that should be valid.
> >
> > However I agree that cedrus_try_ctrl is maybe not the right place to check it
> > since I'm not sure we are guaranteed that the slice params control will be
> > checked before the new DPB (from the same request) is applied, so we might end
> > up checking against the dpb from the previous decode request.
> >
> > But I think we should error out and not just skip the invalid reference.
>
> It's been a long time since I last looked into this. But what happens here is
> that once you lose a reference, the userspace DPB will hold a gap picture, which
> has no backing storage. Since it has no backing storage, there is no cookie
> (timestamp) associated with it.
> This gap picture will still make it to the
> reference lists, since the position of the reference in the list is important
> (you cannot just remove an item). It is an established practice in userspace to
> simply fill the void with an invalid index, typically 0xff, which is always
> invalid. Because that's what some userspace does, it became part of our ABI.

Right, we definitely need to keep the order of the L0/L1 lists even with missing
references and the question is whether the hardware can deal with it or not.

Our uAPI specification currently doesn't say anything about handling missing
references. I'm generally not very keen on considering that undefined behavior
becomes de-facto uAPI that should never be broken, because there are cases where
it is obviously incorrect and the fact that it didn't fail previously is the
result of a bug in the implementation.

But in this situation I agree we do need a way to indicate that references are
missing and using 0xff sounds like a good plan to me, given that we provide a
uAPI header define with this value and that the doc mentions it.

> Decoders are expected to be fault tolerant, though the tolerance level is
> hardware specific, and so failing in the common code would be inappropriate
> (failing in Cedrus could be acceptable, assuming it can't work with missing
> references, which the implementation seems to be fine with).

Okay, I agree that we should not fail in common code and should tolerate a value
to indicate a missing reference.

What the current proposal is doing (skipping the reference) results in the SRAM
entry for the reference remaining untouched, which will keep its value from the
previous frame. This seems clearly incorrect.

> Hantro G1 notably has a flag to report missing references to the HW, and it
> will manage concealment internally. G2/RKVDEC don't, and we try and pick the
> most recent frame as a replacement backing storage, which most of the time
> minimises the damages.
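
To make the fallback idea concrete, here is a rough, untested sketch in plain C.
It is not Cedrus, Hantro or RKVDEC code: the struct, the decode_order field, the
macro names and both helpers are hypothetical stand-ins, and "most recent" is
simply assumed to mean the largest decode_order. Treating an in-range but
inactive entry like a missing one is also a choice made for the sketch only. The
point is that every list slot gets a defined value instead of being skipped, and
that indices which are out of range but are not the 0xff marker are rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the V4L2 uAPI definitions. */
#define NUM_DPB_ENTRIES   16
#define MISSING_REF_INDEX 0xff /* "no reference" marker discussed in the thread */

struct dpb_entry {
	bool active;           /* entry is backed by a decoded picture */
	uint32_t decode_order; /* assumed: larger means decoded more recently */
};

/* Return the index of the most recently decoded active entry, or -1. */
static int most_recent_active(const struct dpb_entry *dpb)
{
	int best = -1;

	for (int i = 0; i < NUM_DPB_ENTRIES; i++) {
		if (!dpb[i].active)
			continue;
		if (best < 0 || dpb[i].decode_order > dpb[best].decode_order)
			best = i;
	}

	return best;
}

/*
 * Resolve every slot of a reference list to a usable DPB index.
 * Missing references (0xff or an inactive entry) are replaced by the most
 * recent active entry, so no slot is left with a stale value.
 * Returns the number of substituted slots, or -1 when an index is invalid
 * or when no replacement picture exists.
 */
static int resolve_ref_list(const struct dpb_entry *dpb,
			    const uint8_t *ref_list, size_t num_active,
			    uint8_t *out)
{
	int fallback = most_recent_active(dpb);
	int substituted = 0;

	assert(dpb && ref_list && out);

	for (size_t i = 0; i < num_active; i++) {
		uint8_t idx = ref_list[i];

		if (idx < NUM_DPB_ENTRIES && dpb[idx].active) {
			out[i] = idx;
			continue;
		}

		/* Out of range without being the marker: malformed input. */
		if (idx >= NUM_DPB_ENTRIES && idx != MISSING_REF_INDEX)
			return -1;

		if (fallback < 0)
			return -1;

		out[i] = (uint8_t)fallback;
		substituted++;
	}

	return substituted;
}
```

A caller could treat a positive return value as a hint that the decoded frame is
known to be damaged, and a negative one as a reason to fail the request.
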
It sounds like an approach that could work for cedrus too.

> As a future refinement, we need drivers in the long term to properly report
> the damages (perhaps through additional RO request controls). As discussed a
> few years ago in the error handling wip for rkvdec, the V4L2 doc specifies that
> any sort of damage known to exist in a frame shall result in the ERROR flag
> being set. We can deduce that the ERROR flag with a payload of 0 indicates to
> userspace not to use the frame (which typically happens on hard errors, or
> errors at the entropy decode stage) and the ERROR flag with a correct payload
> signals some level of corruption, and it's left to the application to decide
> what to do.

I think it makes sense, yes, but it would be good to document it in the uAPI
document too.

All the best,

Paul

> Nicolas
>
> >
> > All the best,
> >
> > Paul
> >
> > >
> > > This keeps the fix local to the driver use site and avoids out-of-bounds
> > > reads from malformed or unsupported reference list entries.
> > >
> > > Signed-off-by: Pengpeng Hou
> > > ---
> > >  drivers/staging/media/sunxi/cedrus/cedrus_h264.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > > --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > > @@ -210,6 +210,9 @@ static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
> > >  		u8 dpb_idx;
> > >  
> > >  		dpb_idx = ref_list[i].index;
> > > +		if (dpb_idx >= V4L2_H264_NUM_DPB_ENTRIES)
> > > +			continue;
> > > +
> > >  		dpb = &decode->dpb[dpb_idx];
> > >  
> > >  		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> > > --
> > > 2.50.1
> > >

-- 
Paul Kocialkowski,

Independent contractor - sys-base - https://www.sys-base.io/
Free software developer - https://www.paulk.fr/

Expert in multimedia, graphics and embedded hardware support with Linux.