From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-lf1-f41.google.com (mail-lf1-f41.google.com [209.85.167.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EC1D3ECF for ; Tue, 22 Oct 2024 07:08:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.167.41 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729580895; cv=none; b=NZdH/2kUvKJc6URVOhQuFafdGEzYjzpI39QdgS9/TmXA0Eg//BrRU3OXo+B20RTg8y5gEApZmAbkUdNS+LL5CTIdnBn71P//oVHfes4+Ge970RaBjrTpg37NVcZDVxVsm5CJM8KsotvVU/GKhRnudHxCPAPVi5VfeHwnzlp/fyI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729580895; c=relaxed/simple; bh=VxNdIty0ceV9Ae/KOEwjm5n6BUvuS4Gr/V7LADDJrGE=; h=MIME-Version:References:In-Reply-To:From:Date:Message-ID:Subject: To:Cc:Content-Type; b=SBp4S4yBFECfZiQ48UGBrMRJLahgHLNXTKe+5Za8wMsm0UeXjvm0qS+dOiJVoH8Ks3/oSjjYo1auNrHa5xYROeQAIhMnBc9YOrqeUVjPTsguKGpm9lT633wFTDRz2FwPmfb4jJJSuRS2P1NzlZKOHw9NYJPv5yXJ0QiXNGirJoM= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=chromium.org; spf=pass smtp.mailfrom=chromium.org; dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b=Jp4O8Qat; arc=none smtp.client-ip=209.85.167.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=chromium.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=chromium.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b="Jp4O8Qat" Received: by mail-lf1-f41.google.com with SMTP id 2adb3069b0e04-539f76a6f0dso4606890e87.1 for ; Tue, 22 Oct 2024 00:08:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1729580892; x=1730185692; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=/GJx+ZT8sn+Ws/5kdx+3Lxj4vzq/2VNioyfeeo193Rg=; b=Jp4O8QatN08daSYxgGbBnqz/NVLO/pf1jVeNtvmPoNvvqw4astYm+1aRBdPM8M24c8 nFknpZRf5rOk0yCEEDKNB4BSQ2Bx6f5kbVtQ2OvOhmqYr74TXT0imI6QnZCrQmMPzsPH QRGiOk4y63uKRmwL7PMQe1bEoqVDrHnhf7/s8= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729580892; x=1730185692; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=/GJx+ZT8sn+Ws/5kdx+3Lxj4vzq/2VNioyfeeo193Rg=; b=AeL6sw3ZId9rCQakZihp+nms4D9e52J01LHGiUFelDY2cLYsPz3A6b2OlagyTwaTD3 TdMExN+prv4nVZCsk8hqSNEaoWiMoYT3cULdfj+QbDFxLUdp6USvKTcKR5XmnotKrVnN iMNEREIgmk0uLbmq8ENrde5TFUPIe7K2O3SoQBQXkJSq0CupBM1oWwRVtClm/TfDxCxO Dt0t4Zvb5A0ezKX4JpBAzrFbkChcW95cS4kYm+mp6SuA1Dg20iF/CGfddKaLl8kcoRDg iJiQX6dsu03KonOokgJ1LBJtgMCFFrNvY1jlI+5JY0u2jXdLRzN+JFQhhJmVi9tbVY1I gPXA== X-Forwarded-Encrypted: i=1; AJvYcCWnS5ssKC7HYb8YPu3KZanxW8qj++4cRSlEcvMYYA0eNRzLscqpAFHL0Fgj3wKofXevsQfUmwnw9FnK@vger.kernel.org X-Gm-Message-State: AOJu0YwFf2yUIgwKhzLh9XDEZHFFD4Kv8Vp26+P0B0tKWKSGA8/wUXgT yrxBKFYp0fXk+Pz0+CRhRc4aMVyhnqQuYCjt7DE9d3T8lLwYbKSZ0wuI7kO5Le8ikTW6/EhA1o9 2CqwFHqw4IAaiyNcu5M4smNsEAxEfR1+xJMez X-Google-Smtp-Source: AGHT+IFBIR2fOQnzE+888NEXdINjZ+2gmqT4VQ0hlNh2rB/PJBEbJkPTm9TaYAUp+CxfQDKU9wRUD9ZCh+cVl5p1XlE= X-Received: by 2002:a05:6512:3b25:b0:539:e65a:8a85 with SMTP id 2adb3069b0e04-53b13a3a529mr773164e87.55.1729580891929; Tue, 22 Oct 2024 00:08:11 -0700 (PDT) Precedence: bulk X-Mailing-List: devicetree@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 References: <20241016034927.8181-1-yunfei.dong@mediatek.com> <20241016034927.8181-3-yunfei.dong@mediatek.com> In-Reply-To: <20241016034927.8181-3-yunfei.dong@mediatek.com> From: Chen-Yu Tsai Date: Tue, 22 Oct 2024 15:08:00 +0800 Message-ID: Subject: Re: [PATCH v2 2/4] media: mediatek: vcodec: support extend h264 video shared information To: Yunfei Dong Cc: =?UTF-8?B?TsOtY29sYXMgRiAuIFIgLiBBIC4gUHJhZG8=?= , Sebastian Fricke , Nicolas Dufresne , Hans Verkuil , AngeloGioacchino Del Regno , Benjamin Gaignard , Nathan Hebert , Daniel Almeida , Hsin-Yi Wang , Fritz Koenig , Daniel Vetter , Steve Cho , linux-media@vger.kernel.org, devicetree@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, Project_Global_Chrome_Upstream_Group@mediatek.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, Oct 16, 2024 at 11:49=E2=80=AFAM Yunfei Dong wrote: > > The address end of working buffer can't be calculated directly > with buffer size in kernel for some special architecture. Adding > new extend vsi to calculate the address end in other os. > Refactoring the driver flow to make the driver looks more reasonable. > > Signed-off-by: Yunfei Dong > --- > .../decoder/vdec/vdec_h264_req_multi_if.c | 355 ++++++++++++------ > 1 file changed, 238 insertions(+), 117 deletions(-) > > diff --git a/drivers/media/platform/mediatek/vcodec/decoder/vdec/vdec_h26= 4_req_multi_if.c b/drivers/media/platform/mediatek/vcodec/decoder/vdec/vdec= _h264_req_multi_if.c > index 57c85af5ffb4..e02ed8dfe0ce 100644 > --- a/drivers/media/platform/mediatek/vcodec/decoder/vdec/vdec_h264_req_m= ulti_if.c > +++ b/drivers/media/platform/mediatek/vcodec/decoder/vdec/vdec_h264_req_m= ulti_if.c > @@ -48,10 +48,29 @@ struct vdec_h264_slice_lat_dec_param { > }; > > /** > - * struct vdec_h264_slice_info - decode information > + * struct vdec_h264_slice_info_ex - extend decode information Please move these structs around so that the extended ones are grouped together. My suggestion is to keep |struct vdec_h264_slice_lat_dec_param|, |struct vdec_h264_slice_info| and |struct vdec_h264_slice_vsi| at the top of the file, and add the new ones below them. That way the ones related are together, and also git won't produce such a hard to read diff. Please also mark every struct that is included in the `vsi` shared memory struct as "shared interface with firmware" so that they do not get changed accidentally. > * > + * @wdma_end_addr_offset: offset from buffer start > * @nal_info: nal info of current picture > * @timeout: Decode timeout: 1 timeout, 0 no timeout > + * @vdec_fb_va: VDEC frame buffer struct virtual address > + * @crc: Used to check whether hardware's status is right > + * @reserved: reserved > + */ > +struct vdec_h264_slice_info_ex { > + u64 wdma_end_addr_offset; > + u16 nal_info; > + u16 timeout; There is a 4 byte hole here. Can you make it explicit by adding an extra reserved field? > + u64 vdec_fb_va; > + u32 crc[8]; > + u32 reserved; > +}; > + > +/** > + * struct vdec_h264_slice_info - decode information > + * > + * @nal_info: nal info of current picture > + * @timeout: Decode timeout: 1 timeout, 0 no timeount > * @bs_buf_size: bitstream size > * @bs_buf_addr: bitstream buffer dma address > * @y_fb_dma: Y frame buffer dma address > @@ -70,6 +89,64 @@ struct vdec_h264_slice_info { > u32 crc[8]; > }; > > +/* > + * struct vdec_h264_slice_mem - memory address and size > + */ > +struct vdec_h264_slice_mem { > + union { > + u64 buf; > + u64 dma_addr; > + }; > + union { > + size_t size; > + u64 dma_addr_end; > + }; > +}; > + > +/** > + * struct vdec_h264_slice_fb - frame buffer for decoding > + * > + * @y: current y buffer address info > + * @c: current c buffer address info > + */ > +struct vdec_h264_slice_fb { > + struct vdec_h264_slice_mem y; > + struct vdec_h264_slice_mem c; > +}; > + > +/** > + * struct vdec_h264_slice_vsi_ex - extend shared memory for decode infor= mation exchange > + * between SCP and Host. > + * > + * @bs: input buffer info > + * @fb: current y/c buffer > + * > + * @ube: ube buffer > + * @trans: transcoded buffer > + * @row_info: row info buffer > + * @err_map: err map buffer > + * @slice_bc: slice buffer > + * > + * @mv_buf_dma: HW working motion vector buffer > + * @dec: decode information (AP-R, VPU-W) > + * @h264_slice_params: decode parameters for hw used > + */ > +struct vdec_h264_slice_vsi_ex { > + /* LAT dec addr */ > + struct vdec_h264_slice_mem bs; > + struct vdec_h264_slice_fb fb; > + > + struct vdec_h264_slice_mem ube; > + struct vdec_h264_slice_mem trans; > + struct vdec_h264_slice_mem row_info; > + struct vdec_h264_slice_mem err_map; > + struct vdec_h264_slice_mem slice_bc; > + > + struct vdec_h264_slice_mem mv_buf_dma[H264_MAX_MV_NUM]; > + struct vdec_h264_slice_info_ex dec; > + struct vdec_h264_slice_lat_dec_param h264_slice_params; > +}; > + > /** > * struct vdec_h264_slice_vsi - shared memory for decode information exc= hange > * between SCP and Host. > @@ -136,10 +213,10 @@ struct vdec_h264_slice_share_info { > * @pred_buf: HW working prediction buffer > * @mv_buf: HW working motion vector buffer > * @vpu: VPU instance > - * @vsi: vsi used for lat > - * @vsi_core: vsi used for core > + * @vsi_ex: extend vsi used for lat > + * @vsi_core_ex: extend vsi used for core > * > - * @vsi_ctx: Local VSI data for this decoding context > + * @vsi_ctx_ex: Local extend vsi data for this decoding context > * @h264_slice_param: the parameters that hardware use to decode > * > * @resolution_changed:resolution changed > @@ -156,10 +233,19 @@ struct vdec_h264_slice_inst { > struct mtk_vcodec_mem pred_buf; > struct mtk_vcodec_mem mv_buf[H264_MAX_MV_NUM]; > struct vdec_vpu_inst vpu; > - struct vdec_h264_slice_vsi *vsi; > - struct vdec_h264_slice_vsi *vsi_core; > - > - struct vdec_h264_slice_vsi vsi_ctx; > + union { > + struct vdec_h264_slice_vsi_ex *vsi_ex; > + struct vdec_h264_slice_vsi *vsi; > + }; > + union { > + struct vdec_h264_slice_vsi_ex *vsi_core_ex; > + struct vdec_h264_slice_vsi *vsi_core; > + }; > + > + union { > + struct vdec_h264_slice_vsi_ex vsi_ctx_ex; > + struct vdec_h264_slice_vsi vsi_ctx; > + }; Code wise I think it would be better to have a union of structs of structs: union { struct { struct vdec_h264_slice_vsi *vsi; struct vdec_h264_slice_vsi *vsi_core; struct vdec_h264_slice_vsi vsi_ctx; }; struct { struct vdec_h264_slice_vsi_ex *vsi_ex; struct vdec_h264_slice_vsi_ex *vsi_core_ex; struct vdec_h264_slice_vsi_ex vsi_ctx_ex; }; }; This makes it clear that the *_ex variants are used together. The code compiles down to the same layout. > struct vdec_h264_slice_lat_dec_param h264_slice_param; > > unsigned int resolution_changed; > @@ -389,6 +475,98 @@ static void vdec_h264_slice_get_crop_info(struct vde= c_h264_slice_inst *inst, > cr->left, cr->top, cr->width, cr->height); > } > > +static void vdec_h264_slice_setup_lat_buffer(struct vdec_h264_slice_inst= *inst, Please add the "_ex" suffix to mark this function as used for the extended architecture. > + struct mtk_vcodec_mem *bs, > + struct vdec_lat_buf *lat_buf= ) > +{ > + struct mtk_vcodec_mem *mem; > + int i; > + > + inst->vsi_ex->bs.dma_addr =3D (u64)bs->dma_addr; > + inst->vsi_ex->bs.size =3D bs->size; > + > + for (i =3D 0; i < H264_MAX_MV_NUM; i++) { > + mem =3D &inst->mv_buf[i]; > + inst->vsi_ex->mv_buf_dma[i].dma_addr =3D mem->dma_addr; > + inst->vsi_ex->mv_buf_dma[i].size =3D mem->size; > + } > + inst->vsi_ex->ube.dma_addr =3D lat_buf->ctx->msg_queue.wdma_addr.= dma_addr; > + inst->vsi_ex->ube.size =3D lat_buf->ctx->msg_queue.wdma_addr.size= ; > + > + inst->vsi_ex->row_info.dma_addr =3D 0; > + inst->vsi_ex->row_info.size =3D 0; > + > + inst->vsi_ex->err_map.dma_addr =3D lat_buf->wdma_err_addr.dma_add= r; > + inst->vsi_ex->err_map.size =3D lat_buf->wdma_err_addr.size; > + > + inst->vsi_ex->slice_bc.dma_addr =3D lat_buf->slice_bc_addr.dma_ad= dr; > + inst->vsi_ex->slice_bc.size =3D lat_buf->slice_bc_addr.size; > + > + inst->vsi_ex->trans.dma_addr_end =3D inst->ctx->msg_queue.wdma_rp= tr_addr; > + inst->vsi_ex->trans.dma_addr =3D inst->ctx->msg_queue.wdma_wptr_a= ddr; > +} > + > +static int vdec_h264_slice_setup_core_buffer(struct vdec_h264_slice_inst= *inst, > + struct vdec_h264_slice_share= _info *share_info, > + struct vdec_lat_buf *lat_buf= ) Same here. > +{ > + struct mtk_vcodec_mem *mem; > + struct mtk_vcodec_dec_ctx *ctx =3D inst->ctx; > + struct vb2_v4l2_buffer *vb2_v4l2; > + struct vdec_fb *fb; > + u64 y_fb_dma, c_fb_dma =3D 0; > + int i; > + > + fb =3D ctx->dev->vdec_pdata->get_cap_buffer(ctx); > + if (!fb) { > + mtk_vdec_err(ctx, "fb buffer is NULL"); > + return -EBUSY; > + } > + > + y_fb_dma =3D (u64)fb->base_y.dma_addr; > + if (ctx->q_data[MTK_Q_DATA_DST].fmt->num_planes =3D=3D 1) > + c_fb_dma =3D > + y_fb_dma + inst->ctx->picinfo.buf_w * inst->ctx->= picinfo.buf_h; > + else > + c_fb_dma =3D (u64)fb->base_c.dma_addr; > + > + mtk_vdec_debug(ctx, "[h264-core] y/c addr =3D 0x%llx 0x%llx", y_f= b_dma, c_fb_dma); > + > + inst->vsi_core_ex->fb.y.dma_addr =3D y_fb_dma; > + inst->vsi_core_ex->fb.y.size =3D ctx->picinfo.fb_sz[0]; > + inst->vsi_core_ex->fb.c.dma_addr =3D c_fb_dma; > + inst->vsi_core_ex->fb.c.size =3D ctx->picinfo.fb_sz[1]; > + > + inst->vsi_core_ex->dec.vdec_fb_va =3D (unsigned long)fb; > + inst->vsi_core_ex->dec.nal_info =3D share_info->nal_info; > + > + inst->vsi_core_ex->ube.dma_addr =3D lat_buf->ctx->msg_queue.wdma_= addr.dma_addr; > + inst->vsi_core_ex->ube.size =3D lat_buf->ctx->msg_queue.wdma_addr= .size; > + > + inst->vsi_core_ex->err_map.dma_addr =3D lat_buf->wdma_err_addr.dm= a_addr; > + inst->vsi_core_ex->err_map.size =3D lat_buf->wdma_err_addr.size; > + > + inst->vsi_core_ex->slice_bc.dma_addr =3D lat_buf->slice_bc_addr.d= ma_addr; > + inst->vsi_core_ex->slice_bc.size =3D lat_buf->slice_bc_addr.size; > + > + inst->vsi_core_ex->row_info.dma_addr =3D 0; > + inst->vsi_core_ex->row_info.size =3D 0; > + > + inst->vsi_core_ex->trans.dma_addr =3D share_info->trans_start; > + inst->vsi_core_ex->trans.dma_addr_end =3D share_info->trans_end; > + > + for (i =3D 0; i < H264_MAX_MV_NUM; i++) { > + mem =3D &inst->mv_buf[i]; > + inst->vsi_core_ex->mv_buf_dma[i].dma_addr =3D mem->dma_ad= dr; > + inst->vsi_core_ex->mv_buf_dma[i].size =3D mem->size; > + } > + > + vb2_v4l2 =3D v4l2_m2m_next_dst_buf(ctx->m2m_ctx); > + v4l2_m2m_buf_copy_metadata(&lat_buf->ts_info, vb2_v4l2, true); > + > + return 0; > +} > + > static int vdec_h264_slice_init(struct mtk_vcodec_dec_ctx *ctx) > { > struct vdec_h264_slice_inst *inst; > @@ -412,10 +590,10 @@ static int vdec_h264_slice_init(struct mtk_vcodec_d= ec_ctx *ctx) > goto error_free_inst; > } > > - vsi_size =3D round_up(sizeof(struct vdec_h264_slice_vsi), VCODEC_= DEC_ALIGNED_64); > - inst->vsi =3D inst->vpu.vsi; > - inst->vsi_core =3D > - (struct vdec_h264_slice_vsi *)(((char *)inst->vpu.vsi) + = vsi_size); > + vsi_size =3D round_up(sizeof(struct vdec_h264_slice_vsi_ex), VCOD= EC_DEC_ALIGNED_64); > + inst->vsi_ex =3D inst->vpu.vsi; > + inst->vsi_core_ex =3D > + (struct vdec_h264_slice_vsi_ex *)(((char *)inst->vpu.vsi)= + vsi_size); Changing this here without any feature checking will break existing platforms because the vsi_core pointer now points to the wrong address. You can't change the existing code path in a non backwards compatible way in one patch then fix it back up in the next patch. It has to be done at the same time. In other words, existing platforms _must_ continue to work throughout your patch series, i.e. with only patches 1 & 2 applied, MT8192 / MT8195 should continue to work properly. Looking at this patch more, it seems that it is better to squash patches 2 & 3 together. The two patches are working on the same thing: adding support for the extended architecture. Having the changes together in one single patch makes more sense. ChenYu