From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from phobos.denx.de (phobos.denx.de [85.214.62.61]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2138E1AAE39; Thu, 3 Oct 2024 18:28:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=85.214.62.61 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727980091; cv=none; b=X7lXxW6mzbalO7WrIJk0Ol29U/e3nx7CoyJoIXQ2h7fKVxdcom33f81uVJJD/6K6kE1VxqAfWDDjk6Yl/CfusuP9TiV9pTLVgfNYrfCTLJQHF9rluRg7AvGgcArko98RQC83q/SGXEGh3iM+KEMbLU9XrnnhSuDhXjrcmWojKhQ= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727980091; c=relaxed/simple; bh=kMI3ny3xTTry7krKwKpYFO5Doupq64RFSit0H86k+ts=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=HaLppllg7vyl2vpOMLI1gqv0c4iYs7UPKUjemDc3p0SRNE0QpvcrorEAWelsvGRC4zh7Jc6ieHNmKQKqgPy2tjWbYjx0e2IRoit6js2sUC2Yt3FOX9zR37beQ5l+f3nGWVE/K+g0Hmp0M1tk8NWX54sebxTwQ3yDUlCgQTs8164= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=denx.de; spf=pass smtp.mailfrom=denx.de; dkim=pass (2048-bit key) header.d=denx.de header.i=@denx.de header.b=ioVYB19M; arc=none smtp.client-ip=85.214.62.61 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=denx.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=denx.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=denx.de header.i=@denx.de header.b="ioVYB19M" Received: from [127.0.0.1] (p578adb1c.dip0.t-ipconnect.de [87.138.219.28]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: marex@denx.de) by phobos.denx.de (Postfix) with ESMTPSA id 4F54688E65; Thu, 3 Oct 2024 20:28:06 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=denx.de; s=phobos-20191101; t=1727980087; bh=afyXWxj3lWWLB446UwVB7q7EVGBBZgtq+4WnIjUxE9s=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=ioVYB19M3ux3BUexbm9xZ6oIpnLfgHad/2M5t8r/ItjfaQOABimZxNXQM7W89Exn8 weu0LigmMSFgnCG5X0aGb4WJF5p5cAIRIrxgvVvWgaSXzHdYcXOysWz6oPtJ6bu58J njvTjP4/76Bk4CRIWgJOOMkTQS4DxPrj4W0+2dnY9Apf3hFreDb7triwdjhaqq/zC1 xunP94CvcbSbKOZd9k/cOG83uFys3X9QW5kHxy0ZRIfxGHDFLOsRPzAxMRY1xGy1QV 5Xn0brSxLbQqAUuICfcTvAoSK1Pcm0RZrLZ4qzI3j7pUXhjUdltHM2Nz1HCZCruryL WZJ89YcEpHxoA== Message-ID: <9b7f8d96-e86b-4eb0-855a-8c74fc96d3c8@denx.de> Date: Thu, 3 Oct 2024 17:11:44 +0200 Precedence: bulk X-Mailing-List: imx@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2 2/2] media: imx: vdic: Introduce mem2mem VDI deinterlacer driver To: Philipp Zabel , linux-media@vger.kernel.org Cc: Daniel Vetter , David Airlie , Fabio Estevam , Greg Kroah-Hartman , Helge Deller , Mauro Carvalho Chehab , Pengutronix Kernel Team , Sascha Hauer , Shawn Guo , Steve Longerbeam , dri-devel@lists.freedesktop.org, imx@lists.linux.dev, linux-arm-kernel@lists.infradead.org, linux-fbdev@vger.kernel.org, linux-staging@lists.linux.dev References: <20240724002044.112544-1-marex@denx.de> <20240724002044.112544-2-marex@denx.de> <3e850259-9349-4215-947a-ce192fa95f14@denx.de> Content-Language: en-US From: Marek Vasut In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: clamav-milter 0.103.8 at phobos.denx.de X-Virus-Status: Clean On 9/26/24 1:14 PM, Philipp Zabel wrote: > Hi, Hi, > On Mi, 2024-09-25 at 22:14 +0200, Marek Vasut wrote: >> The userspace could distribute the frames between the two devices in an >> alternating manner, can it not ? > > This doesn't help with latency, or when converting a single large > frame. > > For the deinterlacer, this can't be done with the motion-aware > temporally filtering modes. Those need a field from the previous frame. It is up to the userspace to pass the correct frames to the deinterlacer. >> Would the 1280x360 field be split into two tiles vertically and each >> tile (newly 1280/2 x 360) be enqueued on each VDIC ? I don't think that >> works, because you wouldn't be able to stitch those tiles back together >> nicely after the deinterlacing, would you? I would expect to see some >> sort of an artifact exactly where the two tiles got stitched back >> together, because the VDICs are unaware of each other and how each >> deinterlaced the tile. > > I was thinking horizontally, two 640x720 tiles side by side. 1280 is > larger than the 968 pixel maximum horizontal resolution of the VDIC. > > As you say, splitting vertically (which would be required for 1080i) > should cause artifacts at the seam due to the 4-tap vertical filter. Can the userspace set some sort of offset/stride in each buffer and distribute the task between the two VDIs then ? > [...] >>> >>> With the rigid V4L2 model though, where memory handling, parameter >>> calculation, and job scheduling of tiles in a single frame all have to >>> be hidden behind the V4L2 API, I don't think requiring userspace to >>> combine multiple mem2mem video devices to work together on a single >>> frame is feasible. >> >> If your concern is throughput (from what I gathered from the text >> above), userspace could schedule frames on either VDIC in alternating >> manner. > > Both throughput and latency. > > Yes, alternating to different devices would help with throughput where > possible, but it's worse for frame pacing, a hassle to implement > generically in userspace, and it's straight up impossible with temporal > filtering. See above, userspace should be able to pass the correct frames to m2m device. >> I think this is much better and actually generic approach than trying to >> combine two independent devices on kernel level and introduce some sort >> of scheduler into kernel driver to distribute jobs between the two >> devices. Generic, because this approach works even if either of the two >> devices is not VDIC. Independent devices, because yes, the MX6Q IPUs are >> two independent blocks, it is only the current design of the IPUv3 >> driver that makes them look kind-of like they are one single big device, >> I am not happy about that design, but rewriting the IPUv3 driver is way >> out of scope here. (*) > > The IPUs are glued together at the capture and output paths, so yes, > they are independent blocks, but also work together as a big device. > >>> Is limiting different users to the different deinterlacer hardware >>> units a real usecase? I saw the two ICs, when used as mem2mem devices, >>> as interchangeable resources. >> >> I do not have that use case, but I can imagine it could come up. >> In my case, I schedule different cameras to different VDICs from >> userspace as needed. > > Is this just because a single VDIC does not have enough throughput to > serve all cameras, or is there some reason for a fixed assignment > between cameras and VDICs? I want to be able to distribute the bandwidth utilization between the two IPUs . >>>>> To be fair, we never implemented that for the CSC/scaler mem2mem device >>>>> either. >>>> >>>> I don't think that is actually a good idea. Instead, it would be better >>>> to have two scaler nodes in userspace. >>> >>> See above, that would make it impossible (or rather unreasonably >>> complicated) to distribute work on a single frame to both IPUs. >> >> Is your concern latency instead of throughput ? See my comment in >> paragraph (*) . > > Either, depending on the use case. > > [...] >>>>> >>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/media/imx/imx-media-vdic.c#n207 >>> >>> That code is unused. The direct hardware path doesn't use >>> IPUV3_CHANNEL_MEM_VDI_PREV/CUR/NEXT, but is has a similar effect, half >>> of the incoming fields are dropped. The setup is vdic_setup_direct(). >> >> All right, let's drop that unused code then, I'll prepare a patch. > > Thanks! > >> But it seems the bottom line is, the VDI direct mode does not act as a >> frame-rate doubler ? > > Yes, it can't. In direct mode, VDIC only receives half of the fields. > > [...] >>>> >> Why would adding the (configurable) frame-rate doubling mode break >> userspace if this is not the default ? > > I'm not sure it would. Maybe there should be a deinterlacer control to > choose between full and half field rate output (aka frame doubling and > 1:1 input to output frame rate). > > Also, my initial assumption was that currently there is 1:1 input > frames to output frames. But with temporal filtering enabled there's > already one input frame (the first one) that doesn't produce any > output. Hum, ok. >>>>> If we don't start with that supported, I fear userspace will make >>>>> assumptions and be surprised when a full rate mode is added later. >>>> >>>> I'm afraid that since the current VDI already does retain input frame >>>> rate instead of doubling it, the userspace already makes an assumption, >>>> so that ship has sailed. >>> >>> No, this is about the deinterlacer mem2mem device, which doesn't exist >>> before this series. >> >> I am not convinced it is OK if the direct VDI path and mem2mem VDI >> behave differently, that would be surprising to me as a user ? > > Is this still about the frame rate doubling? Surely supporting it in > the mem2mem device and not in the capture path is ok. I'm not arguing > that frame doubling should be enabled by default. My understanding was that your concern was -- frame doubling should be the default because it not being the default would break userspace . Maybe that's not the case ? >>> The CSI capture path already has configurable framedrops (in the CSI). >> >> What am I looking for ? git grep doesn't give me any hits ? (**) > > That's configured by the set_frame_interval pad op of the CSI subdevice > - on the IDMAC output pad. See csi_find_best_skip(). > >>>> But I think we can make the frame doubling configurable ? >>> >>> That would be good. Specifically, there must be no guarantee that one >>> input frame with two fields only produces one deinterlaced output >>> frame, and userspace should somehow be able to understand this. >> >> See my question (**) , where is this configurable framedrops thing ? > > This would have to be done differently, though. Here we don't have > subdev set_frame_interval configuration, and while VIDIOC_S_PARM / > v4l2_captureparm were used to configure frame dropping on capture > devices, that's not really applicable to mem2mem deinterlacers. V4L2_CID_DEINTERLACING_MODE should probably work. >>> I'd rather not default to the setting that throws away half of the >>> input data. Not using frame doubling by default is sensible, but now >>> that using all three input fields to calculate the output frame is >>> possible, why not make that the default. >> >> To save memory bandwidth on the MX6, that's my main concern. > > What userspace are you using to exercise this driver? Maybe we can back > this concern with a few numbers (or mine with pictures). Custom one, but with gstreamer 1.22 and 1.24 driving the media pipeline.