* Re: Hantro H1 Encoding Upstreaming
2025-01-14 16:16 ` Nicolas Dufresne
@ 2025-01-14 18:01 ` Andrzej Pietrasiewicz
2025-01-14 18:06 ` Nicolas Dufresne
2025-01-15 11:31 ` Adam Ford
` (2 subsequent siblings)
3 siblings, 1 reply; 17+ messages in thread
From: Andrzej Pietrasiewicz @ 2025-01-14 18:01 UTC (permalink / raw)
To: Nicolas Dufresne, Daniel Almeida, Adam Ford
Cc: Fabio Estevam, Frank Li, ming.qian, linux-media, linux-imx, paulk,
Benjamin Gaignard, Gustavo Padovan
Hi,
W dniu 14.01.2025 o 17:16, Nicolas Dufresne pisze:
> Hi everyone,
>
> despite Andrzej having left the community, we are not giving up on the encoder
> work. In 2025, we aim to work more seriously on the V4L2 spec, as just
I'm glad you continue working on that. Can you define the "community" here?
Regards,
Andrzej
> writing a driver won't cut it. Each class of codecs needs a general workflow
> spec similar to what we already have for stateful encoders/decoders and
> stateless decoders.
>
> - https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/dev-decoder.html
> - https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/dev-encoder.html
> - https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/dev-stateless-decoder.html
>
> It is on top of this that, for each codec, we have to add controls (mostly
> compound) with specifics and details that suit stateless accelerators.
>
> From a community standpoint, the most important focus is to write and agree on
> a spec and controls. Once we have that, vendors will be able to slowly move
> away from their custom solutions and compete on actual hardware rather than
> integration.
>
> It is also time to start looking toward the future, since the Hantro H1 is a
> very limited and ancient encoder. From the same brand, if someone could work on
> the VC8000E shipped on the i.MX8M Plus, or on Rockchip codecs, that would
> certainly help progress. We can also get inspiration from many other stateless
> encoding APIs now, notably VA, DXVA and Vulkan Video.
>
> Of course, folks like to know when this will happen; stateless decoders took 5
> years from start to the first codec being merged, and hopefully we don't beat
> that record. I personally aim to produce work during the summer, and to mostly
> focus on the spec. It's obvious to me that testing on the H1 with a GStreamer
> implementation is the most productive path, though I have a strong interest in
> having an ecosystem of drivers. A second userspace implementation, perhaps
> FFmpeg, could also be useful.
>
> If you'd like to take a bite, this is a good thread for the discussion going
> forward. Before the summer, I plan to reach out to Paul, who gave this great
> presentation [1] at FOSDEM last year, and to start moving the RFC toward using
> these ideas. One of the biggest discussions is rate control: it is clear to me
> that modern HW integrates RC offloading, through some HW-specific knobs or even
> firmware offloading, and this is what Paul has been putting some thought into.
>
> If decoders have progressed so much in quality in the last few years, it is
> mostly because we have better ways to test them. We also need to start thinking
> about how we want to test our encoders. The stateful scene is not all green,
> with very organic growth and a set of encoders that is difficult to unify. And
> we have no metric of how good or bad they are either.
>
> regards,
> Nicolas
>
> Le lundi 13 janvier 2025 à 18:08 -0300, Daniel Almeida a écrit :
>> +cc Nicolas
>>
>>
>> Hey Adam,
>>
>>
>>>
>>> Daniel,
>>>
>>> Do you know if anyone will be picking up the H1 encoder?
>>>
>>> adam
>>>>
>>>> — Daniel
>>>>
>>>
>>
>> I think my colleague Nicolas is the best person to answer this.
>>
>> — Daniel
>
^ permalink raw reply [flat|nested] 17+ messages in thread

* Re: Hantro H1 Encoding Upstreaming
2025-01-14 18:01 ` Andrzej Pietrasiewicz
@ 2025-01-14 18:06 ` Nicolas Dufresne
0 siblings, 0 replies; 17+ messages in thread
From: Nicolas Dufresne @ 2025-01-14 18:06 UTC (permalink / raw)
To: Andrzej Pietrasiewicz, Daniel Almeida, Adam Ford
Cc: Fabio Estevam, Frank Li, ming.qian, linux-media, linux-imx, paulk,
Benjamin Gaignard, Gustavo Padovan
Hi Andrzej,
Le mardi 14 janvier 2025 à 19:01 +0100, Andrzej Pietrasiewicz a écrit :
> Hi,
>
>
> W dniu 14.01.2025 o 17:16, Nicolas Dufresne pisze:
> > Hi everyone,
> >
> > despite Andrzej having left the community, we are not giving up on the encoder
> > work. In 2025, we aim to work more seriously on the V4L2 spec, as just
>
> I'm glad you continue working on that. Can you define the "community" here?
Apologies if I assumed wrongly; you have had no interaction on the Linux Media
mailing list since your departure from Collabora. It felt like you were not
following the list and would not interact with the Linux Media community
anymore. Feel free to correct this statement, and let us know if you'd like to
follow up on something.
regards,
Nicolas
>
> Regards,
>
> Andrzej
>
> > writing a driver won't cut it. Each class of codecs needs a general workflow
> > spec similar to what we already have for stateful encoders/decoders and
> > stateless decoders.
> >
> > - https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/dev-decoder.html
> > - https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/dev-encoder.html
> > - https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/dev-stateless-decoder.html
> >
> > It is on top of this that, for each codec, we have to add controls (mostly
> > compound) with specifics and details that suit stateless accelerators.
> >
> > From a community standpoint, the most important focus is to write and agree on
> > a spec and controls. Once we have that, vendors will be able to slowly move
> > away from their custom solutions and compete on actual hardware rather than
> > integration.
> >
> > It is also time to start looking toward the future, since the Hantro H1 is a
> > very limited and ancient encoder. From the same brand, if someone could work
> > on the VC8000E shipped on the i.MX8M Plus, or on Rockchip codecs, that would
> > certainly help progress. We can also get inspiration from many other stateless
> > encoding APIs now, notably VA, DXVA and Vulkan Video.
> >
> > Of course, folks like to know when this will happen; stateless decoders took 5
> > years from start to the first codec being merged, and hopefully we don't beat
> > that record. I personally aim to produce work during the summer, and to mostly
> > focus on the spec. It's obvious to me that testing on the H1 with a GStreamer
> > implementation is the most productive path, though I have a strong interest in
> > having an ecosystem of drivers. A second userspace implementation, perhaps
> > FFmpeg, could also be useful.
> >
> > If you'd like to take a bite, this is a good thread for the discussion going
> > forward. Before the summer, I plan to reach out to Paul, who gave this great
> > presentation [1] at FOSDEM last year, and to start moving the RFC toward using
> > these ideas. One of the biggest discussions is rate control: it is clear to me
> > that modern HW integrates RC offloading, through some HW-specific knobs or
> > even firmware offloading, and this is what Paul has been putting some thought
> > into.
> >
> > If decoders have progressed so much in quality in the last few years, it is
> > mostly because we have better ways to test them. We also need to start
> > thinking about how we want to test our encoders. The stateful scene is not all
> > green, with very organic growth and a set of encoders that is difficult to
> > unify. And we have no metric of how good or bad they are either.
> >
> > regards,
> > Nicolas
> >
> > Le lundi 13 janvier 2025 à 18:08 -0300, Daniel Almeida a écrit :
> > > +cc Nicolas
> > >
> > >
> > > Hey Adam,
> > >
> > >
> > > >
> > > > Daniel,
> > > >
> > > > Do you know if anyone will be picking up the H1 encoder?
> > > >
> > > > adam
> > > > >
> > > > > — Daniel
> > > > >
> > > >
> > >
> > > I think my colleague Nicolas is the best person to answer this.
> > >
> > > — Daniel
> >
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Hantro H1 Encoding Upstreaming
2025-01-14 16:16 ` Nicolas Dufresne
2025-01-14 18:01 ` Andrzej Pietrasiewicz
@ 2025-01-15 11:31 ` Adam Ford
2025-01-15 13:53 ` Michael Tretter
2025-01-15 15:03 ` Paul Kocialkowski
3 siblings, 0 replies; 17+ messages in thread
From: Adam Ford @ 2025-01-15 11:31 UTC (permalink / raw)
To: Nicolas Dufresne
Cc: Daniel Almeida, Fabio Estevam, andrzejtp2010, Frank Li, ming.qian,
linux-media, linux-imx, paulk, Benjamin Gaignard, Gustavo Padovan
On Tue, Jan 14, 2025 at 10:16 AM Nicolas Dufresne
<nicolas.dufresne@collabora.com> wrote:
>
> Hi everyone,
>
> despite Andrzej having left the community, we are not giving up on the encoder
> work. In 2025, we aim to work more seriously on the V4L2 spec, as just writing
> a driver won't cut it. Each class of codecs needs a general workflow spec
> similar to what we already have for stateful encoders/decoders and stateless
> decoders.
>
> - https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/dev-decoder.html
> - https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/dev-encoder.html
> - https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/dev-stateless-decoder.html
>
> It is on top of this that, for each codec, we have to add controls (mostly
> compound) with specifics and details that suit stateless accelerators.
>
> From a community standpoint, the most important focus is to write and agree on
> a spec and controls. Once we have that, vendors will be able to slowly move
> away from their custom solutions and compete on actual hardware rather than
> integration.
>
> It is also time to start looking toward the future, since the Hantro H1 is a
> very limited and ancient encoder. From the same brand, if someone could work on
> the VC8000E shipped on the i.MX8M Plus, or on Rockchip codecs, that would
> certainly help progress. We can also get inspiration from many other stateless
> encoding APIs now, notably VA, DXVA and Vulkan Video.
I have an 8MP, and I looked at the VC8000E a little and attempted to port what
was done before to the 8MM. If that's a sufficient place to start, I can give
it a try, but if the userspace isn't finalized, I am not sure how to test it.
>
> Of course, folks like to know when this will happen; stateless decoders took 5
> years from start to the first codec being merged, and hopefully we don't beat
> that record. I personally aim to produce work during the summer, and to mostly
> focus on the spec. It's obvious to me that testing on the H1 with a GStreamer
> implementation is the most productive path, though I have a strong interest in
> having an ecosystem of drivers. A second userspace implementation, perhaps
> FFmpeg, could also be useful.
>
> If you'd like to take a bite, this is a good thread for the discussion going
> forward. Before the summer, I plan to reach out to Paul, who gave this great
> presentation [1] at FOSDEM last year, and to start moving the RFC toward using
> these ideas. One of the biggest discussions is rate control: it is clear to me
> that modern HW integrates RC offloading, through some HW-specific knobs or even
> firmware offloading, and this is what Paul has been putting some thought into.
I'll take a look at the presentation. I will admit that I am not an
expert on the video formats or how the encoding works, but I'm willing
to try.
>
> If decoders have progressed so much in quality in the last few years, it is
> mostly because we have better ways to test them. We also need to start thinking
> about how we want to test our encoders. The stateful scene is not all green,
> with very organic growth and a set of encoders that is difficult to unify. And
> we have no metric of how good or bad they are either.
adam
>
> regards,
> Nicolas
>
> Le lundi 13 janvier 2025 à 18:08 -0300, Daniel Almeida a écrit :
> > +cc Nicolas
> >
> >
> > Hey Adam,
> >
> >
> > >
> > > Daniel,
> > >
> > > Do you know if anyone will be picking up the H1 encoder?
> > >
> > > adam
> > > >
> > > > — Daniel
> > > >
> > >
> >
> > I think my colleague Nicolas is the best person to answer this.
> >
> > — Daniel
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Hantro H1 Encoding Upstreaming
2025-01-14 16:16 ` Nicolas Dufresne
2025-01-14 18:01 ` Andrzej Pietrasiewicz
2025-01-15 11:31 ` Adam Ford
@ 2025-01-15 13:53 ` Michael Tretter
2025-01-15 15:03 ` Paul Kocialkowski
3 siblings, 0 replies; 17+ messages in thread
From: Michael Tretter @ 2025-01-15 13:53 UTC (permalink / raw)
To: Nicolas Dufresne
Cc: Daniel Almeida, Adam Ford, Fabio Estevam, andrzejtp2010, Frank Li,
ming.qian, linux-media, linux-imx, paulk, Benjamin Gaignard,
Gustavo Padovan, Marco Felsch, kernel
Hi Nicolas,
On Tue, 14 Jan 2025 11:16:47 -0500, Nicolas Dufresne wrote:
> despite Andrzej having left the community, we are not giving up on the encoder
> work. In 2025, we aim to work more seriously on the V4L2 spec, as just writing
> a driver won't cut it. Each class of codecs needs a general workflow spec
> similar to what we already have for stateful encoders/decoders and stateless
> decoders.
>
> - https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/dev-decoder.html
> - https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/dev-encoder.html
> - https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/dev-stateless-decoder.html
>
> It is on top of this that, for each codec, we have to add controls (mostly
> compound) with specifics and details that suit stateless accelerators.
>
> From a community standpoint, the most important focus is to write and agree on
> a spec and controls. Once we have that, vendors will be able to slowly move
> away from their custom solutions and compete on actual hardware rather than
> integration.
>
> It is also time to start looking toward the future, since the Hantro H1 is a
> very limited and ancient encoder. From the same brand, if someone could work on
> the VC8000E shipped on the i.MX8M Plus, or on Rockchip codecs, that would
> certainly help progress.
Marco Felsch and I recently started to work on stateless encoders, too.
Marco is working on a driver for VC8000E and I am working on a driver
for the Rockchip VEPU580. As user space, we are currently using the
GStreamer element from the draft merge request [0] on both drivers.
Michael
[0] https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5676
> We
> can also get inspiration from many other stateless encoding APIs now, notably
> VA, DXVA and Vulkan Video.
>
> Of course, folks like to know when this will happen; stateless decoders took 5
> years from start to the first codec being merged, and hopefully we don't beat
> that record. I personally aim to produce work during the summer, and to mostly
> focus on the spec. It's obvious to me that testing on the H1 with a GStreamer
> implementation is the most productive path, though I have a strong interest in
> having an ecosystem of drivers. A second userspace implementation, perhaps
> FFmpeg, could also be useful.
>
> If you'd like to take a bite, this is a good thread for the discussion going
> forward. Before the summer, I plan to reach out to Paul, who gave this great
> presentation [1] at FOSDEM last year, and to start moving the RFC toward using
> these ideas. One of the biggest discussions is rate control: it is clear to me
> that modern HW integrates RC offloading, through some HW-specific knobs or even
> firmware offloading, and this is what Paul has been putting some thought into.
>
> If decoders have progressed so much in quality in the last few years, it is
> mostly because we have better ways to test them. We also need to start thinking
> about how we want to test our encoders. The stateful scene is not all green,
> with very organic growth and a set of encoders that is difficult to unify. And
> we have no metric of how good or bad they are either.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Hantro H1 Encoding Upstreaming
2025-01-14 16:16 ` Nicolas Dufresne
` (2 preceding siblings ...)
2025-01-15 13:53 ` Michael Tretter
@ 2025-01-15 15:03 ` Paul Kocialkowski
2025-01-15 19:43 ` Nicolas Dufresne
` (2 more replies)
3 siblings, 3 replies; 17+ messages in thread
From: Paul Kocialkowski @ 2025-01-15 15:03 UTC (permalink / raw)
To: Nicolas Dufresne
Cc: Daniel Almeida, Adam Ford, Fabio Estevam, andrzejtp2010, Frank Li,
ming.qian, linux-media, linux-imx, Benjamin Gaignard,
Gustavo Padovan
Hi folks,
Le Tue 14 Jan 25, 11:16, Nicolas Dufresne a écrit :
> despite Andrzej having left the community, we are not giving up on the encoder
> work. In 2025, we aim to work more seriously on the V4L2 spec, as just writing
> a driver won't cut it. Each class of codecs needs a general workflow spec
> similar to what we already have for stateful encoders/decoders and stateless
> decoders.
>
> - https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/dev-decoder.html
> - https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/dev-encoder.html
> - https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/dev-stateless-decoder.html
>
> It is on top of this that, for each codec, we have to add controls (mostly
> compound) with specifics and details that suit stateless accelerators.
>
> From a community standpoint, the most important focus is to write and agree on
> a spec and controls. Once we have that, vendors will be able to slowly move
> away from their custom solutions and compete on actual hardware rather than
> integration.
Thanks for the continued interest in this topic. I am also still interested in
pushing it forward and defining a mainline API for stateless encoders that fits
the bill.
> It is also time to start looking toward the future, since the Hantro H1 is a
> very limited and ancient encoder. From the same brand, if someone could work on
> the VC8000E shipped on the i.MX8M Plus, or on Rockchip codecs, that would
> certainly help progress. We can also get inspiration from many other stateless
> encoding APIs now, notably VA, DXVA and Vulkan Video.
The VC8000E on the i.MX8MP is definitely my next hardware of interest here.
I will have time available to work on it in the near future.
> Of course, folks like to know when this will happen; stateless decoders took 5
> years from start to the first codec being merged, and hopefully we don't beat
> that record. I personally aim to produce work during the summer, and to mostly
> focus on the spec.
To be fair, we are not starting from scratch and seem to have good momentum
here, so I am hopeful it will not take as long!
> It's obvious to me that testing on the H1 with a GStreamer implementation is
> the most productive path, though I have a strong interest in having an
> ecosystem of drivers. A second userspace implementation, perhaps FFmpeg, could
> also be useful.
I would be glad to not have to work on the GStreamer side and to focus on
kernel work instead. So far we can already aim to support:
- Hantro H1
- Hantro H2/VC8000E
- Allwinner Video Engine
> If you'd like to take a bite, this is a good thread for the discussion going
> forward. Before the summer, I plan to reach out to Paul, who gave this great
> presentation [1] at FOSDEM last year, and to start moving the RFC toward using
> these ideas. One of the biggest discussions is rate control: it is clear to me
> that modern HW integrates RC offloading, through some HW-specific knobs or even
> firmware offloading, and this is what Paul has been putting some thought into.
In terms of RC offloading, what I've seen in the Hantro H1 is a checkpoint
mechanism that allows making per-slice QP adjustments around the global picture
QP, to fit the bill in terms of size. This can be desirable if the use case is
to stick strictly to a given bitrate.

There are also the regions of interest, which are supported by many (most?)
encoders and allow region-based QP changes (typically as an offset). The number
of available slots is hardware-specific.

In addition, the H1 provides some extra statistics, such as the "average"
resulting QP when one of these methods is used.
I guess my initial point about rate control was that it would be easier for
userspace to be able to choose a rate-control strategy directly, with common
kernel-side implementations that would apply to all codecs. This also allows
leveraging hardware features without userspace knowing about them. However, the
main drawback is that there will always be a need for a more specific/advanced
use case than what the kernel is doing (e.g. using an NPU), which would need
userspace to have more control over the encoder. So a more direct interface
would be required to let userspace do rate control.

At the end of the day, I think it would make more sense to expose these encoders
for what they are, deal with the QP and features directly through the uAPI, and
avoid any kernel-side rate control. Hardware-specific features that need to be
configured and may return stats would just have extra controls for those.
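To make that last point concrete, here is a minimal sketch of what such
hardware-feature controls could carry; every struct and function name below is
hypothetical (invented for illustration, nothing here exists in the V4L2 uAPI),
only the [0, 51] QP range comes from H.264:

```c
#include <stdint.h>

/* Hypothetical compound-control payloads for region-of-interest QP
 * offsets and returned encode stats -- names and layout are invented
 * for illustration only. */
struct enc_roi_rect {
	uint16_t left, top, width, height;
	int8_t qp_offset;	/* signed offset around the picture QP */
};

struct enc_stats {
	uint8_t avg_qp;		/* e.g. the "average" QP the H1 reports */
};

/* Clamp the effective per-region QP into the H.264 range [0, 51]. */
static int effective_qp(int picture_qp, int qp_offset)
{
	int qp = picture_qp + qp_offset;

	if (qp < 0)
		return 0;
	if (qp > 51)
		return 51;
	return qp;
}
```

The clamping would belong in the driver, so userspace can request an offset
without having to know the codec's legal QP range.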
So all in all, we'd need a few new controls to configure each encode for a
given codec (starting with H.264) and also some to provide encode stats (e.g.
requested QP, average QP). It feels like we could benefit from existing
stateful encoder controls for various bitstream parameters.

Then userspace would be responsible for configuring each encode run with a
target QP value, a picture type and a list of references. We'd also need to
inform userspace of how many references are supported.
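A rough sketch of that per-run configuration, again with purely hypothetical
names (including the reference limit, which the driver would report), plus the
kind of validation a driver's request handling might apply:

```c
#include <stdint.h>

/* Hypothetical per-run encode parameters that userspace would attach to
 * each request -- illustrative only, not an existing uAPI. */
enum enc_frame_type { ENC_FRAME_IDR, ENC_FRAME_I, ENC_FRAME_P };

#define ENC_MAX_REFS 2	/* assumption: the driver reports the real limit */

struct enc_run_params {
	enum enc_frame_type frame_type;
	uint8_t qp;			/* target QP chosen by userspace RC */
	uint8_t num_refs;		/* must not exceed the driver limit */
	uint64_t ref_ts[ENC_MAX_REFS];	/* timestamps of reference buffers */
};

/* Validate a run the way a driver's request-validation step might. */
static int enc_run_valid(const struct enc_run_params *p)
{
	if (p->qp > 51)			/* H.264 QP range */
		return 0;
	if (p->num_refs > ENC_MAX_REFS)
		return 0;
	/* IDR/I frames must not reference anything. */
	if (p->frame_type != ENC_FRAME_P && p->num_refs != 0)
		return 0;
	return 1;
}
```

Identifying references by buffer timestamp mirrors what the stateless decoder
controls already do, but that choice is an assumption here.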
Another topic of interest is bitstream header generation. I believe it would be
easier for the kernel side to generate those (some hardware has specific
registers to write them) based on the configuration provided by userspace
through controls. It is also useful to be able to regenerate them on demand. I
am not sure whether there would be interest in more precise tracking of
bitstream headers (e.g. H.264 PPS and SPS, which have ids) and in being able to
bind them to specific encode runs.
We could have some common per-codec bitstream generation V4L2 code, with either
a CPU buffer-access backend or a driver-specific implementation for writing the
bits. I already have a base for this in my cedrus H.264 encoder work:
https://github.com/bootlin/linux/blob/cedrus/h264-encoding/drivers/staging/media/sunxi/cedrus/cedrus_enc_h264.c#L722
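As an example of the kind of coding such shared helpers would cover, the ue(v)
Exp-Golomb code used throughout H.264 headers is easy to factor out. This is a
sketch only: the bit_writer helper and function names are made up, but the
output bit pattern follows the H.264 definition (N leading zeros, a 1, then N
info bits, with code_num = value + 1):

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal CPU-backend bit writer; buf must be zero-initialized since
 * put_bit() only ORs bits in. */
struct bit_writer {
	uint8_t *buf;
	size_t bit_pos;
};

static void put_bit(struct bit_writer *bw, int bit)
{
	if (bit)
		bw->buf[bw->bit_pos / 8] |= 0x80 >> (bw->bit_pos % 8);
	bw->bit_pos++;
}

static void put_bits(struct bit_writer *bw, uint32_t val, int n)
{
	while (n--)
		put_bit(bw, (val >> n) & 1);
}

/* Write an unsigned Exp-Golomb code ue(v) for val. */
static void put_ue(struct bit_writer *bw, uint32_t val)
{
	uint32_t code_num = val + 1;
	int len = 0;

	/* len = floor(log2(code_num)) */
	while ((uint32_t)1 << (len + 1) <= code_num)
		len++;
	put_bits(bw, 0, len);		 /* leading zeros */
	put_bits(bw, code_num, len + 1); /* the 1 bit, then the info bits */
}
```

For example, ue(0) is "1", ue(1) is "010" and ue(3) is "00100"; a
driver-specific backend would replace put_bit() with register writes.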
Last words about private driver buffers (such as motion vector and
reconstruction buffers): I think they should remain private and unseen by
userspace. We could add something extra to the uAPI later if there is really a
need to access those.
Cheers,
Paul
> If decoders have progressed so much in quality in the last few years, it is
> mostly because we have better ways to test them. We also need to start thinking
> about how we want to test our encoders. The stateful scene is not all green,
> with very organic growth and a set of encoders that is difficult to unify. And
> we have no metric of how good or bad they are either.
>
> regards,
> Nicolas
>
> Le lundi 13 janvier 2025 à 18:08 -0300, Daniel Almeida a écrit :
> > +cc Nicolas
> >
> >
> > Hey Adam,
> >
> >
> > >
> > > Daniel,
> > >
> > > Do you know if anyone will be picking up the H1 encoder?
> > >
> > > adam
> > > >
> > > > — Daniel
> > > >
> > >
> >
> > I think my colleague Nicolas is the best person to answer this.
> >
> > — Daniel
>
--
Paul Kocialkowski,
Independent contractor - sys-base - https://www.sys-base.io/
Free software developer - https://www.paulk.fr/
Expert in multimedia, graphics and embedded hardware support with Linux.
^ permalink raw reply [flat|nested] 17+ messages in thread

* Re: Hantro H1 Encoding Upstreaming
2025-01-15 15:03 ` Paul Kocialkowski
@ 2025-01-15 19:43 ` Nicolas Dufresne
2025-01-18 16:49 ` Paul Kocialkowski
2025-01-15 19:51 ` Nicolas Dufresne
2025-01-15 20:14 ` Nicolas Dufresne
2 siblings, 1 reply; 17+ messages in thread
From: Nicolas Dufresne @ 2025-01-15 19:43 UTC (permalink / raw)
To: Paul Kocialkowski
Cc: Daniel Almeida, Adam Ford, Fabio Estevam, andrzejtp2010, Frank Li,
ming.qian, linux-media, linux-imx, Benjamin Gaignard,
Gustavo Padovan
Forking the thread,
Le mercredi 15 janvier 2025 à 16:03 +0100, Paul Kocialkowski a écrit :
> Last words about private driver buffers (such as motion vector and
> reconstruction buffers): I think they should remain private and unseen by
> userspace. We could add something extra to the uAPI later if there is really a
> need to access those.
I don't know if you noticed, but Jacopo started a proposal around a
multi-context media controller. For this type of extension, my long-term idea
was that we could adopt this and introduce new nodes to expose specialized
memory. These nodes would be unlinked by default, meaning the default behaviour
with a single m2m video node would remain.

An existing use case for that would be in the decoder space: VC8000D and up
have 4 post-processed outputs, which means up to 5 outputs if you count the
reference frames. So we could set it up:
bitstream -> m2m -> reference frames
              |
              -- capture 1 -> post processed
              |
              -- capture 2 -> post processed
              |
              -- capture 3 -> post processed
              |
              -- capture 4 -> post processed
Simpler said than done, but I think this can work. I suspect it is quite
feasible to keep the stream state separated, allowing the chosen output
resolution to be reconfigured without having to reset the decoder state (which
is only bound to reference frames). It also solves a few issues we have in
regard to over-allocation of memory when we hide the reference frames.

For encoders, reconstruction frames would also be capture nodes. I'm not
completely versed in what they can be used for; their pixel format would also
have to be known to be useful, of course.
Nicolas
^ permalink raw reply [flat|nested] 17+ messages in thread

* Re: Hantro H1 Encoding Upstreaming
2025-01-15 19:43 ` Nicolas Dufresne
@ 2025-01-18 16:49 ` Paul Kocialkowski
0 siblings, 0 replies; 17+ messages in thread
From: Paul Kocialkowski @ 2025-01-18 16:49 UTC (permalink / raw)
To: Nicolas Dufresne
Cc: Daniel Almeida, Adam Ford, Fabio Estevam, andrzejtp2010, Frank Li,
ming.qian, linux-media, linux-imx, Benjamin Gaignard,
Gustavo Padovan
Hi,
Le Wed 15 Jan 25, 14:43, Nicolas Dufresne a écrit :
> Le mercredi 15 janvier 2025 à 16:03 +0100, Paul Kocialkowski a écrit :
> > Last words about private driver buffers (such as motion vector and
> > reconstruction buffers): I think they should remain private and unseen by
> > userspace. We could add something extra to the uAPI later if there is really
> > a need to access those.
>
> I don't know if you noticed, but Jacopo started a proposal around multi-context
> media controller. For this type of extension, my long term idea was that we
> could adopt this, and introduced new nodes to expose specialized memory. These
> nodes would be unlike by default, meaning the default behaviour with a single
> m2m video node would remain.
>
> An existing use case for that would be in the decoder space, VC8000D and up have
> 4 post processed output, which mean up to 5 outputs if you count the reference
> frames. So we could set it up:
This sounds very interesting for handling multi-core codecs and devices with a
separate post-processing output (IIRC the Allwinner video decoder can have an
extra thumbnail output, which can be very handy for JPEG stuff).
> Simpler said than done, but I think this can work. I suspect it is quite
> feasible to keep the stream state separated, allowing the chosen output
> resolution to be reconfigured without having to reset the decoder state (which
> is only bound to reference frames). It also solves a few issues we have in
> regard to over-allocation of memory when we hide the reference frames.
>
> For encoders, reconstruction frames would also be capture nodes. I'm not
> completely versed in what they can be used for; their pixel format would also
> have to be known to be useful, of course.
Makes a lot of sense. Honestly, this is starting to look like the ISP
situation, where we have multiple video nodes dedicated to specific things and
various specific buffer formats for them. This brings a lot of flexibility and
many possibilities for decoders/encoders.

In contrast, the ISP API uses a separate video device for metadata/configuration
submission, which we do through the request API and controls in the
decoder/encoder cases. But we could imagine adding extra source video nodes to
provide e.g. arbitrary bitstream units to stuff in for encoding, and just make
sure they are submitted with the same request. I guess that should work, since
a request is a media-wide object and not video-node specific.
Anyway, like you say, simpler said than done, but it seems like a reasonable
design extension that would solve a lot of current API limitations.
Cheers,
Paul
--
Paul Kocialkowski,
Independent contractor - sys-base - https://www.sys-base.io/
Free software developer - https://www.paulk.fr/
Expert in multimedia, graphics and embedded hardware support with Linux.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Hantro H1 Encoding Upstreaming
2025-01-15 15:03 ` Paul Kocialkowski
2025-01-15 19:43 ` Nicolas Dufresne
@ 2025-01-15 19:51 ` Nicolas Dufresne
2025-01-18 17:00 ` Paul Kocialkowski
2025-01-15 20:14 ` Nicolas Dufresne
2 siblings, 1 reply; 17+ messages in thread
From: Nicolas Dufresne @ 2025-01-15 19:51 UTC (permalink / raw)
To: Paul Kocialkowski
Cc: Daniel Almeida, Adam Ford, Fabio Estevam, andrzejtp2010, Frank Li,
ming.qian, linux-media, linux-imx, Benjamin Gaignard,
Gustavo Padovan
Le mercredi 15 janvier 2025 à 16:03 +0100, Paul Kocialkowski a écrit :
> We could have some common per-codec bitstream generation V4L2 code, with
> either a CPU buffer-access backend or a driver-specific implementation for
> writing the bits. I already have a base for this in my cedrus H.264 encoder
> work:
> https://github.com/bootlin/linux/blob/cedrus/h264-encoding/drivers/staging/media/sunxi/cedrus/cedrus_enc_h264.c#L722
There is a lot of code in there that you can move directly into v4l2-h264; this
is exactly what that library is meant for. It was never meant to be limited to
generating intermediate reference lists for decoders, or to be decoder-specific.
Note that the Golomb coding can be generalized further.
I do agree, at least for now, that letting the driver write the headers has
more advantages. Notably, it allows turning off the knobs that would not
otherwise be supported. The modifications would of course be referenced at
s_ctrl time, assuming you reuse the existing SPS/PPS and other similar compound
controls. As we didn't have encoders in mind when we created these compound
controls, it's possible that we'll have to add extended ones to fill the gaps,
which has always been the plan.
Nicolas
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Hantro H1 Encoding Upstreaming
2025-01-15 19:51 ` Nicolas Dufresne
@ 2025-01-18 17:00 ` Paul Kocialkowski
0 siblings, 0 replies; 17+ messages in thread
From: Paul Kocialkowski @ 2025-01-18 17:00 UTC (permalink / raw)
To: Nicolas Dufresne
Cc: Daniel Almeida, Adam Ford, Fabio Estevam, andrzejtp2010, Frank Li,
ming.qian, linux-media, linux-imx, Benjamin Gaignard,
Gustavo Padovan
Hi,
Le Wed 15 Jan 25, 14:51, Nicolas Dufresne a écrit :
> Le mercredi 15 janvier 2025 à 16:03 +0100, Paul Kocialkowski a écrit :
> > We could have some common per-codec bitstream generation V4L2 code, with
> > either a CPU buffer-access backend or a driver-specific implementation for
> > writing the bits. I already have a base for this in my cedrus H.264 encoder
> > work:
> > https://github.com/bootlin/linux/blob/cedrus/h264-encoding/drivers/staging/media/sunxi/cedrus/cedrus_enc_h264.c#L722
>
> There is a lot of code in there that you can move directly into v4l2-h264;
> this is exactly what that library is meant for. It was never meant to be
> limited to generating intermediate reference lists for decoders, or to be
> decoder-specific. Note that the Golomb coding can be generalized further.
>
> I do agree, at least for now, that letting the driver write the headers has
> more advantages. Notably, it allows turning off the knobs that would not
> otherwise be supported.
Yes, it seems common that hardware will not support certain features or values
and will need some bitstream values set to hard-coded values. There are also
various controls for the stateful API that define most of the basic things that
can be configured in the SPS/PPS, so each driver exposing what it supports
is a pretty good fit.
> The modification would of course be referenced at s_ctrl time,
> assuming you reuse the existing SPS/PPS and other similar compound controls. As we
> didn't have encoders in mind when we created these compound controls, it's
> possible that we'll have to add an extended one to fill the gaps, which has
> always been the plan.
I'm not really sure it's needed to pass the whole SPS/PPS in controls.
Reusing the individual stateful controls feels like a good fit from what I can
see. And yes, we'd need some extra info to be passed from userspace about things
like frame type, QP, etc. that will impact the generated bitstream.
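As a purely hypothetical illustration of that extra per-frame information, a compound control payload could look something like the following; none of these names exist in the uAPI today:

```c
#include <stdint.h>

/* Hypothetical per-frame stateless encode parameters -- a sketch of
 * the "extra info" mentioned above, not an existing V4L2 struct. */
enum hyp_frame_type {
	HYP_FRAME_TYPE_IDR,
	HYP_FRAME_TYPE_I,
	HYP_FRAME_TYPE_P,
};

struct hyp_h264_encode_params {
	uint8_t frame_type;	/* enum hyp_frame_type */
	uint8_t qp;		/* picture QP, 0..51 for H.264 */
	uint64_t reference_ts;	/* buffer timestamp of the reference,
				 * as done for stateless decoder DPB
				 * entries; 0 for intra frames */
	uint32_t flags;		/* e.g. mark this frame as long-term ref */
};
```

The point is just that frame type, QP, and reference identification are per-request inputs, while the SPS/PPS-level configuration stays in the existing stateful controls.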
Paul
--
Paul Kocialkowski,
Independent contractor - sys-base - https://www.sys-base.io/
Free software developer - https://www.paulk.fr/
Expert in multimedia, graphics and embedded hardware support with Linux.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Hantro H1 Encoding Upstreaming
2025-01-15 15:03 ` Paul Kocialkowski
2025-01-15 19:43 ` Nicolas Dufresne
2025-01-15 19:51 ` Nicolas Dufresne
@ 2025-01-15 20:14 ` Nicolas Dufresne
2025-01-18 17:15 ` Paul Kocialkowski
2 siblings, 1 reply; 17+ messages in thread
From: Nicolas Dufresne @ 2025-01-15 20:14 UTC (permalink / raw)
To: Paul Kocialkowski
Cc: Daniel Almeida, Adam Ford, Fabio Estevam, andrzejtp2010, Frank Li,
ming.qian, linux-media, linux-imx, Benjamin Gaignard,
Gustavo Padovan
Le mercredi 15 janvier 2025 à 16:03 +0100, Paul Kocialkowski a écrit :
> Would be glad to not have to work on the GStreamer side and focus on kernel
> work instead. So far we can already aim to support:
> - Hantro H1
> - Hantro H2/VC8000E
> - Allwinner Video Engine
And Rockchip VEPUs, which have an open-source software implementation in libMPP.
Most of us have access to reference software for the Hantro variants; I suppose you
have reverse-engineered the Allwinner?
P.S. There are also Imagination stateless codecs, but I've only seen them on older
TI boards.
>
> > If you'd like to take a bite, this is a good thread to discuss forward. Until
> > the summer, I planned to reach out to Paul, who made this great presentation [1] at
> > FOSDEM last year, and start moving the RFC toward using these ideas. One of the
> > biggest discussions is rate control: it is clear to me that modern HW integrates
> > RC offloading, through some HW-specific knobs or even firmware offloading, and
> > this is what Paul has been putting some thought into.
>
> In terms of RC offloading, what I've seen in the Hantro H1 is a checkpoint
> mechanism that allows making per-slice QP adjustments around the global picture
> QP to fit the bill in terms of size. This can be a desirable thing if the use
> case is to stick to a given bitrate strictly.
>
> There's also the regions of interest that are supported by many (most?) encoders
> and allow region-based QP changes (typically as offset). The number of available
> slots is hardware-specific.
Checkpoints seem unique to Hantro, and they have a lot of limitations, as a
checkpoint is a raster set of blocks. It won't perform well with an important
object in the middle of the scene.
>
> In addition the H1 provides some extra statistics such as the "average"
> resulting QP when one of these methods is used.
Wasn't the statistic MAD (mean average distance), which is basically the average
of the residual values? In my copy of the VC8000E reference software, all that has
been commented out, and the x265 implementation copied over (remember you can pay
to use their code in proprietary form, before jumping onto license violation).
>
> I guess my initial point about rate control was that it would be easier for
> userspace to be able to choose a rate-control strategy directly and to have
> common implementations kernel-side that would apply to all codecs. It also
> allows leveraging hardware features without userspace knowing about them.
>
> However the main drawback is that there will always be a need for a more
> specific/advanced use-case than what the kernel is doing (e.g. using an NPU),
> which would need userspace to have more control over the encoder.
Which brings us to the most modern form of advanced rate control. You will find
this in DXVA and Vulkan Video. It consists of splitting the image into an even
grid, and allowing delta or qualitative differences of QP for each of the
elements in the grid. The size of that grid is limited by HW; you can implement
ROI on top of this too. Though, if the HW has ROI directly, we don't have much
option but to expose it as such, which is fine. A lot of stateful encoders have
that too, and the controls should be the same.
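For the sake of discussion, such a grid payload, with an ROI layered on top of it, could be sketched as below. Everything here is hypothetical (struct, names, the 64x64 cap); the real control would match whatever the HW grid limits are.

```c
#include <stdint.h>

/* Sketch of a DXVA / Vulkan Video style delta-QP map: the frame is
 * split into an even grid (size limited by hardware) and each cell
 * carries a signed QP offset from the picture QP. All names are
 * hypothetical, not existing uAPI. */
struct hyp_qp_map {
	uint32_t cols;		/* grid width, hardware-limited */
	uint32_t rows;		/* grid height, hardware-limited */
	int8_t delta[64 * 64];	/* per-cell QP offset, row-major */
};

/* Implement a rectangular ROI (in cell units) on top of the grid by
 * writing a uniform QP offset into the cells it covers. */
static void hyp_qp_map_set_roi(struct hyp_qp_map *map,
			       uint32_t x, uint32_t y,
			       uint32_t w, uint32_t h, int8_t dqp)
{
	for (uint32_t r = y; r < y + h && r < map->rows; r++)
		for (uint32_t c = x; c < x + w && c < map->cols; c++)
			map->delta[r * map->cols + c] = dqp;
}
```

This is also why a grid control can subsume ROI when the HW lacks a dedicated ROI block, while HW with native ROI would still get its own control.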
>
> So a more direct interface would be required to let userspace do rate-control.
> At the end of the day, I think it would make more sense to expose these encoders
> for what they are and deal with the QP and features directly through the uAPI
> and avoid any kernel-side rate-control. Hardware-specific features that need to
> be configured and may return stats would just have extra controls for those.
>
> So all in all we'd need a few new controls to configure the encode for codecs
> (starting with h.264) and also some to provide encode stats (e.g. requested qp,
> average qp). It feels like we could benefit from existing stateful encoder
> controls for various bitstream parameters.
Sounds like we should offer both. As I stated earlier, modern HW resorts to
firmware offloading for performance reasons. In V4L2, this is even more true. If
you read statistics such as MAD or bitstream size on a frame-by-frame basis, then
you will never queue more than one buffer on the capture side. So the programming
latency (including RC latency) will directly impact the encoder throughput. With
offloading, the statistics can be handled in firmware, or without any context
switch, which improves throughput.
To be fair, the GStreamer implementation we did for the last RFC
runs frame by frame, using the last frame's size as the statistic. We still reached
the specified IP performance documented in the white paper.
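The frame-by-frame loop used there can be sketched in a few lines; this is an illustrative model of "last frame size as the only statistic", not the actual GStreamer code, and all names are invented:

```c
#include <stdint.h>

/* Minimal frame-by-frame bitrate controller in the spirit described
 * above: the only feedback is the previous frame's compressed size.
 * Purely illustrative; names are hypothetical. */
struct hyp_rc {
	uint32_t target_bits;	/* bit budget per frame (bitrate / fps) */
	uint32_t qp;		/* QP to use for the next frame */
	uint32_t qp_min, qp_max;
};

/* Feed back the size of the last encoded frame, get the next QP.
 * A +/-10% dead band avoids oscillating around the budget. */
static uint32_t hyp_rc_next_qp(struct hyp_rc *rc, uint32_t last_frame_bits)
{
	if (last_frame_bits > rc->target_bits + rc->target_bits / 10 &&
	    rc->qp < rc->qp_max)
		rc->qp++;	/* overshot: code the next frame coarser */
	else if (last_frame_bits < rc->target_bits - rc->target_bits / 10 &&
		 rc->qp > rc->qp_min)
		rc->qp--;	/* undershot: spend spare bits on quality */
	return rc->qp;
}
```

The throughput cost is exactly the one described: userspace must see the last frame's size before it can program the next one, so only one capture buffer is ever in flight.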
Like everything else, we don't need all this in a first uAPI, but we need to
define the minimum "required" features.
>
> Then userspace would be responsible for configuring each encode run with a
> target QP value, picture type and list of references. We'd need to also inform
> userspace of how many references are supported.
The H1 only has 1 reference + 1 long-term reference (of which only the 1 reference
was implemented). We used the default reference model, so there was only one way to
manage and pass references. There is clearly a lot more research to be done
around reference management.
Nicolas
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Hantro H1 Encoding Upstreaming
2025-01-15 20:14 ` Nicolas Dufresne
@ 2025-01-18 17:15 ` Paul Kocialkowski
0 siblings, 0 replies; 17+ messages in thread
From: Paul Kocialkowski @ 2025-01-18 17:15 UTC (permalink / raw)
To: Nicolas Dufresne
Cc: Daniel Almeida, Adam Ford, Fabio Estevam, andrzejtp2010, Frank Li,
ming.qian, linux-media, linux-imx, Benjamin Gaignard,
Gustavo Padovan
[-- Attachment #1: Type: text/plain, Size: 7540 bytes --]
Hi,
Le Wed 15 Jan 25, 15:14, Nicolas Dufresne a écrit :
> Le mercredi 15 janvier 2025 à 16:03 +0100, Paul Kocialkowski a écrit :
> > Would be glad to not have to work on the GStreamer side and focus on kernel
> > work instead. So far we can already aim to support:
> > - Hantro H1
> > - Hantro H2/VC8000E
> > - Allwinner Video Engine
>
> And Rockchip VEPUs, which have an open-source software implementation in libMPP.
> Most of us have access to reference software for the Hantro variants; I suppose you
> have reverse-engineered the Allwinner?
Ah right, I haven't looked at Rockchip's own encoder implementations for a
while. I guess that's also called RKVENC.
> P.S. There are also Imagination stateless codecs, but I've only seen them on older
> TI boards.
Oh I didn't know Imagination also made stateless encoders. I was under the
impression those used in the various Jacinto families were stateful.
> > > If you'd like to take a bite, this is a good thread to discuss forward. Until
> > > the summer, I planned to reach out to Paul, who made this great presentation [1] at
> > > FOSDEM last year, and start moving the RFC toward using these ideas. One of the
> > > biggest discussions is rate control: it is clear to me that modern HW integrates
> > > RC offloading, through some HW-specific knobs or even firmware offloading, and
> > > this is what Paul has been putting some thought into.
> >
> > In terms of RC offloading, what I've seen in the Hantro H1 is a checkpoint
> > mechanism that allows making per-slice QP adjustments around the global picture
> > QP to fit the bill in terms of size. This can be a desirable thing if the use
> > case is to stick to a given bitrate strictly.
> >
> > There's also the regions of interest that are supported by many (most?) encoders
> > and allow region-based QP changes (typically as offset). The number of available
> > slots is hardware-specific.
>
> Checkpoints seem unique to Hantro, and they have a lot of limitations, as a
> checkpoint is a raster set of blocks. It won't perform well with an important
> object in the middle of the scene.
Yes, I'm not saying it's particularly useful; it's more an example that some
hardware will provide such unique/custom features.
> > In addition the H1 provides some extra statistics such as the "average"
> > resulting QP when one of these methods is used.
>
> Wasn't the statistic MAD (mean average distance), which is basically the average
> of the residual values? In my copy of the VC8000E reference software, all that has
> been commented out, and the x265 implementation copied over (remember you can pay
> to use their code in proprietary form, before jumping onto license violation).
Ah yes you're right! MAD and average QP. Again not sure how useful it really
is in practice.
> > I guess my initial point about rate control was that it would be easier for
> > userspace to be able to choose a rate-control strategy directly and to have
> > common implementations kernel-side that would apply to all codecs. It also
> > allows leveraging hardware features without userspace knowing about them.
> >
> > However the main drawback is that there will always be a need for a more
> > specific/advanced use-case than what the kernel is doing (e.g. using an NPU),
> > which would need userspace to have more control over the encoder.
>
> Which brings us to the most modern form of advanced rate control. You will find
> this in DXVA and Vulkan Video. It consists of splitting the image into an even
> grid, and allowing delta or qualitative differences of QP for each of the
> elements in the grid. The size of that grid is limited by HW; you can implement
> ROI on top of this too. Though, if the HW has ROI directly, we don't have much
> option but to expose it as such, which is fine. A lot of stateful encoders have
> that too, and the controls should be the same.
Oh that's neat! Thanks for the insight and definitely good to have in mind.
> > So a more direct interface would be required to let userspace do rate-control.
> > At the end of the day, I think it would make more sense to expose these encoders
> > for what they are and deal with the QP and features directly through the uAPI
> > and avoid any kernel-side rate-control. Hardware-specific features that need to
> > be configured and may return stats would just have extra controls for those.
> >
> > So all in all we'd need a few new controls to configure the encode for codecs
> > (starting with h.264) and also some to provide encode stats (e.g. requested qp,
> > average qp). It feels like we could benefit from existing stateful encoder
> > controls for various bitstream parameters.
>
> Sounds like we should offer both. As I stated earlier, modern HW resorts to
> firmware offloading for performance reasons. In V4L2, this is even more true. If
> you read statistics such as MAD or bitstream size on a frame-by-frame basis, then
> you will never queue more than one buffer on the capture side. So the programming
> latency (including RC latency) will directly impact the encoder throughput. With
> offloading, the statistics can be handled in firmware, or without any context
> switch, which improves throughput.
Right that is a very valid and central point. Indeed we do need a way to take
the decision about the encode parameters for the next frame pretty much as soon
as the next m2m job is started. Waiting for userspace to take the decision based
on returned statistics would definitely stall the encoder for a while.
On the other hand there are cases where we cannot handle it all kernel-side
and we do need userspace interaction between previous and next frame.
So here is a suggestion which may sound a bit wild but sounds to me like
it could actually work out: how about adding BPF support in V4L2 for
implementing the encoder strategy?
Then we can have the kernel and userspace working on the same ground and
everything actually running kernel-side without starving the encoder.
I guess we'd essentially need to provide the BPF program with enough information
(maybe some hardware-specific data too) to decide the next frame's
encode parameters.
Of course I have no prior knowledge on how to implement this, but again it feels
like it could be a good fit for the situation we have to deal with.
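To give a rough idea of the shape of it, here is the kind of per-frame decision such a program might embody, written as plain C for illustration. The context struct, its fields, and the hook semantics are entirely invented; no V4L2 BPF attach point exists today.

```c
#include <stdint.h>

/* Plain-C model of the decision a hypothetical V4L2 encoder BPF hook
 * could make between m2m jobs: pick the next frame's type and QP from
 * the statistics of the previous frame, without a round trip to
 * userspace. Everything here is invented for illustration. */
struct hyp_enc_hook_ctx {
	uint32_t frame_index;
	uint32_t gop_length;	 /* force an IDR every gop_length frames */
	uint32_t last_frame_bits;
	uint32_t target_bits;	 /* per-frame bit budget */
	uint32_t qp;		 /* in: previous QP, out: next QP */
	uint8_t force_idr;	 /* out: 1 to request an IDR frame */
};

static void hyp_enc_hook(struct hyp_enc_hook_ctx *ctx)
{
	/* GOP structure decision. */
	ctx->force_idr = (ctx->frame_index % ctx->gop_length) == 0;

	/* One-step QP adjustment from the last frame's size. */
	if (ctx->last_frame_bits > ctx->target_bits && ctx->qp < 51)
		ctx->qp++;
	else if (ctx->last_frame_bits < ctx->target_bits && ctx->qp > 10)
		ctx->qp--;
}
```

The driver would call the loaded program with the hardware statistics it already has at job-done time, so the next job could be programmed immediately.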
> To be fair, the GStreamer implementation we did for the last RFC
> runs frame by frame, using the last frame's size as the statistic. We still reached
> the specified IP performance documented in the white paper.
That's nice but indeed suboptimal. Let's beat that white paper.
> Like everything else, we don't need all this in a first uAPI, but we need to
> define the minimum "required" features.
>
> >
> > Then userspace would be responsible for configuring each encode run with a
> > target QP value, picture type and list of references. We'd need to also inform
> > userspace of how many references are supported.
>
> The H1 only has 1 reference + 1 long-term reference (of which only the 1 reference
> was implemented). We used the default reference model, so there was only one way to
> manage and pass references. There is clearly a lot more research to be done
> around reference management.
Yes absolutely.
Cheers,
Paul
--
Paul Kocialkowski,
Independent contractor - sys-base - https://www.sys-base.io/
Free software developer - https://www.paulk.fr/
Expert in multimedia, graphics and embedded hardware support with Linux.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread