Potential ways to describe virtio-video device capabilities

devicetree.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Potential ways to describe virtio-video device capabilities
@ 2023-09-22  5:07 Alexander Gordeev
  2023-09-22 10:03 ` Laurent Pinchart
  0 siblings, 1 reply; 3+ messages in thread
From: Alexander Gordeev @ 2023-09-22  5:07 UTC (permalink / raw)
  To: virtio-dev, linux-media, acourbot, alexlau, daniel, dgreid,
	dstaessens, egranata, fziglio, hverkuil, keiichiw, kraxel,
	marcheu, posciak, spice-devel, stevensd, tfiga, uril, devicetree,
	laurent.pinchart, alex.bennee, aesteve, Matti.Moell,
	andrew.gazizov, daniel.almeida, cohuck, mwojtas, mst,
	peter.griffin, bag, bgrzesik, hmazur, mikrawczyk, srosek

Hi,

I'm working on updating virtio-video draft v8 spec [1] and the 
virtio-video V4L2 driver [2]. One of the things, that I don't like in 
the current spec draft is sharing the device's capabilities with the 
guest VM. The main requirement is making these capabilities compatible 
with V4L2.

These capabilities could be pretty complex, see [3] and [4]:
1. First there could be several coded video formats. Coded formats have 
their specific sets of supported controls.
2. Then for each coded video formats there could be different sets of 
raw video formats, that could be used in combination with the selected 
encoded format for decoding/encoding.
3. Then for each combination of the coded and raw format there could be 
different sets of supported resolutions.
4. (Optional) for each coded format, raw format and resolution there 
could be different sets of supported frame rates/intervals.

In the future new formats, controls, flags, etc could be defined. Right 
now there is a rather static structure, see section 5.20.7.3.1 
(VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS) in [5]. It looks too inflexible.

The V4L2 approach with many different ioctl's and complex querying logic 
doesn't fit well for virtio-video IMHO. syscall overhead is only a few 
hundred nanoseconds, so doing tens or hundreds of them is bearable in 
case of video. But a roundtrip over virtio may take hundreds of 
microseconds even in the local case. We at OpenSynergy already have 
setups where the real hardware is accessed over a network. Then it can 
take a millisecond. People already seem to agree, that this amount of 
overhead makes V4L2-style discovery process unbearable on a per stream 
basis. So all the relevant data has to be obtained during the device 
probing. This means, that in many cases there is a complex structure 
with all the data on the device side, and we just need to move it to the 
driver side. Moving it in one step seems easier to me and better because 
of the latency. Boot time matters too sometimes.

One of the ideas is to replace the mentioned 
VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS command response with a standalone 
Device Tree Blob. It looks the most promising to me right now. I guess, 
it may sound crazy to many people, but actually it fits into one of the 
device tree usage patterns outlined in [6]. This document is referenced 
in the kernel device tree documentation, so I assume it is correct.

A simplified version could look like this, for example:

/dts-v1/;

/ {
     virtio-video {
         compatible = "virtio,video";

         virtio,video-caps {
             h264 {
                 profiles-mask = <0x3ffff>;
                 levels-mask = <0xfffff>;
                 num-resources-range = <1 32>;
                 buffer-size = <0x100000>;
                 bitrates-range = <100000 10000000>;

                 yuv420 {
                     plane-layout-mask = <0x3>;
                     plane-align = <1>;
                     stride-align-mask = <0x10>;
                     widths-range-stepwise = <96 1920 8>;
                     heights-range-stepwise = <96 1080 1>;
                     num-resources-range = <4 50>;
                 };

                 nv12 {
                     /* ... */
                 };

                 rgba {
                     /* ... */
                 };
             };

             hevc {
                 /* ... */
             };

             vp8 {
                 /* ... */
             };

             vp9 {
                 /* ... */
             };
         };
     };
};

Or maybe the resolutions could be defined separately and linked using 
phandles to avoid duplication because they are going to depend on the 
raw formats exclusively in most cases, I think.

There are many benefits IMO:

1. Device tree can be used to describe arbitrary trees (and even 
arbitrary graphs with phandles). The underlying data structure is 
generic. It looks like it can fit very well here.
2. There is a specification of the format [7]. I hope it could be 
referenced in the virtio spec, right?
3. There is already DTS, DTC, libfdt and DTB parsing code in Linux. All 
of this can be reused. For example, at the moment we have a custom 
configuration file format and a big chunk of code to handle it in our 
virtio-video device. These could be replaced by DTS and calls to libfdt 
completely, I think. There is also an implementation in Rust [8].
4. I think the standalone DTB could be integrated into a board DTB later 
reducing the amount of queries to zero. It is not going to make a big 
difference in latency though.

If device trees are used, then we'll need add a binding/schema to the 
kernel or to the dt-schema repo [9]. Maybe the schema could be 
referenced in the virtio-video spec instead of duplicating it? This 
would reduce the spec size.

I don't know if anybody has already done anything like this and I'm not 
sure if the device tree maintainers and other involved parties would 
approve it. That's why I'm starting this thread. Please share your 
opinions about the idea.

An alternative to using device trees would be inventing something 
similar and simpler (without phandles and strings) from scratch. That's 
fine too, but doesn't allow to reuse the tooling and also is going to 
make the virtio-video spec even bigger.

[1] 
https://lore.kernel.org/virtio-comment/20230629144915.597188-1-Alexander.Gordeev@opensynergy.com/
[2] 
https://lore.kernel.org/linux-media/20200218202753.652093-1-dmitry.sepp@opensynergy.com/
[3] 
https://docs.kernel.org/userspace-api/media/v4l/dev-decoder.html#querying-capabilities
[4] 
https://docs.kernel.org/userspace-api/media/v4l/dev-encoder.html#querying-capabilities
[5] https://drive.google.com/file/d/1uPg4kxThlNPBSiC4b5veyFz4OFGytU7v/view
[6] https://elinux.org/Device_Tree_Usage#Device_Specific_Data
[7] https://www.devicetree.org/specifications/
[8] https://github.com/rust-vmm/vm-fdt
[9] https://github.com/devicetree-org/dt-schema

Kind regards,
Alexander Gordeev

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin
www.opensynergy.com

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Potential ways to describe virtio-video device capabilities
  2023-09-22  5:07 Potential ways to describe virtio-video device capabilities Alexander Gordeev
@ 2023-09-22 10:03 ` Laurent Pinchart
  2023-09-25 16:16   ` Alexander Gordeev
  0 siblings, 1 reply; 3+ messages in thread
From: Laurent Pinchart @ 2023-09-22 10:03 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: virtio-dev, linux-media, acourbot, alexlau, daniel, dgreid,
	dstaessens, egranata, fziglio, hverkuil, keiichiw, kraxel,
	marcheu, posciak, spice-devel, stevensd, tfiga, uril, devicetree,
	alex.bennee, aesteve, Matti.Moell, andrew.gazizov, daniel.almeida,
	cohuck, mwojtas, mst, peter.griffin, bag, bgrzesik, hmazur,
	mikrawczyk, srosek

On Fri, Sep 22, 2023 at 07:07:34AM +0200, Alexander Gordeev wrote:
> Hi,
> 
> I'm working on updating virtio-video draft v8 spec [1] and the 
> virtio-video V4L2 driver [2]. One of the things, that I don't like in 
> the current spec draft is sharing the device's capabilities with the 
> guest VM. The main requirement is making these capabilities compatible 
> with V4L2.
> 
> These capabilities could be pretty complex, see [3] and [4]:
> 1. First there could be several coded video formats. Coded formats have 
> their specific sets of supported controls.
> 2. Then for each coded video formats there could be different sets of 
> raw video formats, that could be used in combination with the selected 
> encoded format for decoding/encoding.
> 3. Then for each combination of the coded and raw format there could be 
> different sets of supported resolutions.
> 4. (Optional) for each coded format, raw format and resolution there 
> could be different sets of supported frame rates/intervals.
> 
> In the future new formats, controls, flags, etc could be defined. Right 
> now there is a rather static structure, see section 5.20.7.3.1 
> (VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS) in [5]. It looks too inflexible.
> 
> The V4L2 approach with many different ioctl's and complex querying logic 
> doesn't fit well for virtio-video IMHO. syscall overhead is only a few 
> hundred nanoseconds, so doing tens or hundreds of them is bearable in 
> case of video. But a roundtrip over virtio may take hundreds of 
> microseconds even in the local case. We at OpenSynergy already have 
> setups where the real hardware is accessed over a network. Then it can 
> take a millisecond. People already seem to agree, that this amount of 
> overhead makes V4L2-style discovery process unbearable on a per stream 
> basis. So all the relevant data has to be obtained during the device 
> probing. This means, that in many cases there is a complex structure 
> with all the data on the device side, and we just need to move it to the 
> driver side. Moving it in one step seems easier to me and better because 
> of the latency. Boot time matters too sometimes.

No disagreement here. For what it's worth, I think V4L2 should also
evolve and get a way to query capabilities with a single (or a very
small number of) ioctl.

> One of the ideas is to replace the mentioned 
> VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS command response with a standalone 
> Device Tree Blob. It looks the most promising to me right now. I guess, 
> it may sound crazy to many people, but actually it fits into one of the 
> device tree usage patterns outlined in [6]. This document is referenced 
> in the kernel device tree documentation, so I assume it is correct.

If we want to pass flexible structured data we need a binary format, and
DT provides a binary format. Whether it's the best option or not, I
don't know, but it doesn't seem too crazy to me.

> A simplified version could look like this, for example:
> 
> /dts-v1/;
> 
> / {
>      virtio-video {
>          compatible = "virtio,video";
> 
>          virtio,video-caps {
>              h264 {
>                  profiles-mask = <0x3ffff>;
>                  levels-mask = <0xfffff>;
>                  num-resources-range = <1 32>;
>                  buffer-size = <0x100000>;
>                  bitrates-range = <100000 10000000>;
> 
>                  yuv420 {
>                      plane-layout-mask = <0x3>;
>                      plane-align = <1>;
>                      stride-align-mask = <0x10>;
>                      widths-range-stepwise = <96 1920 8>;
>                      heights-range-stepwise = <96 1080 1>;
>                      num-resources-range = <4 50>;
>                  };
> 
>                  nv12 {
>                      /* ... */
>                  };
> 
>                  rgba {
>                      /* ... */
>                  };
>              };
> 
>              hevc {
>                  /* ... */
>              };
> 
>              vp8 {
>                  /* ... */
>              };
> 
>              vp9 {
>                  /* ... */
>              };
>          };
>      };
> };
> 
> Or maybe the resolutions could be defined separately and linked using 
> phandles to avoid duplication because they are going to depend on the 
> raw formats exclusively in most cases, I think.
> 
> There are many benefits IMO:
> 
> 1. Device tree can be used to describe arbitrary trees (and even 
> arbitrary graphs with phandles). The underlying data structure is 
> generic. It looks like it can fit very well here.
> 2. There is a specification of the format [7]. I hope it could be 
> referenced in the virtio spec, right?
> 3. There is already DTS, DTC, libfdt and DTB parsing code in Linux. All 
> of this can be reused. For example, at the moment we have a custom 
> configuration file format and a big chunk of code to handle it in our 
> virtio-video device. These could be replaced by DTS and calls to libfdt 
> completely, I think. There is also an implementation in Rust [8].

How does libfdt fare when it comes to ease of use and performance ?
License-wise it seems to be dual-licensed under the terms of the GPL v2
and BSD-2, so it should be fine.

> 4. I think the standalone DTB could be integrated into a board DTB later 
> reducing the amount of queries to zero. It is not going to make a big 
> difference in latency though.
> 
> If device trees are used, then we'll need add a binding/schema to the 
> kernel or to the dt-schema repo [9]. Maybe the schema could be 
> referenced in the virtio-video spec instead of duplicating it? This 
> would reduce the spec size.
> 
> I don't know if anybody has already done anything like this and I'm not 
> sure if the device tree maintainers and other involved parties would 
> approve it. That's why I'm starting this thread. Please share your 
> opinions about the idea.
> 
> An alternative to using device trees would be inventing something 
> similar and simpler (without phandles and strings) from scratch. That's 
> fine too, but doesn't allow to reuse the tooling and also is going to 
> make the virtio-video spec even bigger.
> 
> [1] https://lore.kernel.org/virtio-comment/20230629144915.597188-1-Alexander.Gordeev@opensynergy.com/
> [2] https://lore.kernel.org/linux-media/20200218202753.652093-1-dmitry.sepp@opensynergy.com/
> [3] https://docs.kernel.org/userspace-api/media/v4l/dev-decoder.html#querying-capabilities
> [4] https://docs.kernel.org/userspace-api/media/v4l/dev-encoder.html#querying-capabilities
> [5] https://drive.google.com/file/d/1uPg4kxThlNPBSiC4b5veyFz4OFGytU7v/view
> [6] https://elinux.org/Device_Tree_Usage#Device_Specific_Data
> [7] https://www.devicetree.org/specifications/
> [8] https://github.com/rust-vmm/vm-fdt
> [9] https://github.com/devicetree-org/dt-schema

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Potential ways to describe virtio-video device capabilities
  2023-09-22 10:03 ` Laurent Pinchart
@ 2023-09-25 16:16   ` Alexander Gordeev
  0 siblings, 0 replies; 3+ messages in thread
From: Alexander Gordeev @ 2023-09-25 16:16 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: virtio-dev, linux-media, acourbot, alexlau, daniel, dstaessens,
	egranata, fziglio, hverkuil, keiichiw, kraxel, marcheu, posciak,
	spice-devel, stevensd, tfiga, uril, devicetree, alex.bennee,
	aesteve, Matti.Moell, andrew.gazizov, daniel.almeida, cohuck,
	mwojtas, mst, peter.griffin, bag, bgrzesik, hmazur, mikrawczyk,
	srosek

On 22.09.23 12:03, Laurent Pinchart wrote:
> On Fri, Sep 22, 2023 at 07:07:34AM +0200, Alexander Gordeev wrote:
>> Hi,
>>
>> I'm working on updating virtio-video draft v8 spec [1] and the
>> virtio-video V4L2 driver [2]. One of the things, that I don't like in
>> the current spec draft is sharing the device's capabilities with the
>> guest VM. The main requirement is making these capabilities compatible
>> with V4L2.
>>
>> These capabilities could be pretty complex, see [3] and [4]:
>> 1. First there could be several coded video formats. Coded formats have
>> their specific sets of supported controls.
>> 2. Then for each coded video formats there could be different sets of
>> raw video formats, that could be used in combination with the selected
>> encoded format for decoding/encoding.
>> 3. Then for each combination of the coded and raw format there could be
>> different sets of supported resolutions.
>> 4. (Optional) for each coded format, raw format and resolution there
>> could be different sets of supported frame rates/intervals.
>>
>> In the future new formats, controls, flags, etc could be defined. Right
>> now there is a rather static structure, see section 5.20.7.3.1
>> (VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS) in [5]. It looks too inflexible.
>>
>> The V4L2 approach with many different ioctl's and complex querying logic
>> doesn't fit well for virtio-video IMHO. syscall overhead is only a few
>> hundred nanoseconds, so doing tens or hundreds of them is bearable in
>> case of video. But a roundtrip over virtio may take hundreds of
>> microseconds even in the local case. We at OpenSynergy already have
>> setups where the real hardware is accessed over a network. Then it can
>> take a millisecond. People already seem to agree, that this amount of
>> overhead makes V4L2-style discovery process unbearable on a per stream
>> basis. So all the relevant data has to be obtained during the device
>> probing. This means, that in many cases there is a complex structure
>> with all the data on the device side, and we just need to move it to the
>> driver side. Moving it in one step seems easier to me and better because
>> of the latency. Boot time matters too sometimes.
> 
> No disagreement here. For what it's worth, I think V4L2 should also
> evolve and get a way to query capabilities with a single (or a very
> small number of) ioctl.

That would be great in my opinion. Maybe a little bit different use-case 
ADAIU because the direction is somewhat reversed: with the virtio-video 
the kernel would need to read a DTB, and with normal V4L2 kernel drivers 
it would have to write a DTB. I think writing a DTB in the kernel is 
already possible with the current implementation, but I haven't checked. 
Reading a DTB in user-space is not an issue. I'll take this use-case 
into account in the future.

>> One of the ideas is to replace the mentioned
>> VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS command response with a standalone
>> Device Tree Blob. It looks the most promising to me right now. I guess,
>> it may sound crazy to many people, but actually it fits into one of the
>> device tree usage patterns outlined in [6]. This document is referenced
>> in the kernel device tree documentation, so I assume it is correct.
> 
> If we want to pass flexible structured data we need a binary format, and
> DT provides a binary format. Whether it's the best option or not, I
> don't know, but it doesn't seem too crazy to me.

Good! Well, I'm not sure if DT is the best option certainly. Form my PoV 
having a spec that hopefully could be used as a normative reference is 
essential. The there is a lot of tooling, that I mentioned. There is 
ACPI, but it is too bloated IMO. I know of ALSA topologies and netlink 
protocol specs in YAML, but looks like they're very Linux specific. 
Besides that not sure what else could fit...

>> A simplified version could look like this, for example:
>>
>> /dts-v1/;
>>
>> / {
>>       virtio-video {
>>           compatible = "virtio,video";
>>
>>           virtio,video-caps {
>>               h264 {
>>                   profiles-mask = <0x3ffff>;
>>                   levels-mask = <0xfffff>;
>>                   num-resources-range = <1 32>;
>>                   buffer-size = <0x100000>;
>>                   bitrates-range = <100000 10000000>;
>>
>>                   yuv420 {
>>                       plane-layout-mask = <0x3>;
>>                       plane-align = <1>;
>>                       stride-align-mask = <0x10>;
>>                       widths-range-stepwise = <96 1920 8>;
>>                       heights-range-stepwise = <96 1080 1>;
>>                       num-resources-range = <4 50>;
>>                   };
>>
>>                   nv12 {
>>                       /* ... */
>>                   };
>>
>>                   rgba {
>>                       /* ... */
>>                   };
>>               };
>>
>>               hevc {
>>                   /* ... */
>>               };
>>
>>               vp8 {
>>                   /* ... */
>>               };
>>
>>               vp9 {
>>                   /* ... */
>>               };
>>           };
>>       };
>> };
>>
>> Or maybe the resolutions could be defined separately and linked using
>> phandles to avoid duplication because they are going to depend on the
>> raw formats exclusively in most cases, I think.
>>
>> There are many benefits IMO:
>>
>> 1. Device tree can be used to describe arbitrary trees (and even
>> arbitrary graphs with phandles). The underlying data structure is
>> generic. It looks like it can fit very well here.
>> 2. There is a specification of the format [7]. I hope it could be
>> referenced in the virtio spec, right?
>> 3. There is already DTS, DTC, libfdt and DTB parsing code in Linux. All
>> of this can be reused. For example, at the moment we have a custom
>> configuration file format and a big chunk of code to handle it in our
>> virtio-video device. These could be replaced by DTS and calls to libfdt
>> completely, I think. There is also an implementation in Rust [8].
> 
> How does libfdt fare when it comes to ease of use and performance ?
> License-wise it seems to be dual-licensed under the terms of the GPL v2
> and BSD-2, so it should be fine.

There are some tools in the dtc repository, that use the libfdt. They 
seem relatively simple:
https://git.kernel.org/pub/scm/utils/dtc/dtc.git/tree/fdtget.c
https://git.kernel.org/pub/scm/utils/dtc/dtc.git/tree/fdtput.c
https://git.kernel.org/pub/scm/utils/dtc/dtc.git/tree/fdtdump.c

I haven't tested the performance that much. I tried dtc and the 
mentioned utils with dts file similar to the one above, it was always 
instant. Also the representation of the DT in memory, when parsed in the 
kernel, seems pretty straightforward. I've found the description here: 
https://docs.kernel.org/devicetree/of_unittest.html#adding-the-test-data.
So I guess it should be fine from performance PoV. But maybe we should 
discuss a benchmark.
I'd be happy to learn about other options as well.

>> 4. I think the standalone DTB could be integrated into a board DTB later
>> reducing the amount of queries to zero. It is not going to make a big
>> difference in latency though.
>>
>> If device trees are used, then we'll need add a binding/schema to the
>> kernel or to the dt-schema repo [9]. Maybe the schema could be
>> referenced in the virtio-video spec instead of duplicating it? This
>> would reduce the spec size.
>>
>> I don't know if anybody has already done anything like this and I'm not
>> sure if the device tree maintainers and other involved parties would
>> approve it. That's why I'm starting this thread. Please share your
>> opinions about the idea.
>>
>> An alternative to using device trees would be inventing something
>> similar and simpler (without phandles and strings) from scratch. That's
>> fine too, but doesn't allow to reuse the tooling and also is going to
>> make the virtio-video spec even bigger.
>>
>> [1] https://lore.kernel.org/virtio-comment/20230629144915.597188-1-Alexander.Gordeev@opensynergy.com/
>> [2] https://lore.kernel.org/linux-media/20200218202753.652093-1-dmitry.sepp@opensynergy.com/
>> [3] https://docs.kernel.org/userspace-api/media/v4l/dev-decoder.html#querying-capabilities
>> [4] https://docs.kernel.org/userspace-api/media/v4l/dev-encoder.html#querying-capabilities
>> [5] https://drive.google.com/file/d/1uPg4kxThlNPBSiC4b5veyFz4OFGytU7v/view
>> [6] https://elinux.org/Device_Tree_Usage#Device_Specific_Data
>> [7] https://www.devicetree.org/specifications/
>> [8] https://github.com/rust-vmm/vm-fdt
>> [9] https://github.com/devicetree-org/dt-schema

-- 
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin
www.opensynergy.com

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-09-25 16:17 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-22  5:07 Potential ways to describe virtio-video device capabilities Alexander Gordeev
2023-09-22 10:03 ` Laurent Pinchart
2023-09-25 16:16   ` Alexander Gordeev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).