[Media Summit] ChromeOS Kernel CAM

public inbox for linux-media@vger.kernel.org
 help / color / mirror / Atom feed

* [Media Summit] ChromeOS Kernel CAM
@ 2022-09-07  7:55 Ricardo Ribalda
  2022-09-07 10:50 ` Laurent Pinchart
  0 siblings, 1 reply; 20+ messages in thread
From: Ricardo Ribalda @ 2022-09-07  7:55 UTC (permalink / raw)
  To: Linux Media Mailing List, Sakari Ailus, Kieran Bingham,
	Nicolas Dufresne, Benjamin Gaignard, Hidenori Kobayashi,
	Paul Kocialkowski, Michael Olbrich, Laurent Pinchart,
	Ricardo Ribalda, Maxime Ripard, Daniel Scally,
	Jernej Škrabec, Niklas Söderlund, Michael Tretter,
	Hans Verkuil, Philipp Zabel, Mauro Carvalho Chehab,
	Benjamin MUGNIER, Jacopo Mondi, Dave Stevenson

Hi

On ChromeOS we have opted to have a camera stack based on the upstream kernel.

The camera ecosystem has become extremely heterogeneous thanks to the
proliferation of complex cameras. Meaning that, if ChromeOS wants to
keep with our upstream commitments, we have to look into how to get
more involvement from vendors and standardise our stack.

Kcam is an initiative to support complex cameras in a way that can be
scalable, is acceptable by the vendors and respect the users rights.

Slides at: https://drive.google.com/file/d/1Tew21xeKmFlQ7dQxMcIYqybVuQL7La1a/view

Looking forward to see all of you again on Monday :)

-- 
Ricardo Ribalda

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Media Summit] ChromeOS Kernel CAM
  2022-09-07  7:55 [Media Summit] ChromeOS Kernel CAM Ricardo Ribalda
@ 2022-09-07 10:50 ` Laurent Pinchart
  2022-09-08  7:11   ` Ricardo Ribalda
  0 siblings, 1 reply; 20+ messages in thread
From: Laurent Pinchart @ 2022-09-07 10:50 UTC (permalink / raw)
  To: Ricardo Ribalda
  Cc: Linux Media Mailing List, Sakari Ailus, Kieran Bingham,
	Nicolas Dufresne, Benjamin Gaignard, Hidenori Kobayashi,
	Paul Kocialkowski, Michael Olbrich, Maxime Ripard, Daniel Scally,
	Jernej Škrabec, Niklas Söderlund, Michael Tretter,
	Hans Verkuil, Philipp Zabel, Mauro Carvalho Chehab,
	Benjamin MUGNIER, Jacopo Mondi, Dave Stevenson

Hi Ricardo,

On Wed, Sep 07, 2022 at 09:55:12AM +0200, Ricardo Ribalda wrote:
> Hi
> 
> On ChromeOS we have opted to have a camera stack based on the upstream kernel.
> 
> The camera ecosystem has become extremely heterogeneous thanks to the
> proliferation of complex cameras. Meaning that, if ChromeOS wants to
> keep with our upstream commitments, we have to look into how to get
> more involvement from vendors and standardise our stack.
> 
> Kcam is an initiative to support complex cameras in a way that can be
> scalable, is acceptable by the vendors and respect the users rights.
> 
> Slides at: https://drive.google.com/file/d/1Tew21xeKmFlQ7dQxMcIYqybVuQL7La1a/view

Thank you. A few questions and comments for clarification:

- Slide 4 mentions proprietary drivers and UIO drivers. Do you mean UIO
  as in the upstream UIO API, or as in UIO-like drivers with a vendor
  API ?

- Slide 5 mentions "Code developed exclusively by vendor" for Android.
  There's the CameraX initiative (and possibly other I'm not aware of)
  that mixes the high-level HAL implementation from Google with
  low-level vendor code, to simplify (in theory at least) the life of
  vendors. Generally speaking you're right though, the vendor is in
  charge of providing the HAL, regardless of how it's structured
  internally.

- Slide 8 is focussed on notebooks (Chrome OS, but I suppose also
  regular Linux machines) vs. Android when it comes to leveraging the
  camera stack, but let's not forget there are also other markets (IoT
  in particular) that may be structured differently. Not all vendors of
  SoCs that integrate ISPs consider Android as their main target, and
  they may ignore the notebook and mobile markets completely.

- Slide 11 (and previous slides too) mention "Secret Sauce". I really
  dislike that term, as it's very vague. I would like discussions to
  clearly define the scope of that closed-source component, and we
  should come up with a more descriptive name that reflects that
  well-defined scope.

- Slide 16 mentions 122 ioctls to emphasize that V4L2 is a complicated
  API. Most of those are not relevant to cameras. It is thus a bit
  misleading technically, but it can be still perceived as complicated
  by vendors for that reason.

- Still on slide 16, V4L2 as an API is usable without disclosing vendor
  IP. What is not possible is upstreaming a driver. I don't see this as
  significantly different between V4L2 and the new API proposal. I
  expect this to be discussed on Monday.

- On slide 17 the color scheme seems to imply that the daemon is
  open-source, while it's in most cases (maybe in all of them) closed.

- Do you have a real life example of the type of outcome described on
  slide 19 (black box hardware) ?

- Slide 24 mentions parameter buffers, it would be useful to describe
  what those typically contain, and who consumes them once they're
  provided by userspace to the driver.

- Slide 27 mentions that upstreaming a driver will require a camera
  stack with the same open source requirements as V4L2. Doesn't that
  contradict slide 16 that mentions that V4L2 cannot product vendor IP,
  or at least infer that the new API wouldn't protect the vendor IP more
  than V4L2 does ?

- Slide 31 mentions that entities can send operations internally and
  listen to each other events. I'd like to better understand how that
  will work without any abstraction in the API (as that is one of the
  main design decision behind this new API) when those entities are from
  different vendors, and handled by different drivers that are developed
  independently (for instance, the camera sensor and the CSI-2 receiver,
  or even the CSI-2 receiver and the ISP).

- Does the bike on slide 32 illustrate the difficult discussions we've
  had in the past and how progress was hindered ? :-)

> Looking forward to see all of you again on Monday :)

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Media Summit] ChromeOS Kernel CAM
  2022-09-07 10:50 ` Laurent Pinchart
@ 2022-09-08  7:11   ` Ricardo Ribalda
  2022-09-08  8:08     ` Maxime Ripard
  0 siblings, 1 reply; 20+ messages in thread
From: Ricardo Ribalda @ 2022-09-08  7:11 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Linux Media Mailing List, Sakari Ailus, Kieran Bingham,
	Nicolas Dufresne, Benjamin Gaignard, Hidenori Kobayashi,
	Paul Kocialkowski, Michael Olbrich, Maxime Ripard, Daniel Scally,
	Jernej Škrabec, Niklas Söderlund, Michael Tretter,
	Hans Verkuil, Philipp Zabel, Mauro Carvalho Chehab,
	Benjamin MUGNIER, Jacopo Mondi, Dave Stevenson

Hi Laurent

On Wed, 7 Sept 2022 at 12:51, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Ricardo,
>
> On Wed, Sep 07, 2022 at 09:55:12AM +0200, Ricardo Ribalda wrote:
> > Hi
> >
> > On ChromeOS we have opted to have a camera stack based on the upstream kernel.
> >
> > The camera ecosystem has become extremely heterogeneous thanks to the
> > proliferation of complex cameras. Meaning that, if ChromeOS wants to
> > keep with our upstream commitments, we have to look into how to get
> > more involvement from vendors and standardise our stack.
> >
> > Kcam is an initiative to support complex cameras in a way that can be
> > scalable, is acceptable by the vendors and respect the users rights.
> >
> > Slides at: https://drive.google.com/file/d/1Tew21xeKmFlQ7dQxMcIYqybVuQL7La1a/view
>
> Thank you. A few questions and comments for clarification:
>
> - Slide 4 mentions proprietary drivers and UIO drivers. Do you mean UIO
>   as in the upstream UIO API, or as in UIO-like drivers with a vendor
>   API ?

It is really a jungle. You get UIO-like, real UIO (less common), franken V4L2...

>
> - Slide 5 mentions "Code developed exclusively by vendor" for Android.
>   There's the CameraX initiative (and possibly other I'm not aware of)
>   that mixes the high-level HAL implementation from Google with
>   low-level vendor code, to simplify (in theory at least) the life of
>   vendors. Generally speaking you're right though, the vendor is in
>   charge of providing the HAL, regardless of how it's structured
>   internally.
>
> - Slide 8 is focussed on notebooks (Chrome OS, but I suppose also
>   regular Linux machines) vs. Android when it comes to leveraging the
>   camera stack, but let's not forget there are also other markets (IoT
>   in particular) that may be structured differently. Not all vendors of
>   SoCs that integrate ISPs consider Android as their main target, and
>   they may ignore the notebook and mobile markets completely.


And also not forget about Industry 3.0.
There is a lot of diversity there.

>
> - Slide 11 (and previous slides too) mention "Secret Sauce". I really
>   dislike that term, as it's very vague. I would like discussions to
>   clearly define the scope of that closed-source component, and we
>   should come up with a more descriptive name that reflects that
>   well-defined scope.

I came up with: Closed-loop IQ algorithms. But it is less catchy than
Secret Sauce.

>
> - Slide 16 mentions 122 ioctls to emphasize that V4L2 is a complicated
>   API. Most of those are not relevant to cameras. It is thus a bit
>   misleading technically, but it can be still perceived as complicated
>   by vendors for that reason.

Vivid uses 100... But I agree, it is not the number of ioctls that
makes it complicated.

>
> - Still on slide 16, V4L2 as an API is usable without disclosing vendor
>   IP. What is not possible is upstreaming a driver. I don't see this as
>   significantly different between V4L2 and the new API proposal. I
>   expect this to be discussed on Monday.

I am only considering upstream drivers. There is not much to discuss
for downstream or closed drivers :)

>
> - On slide 17 the color scheme seems to imply that the daemon is
>   open-source, while it's in most cases (maybe in all of them) closed.
>

We get a bit of everything, not only v4l2. So this is why it is
purple: blue + red ;)


> - Do you have a real life example of the type of outcome described on
>   slide 19 (black box hardware) ?

Yes, I am working with the vendor to know how much I can disclose about it.

>
> - Slide 24 mentions parameter buffers, it would be useful to describe
>   what those typically contain, and who consumes them once they're
>   provided by userspace to the driver.

>
> - Slide 27 mentions that upstreaming a driver will require a camera
>   stack with the same open source requirements as V4L2. Doesn't that
>   contradict slide 16 that mentions that V4L2 cannot product vendor IP,
>   or at least infer that the new API wouldn't protect the vendor IP more
>   than V4L2 does ?

Let's discuss that on Monday,

>
> - Slide 31 mentions that entities can send operations internally and
>   listen to each other events. I'd like to better understand how that
>   will work without any abstraction in the API (as that is one of the
>   main design decision behind this new API) when those entities are from
>   different vendors, and handled by different drivers that are developed
>   independently (for instance, the camera sensor and the CSI-2 receiver,
>   or even the CSI-2 receiver and the ISP).

It is still under work.

Hardware, specially for standard buses,  should be resilient (not
crash) to format mismatches. Otherwise a mal-functionling sensor or
too much noise could crash the system (with or without kcam).

Drivers developed together should know about the rest of the system,
so that is not the issue here.

For drivers developed by different vendors for a standard bus, on
hardware that is not resilient (that was a mouthful), then we need to
prepare a set of read-only standard registers.


>
> - Does the bike on slide 32 illustrate the difficult discussions we've
>   had in the past and how progress was hindered ? :-)

This is how we do code review at Google when two developers do not
want to work together. We take the bike to the rooftop and the two
developers that disagree tries to push the other developer to the edge
of the building.

The first second, when you see your colleague falling you think that
you have won.... then you realise that you are falling with them.

(you asked for a metaphor :P )

>
> > Looking forward to see all of you again on Monday :)
>
> --
> Regards,
>
> Laurent Pinchart



-- 
Ricardo Ribalda

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Media Summit] ChromeOS Kernel CAM
  2022-09-08  7:11   ` Ricardo Ribalda
@ 2022-09-08  8:08     ` Maxime Ripard
  2022-09-08 14:14       ` Laurent Pinchart
  0 siblings, 1 reply; 20+ messages in thread
From: Maxime Ripard @ 2022-09-08  8:08 UTC (permalink / raw)
  To: Ricardo Ribalda
  Cc: Laurent Pinchart, Linux Media Mailing List, Sakari Ailus,
	Kieran Bingham, Nicolas Dufresne, Benjamin Gaignard,
	Hidenori Kobayashi, Paul Kocialkowski, Michael Olbrich,
	Daniel Scally, Jernej Škrabec, Niklas Söderlund,
	Michael Tretter, Hans Verkuil, Philipp Zabel,
	Mauro Carvalho Chehab, Benjamin MUGNIER, Jacopo Mondi,
	Dave Stevenson

[-- Attachment #1: Type: text/plain, Size: 3014 bytes --]

Hi Ricardo,

On Thu, Sep 08, 2022 at 09:11:11AM +0200, Ricardo Ribalda wrote:
> > - Still on slide 16, V4L2 as an API is usable without disclosing vendor
> >   IP. What is not possible is upstreaming a driver. I don't see this as
> >   significantly different between V4L2 and the new API proposal. I
> >   expect this to be discussed on Monday.
> 
> I am only considering upstream drivers. There is not much to discuss
> for downstream or closed drivers :)

Are we really discussing upstream *drivers*? If anything, it looks like
the Kcam proposal moves most of the drivers out of upstream.

> > - Slide 31 mentions that entities can send operations internally and
> >   listen to each other events. I'd like to better understand how that
> >   will work without any abstraction in the API (as that is one of the
> >   main design decision behind this new API) when those entities are from
> >   different vendors, and handled by different drivers that are developed
> >   independently (for instance, the camera sensor and the CSI-2 receiver,
> >   or even the CSI-2 receiver and the ISP).
> 
> It is still under work.
> 
> Hardware, specially for standard buses,  should be resilient (not
> crash) to format mismatches. Otherwise a mal-functionling sensor or
> too much noise could crash the system (with or without kcam).
> 
> Drivers developed together should know about the rest of the system,
> so that is not the issue here.
> 
> For drivers developed by different vendors for a standard bus, on
> hardware that is not resilient (that was a mouthful), then we need to
> prepare a set of read-only standard registers.

I'm not even sure that read-only registers would be enough. I've
experienced first-hand DMA controllers that, when the camera has its
timings completely off, end up completely confused and write way outside
of its assigned buffer creating big chunks of corrupted memory in the
system.

And that was by writing fairly legit values to registers that were meant
for that, so we wouldn't be able to defend against it even with the
smartest whitelist.

And we were in a "good faith" situation. Giving an attacker basically
programmable access to DMA engines that might not be sitting behind an
IOMMU seems like a very dangerous idea to me.

> > - Does the bike on slide 32 illustrate the difficult discussions we've
> >   had in the past and how progress was hindered ? :-)
> 
> This is how we do code review at Google when two developers do not
> want to work together. We take the bike to the rooftop and the two
> developers that disagree tries to push the other developer to the edge
> of the building.
> 
> The first second, when you see your colleague falling you think that
> you have won.... then you realise that you are falling with them.

So the optimal solution would be that both stop pushing, or push the
other just as hard without bulging? That doesn't seem like a good way to
end up with a compromise ;)

Maxime

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Media Summit] ChromeOS Kernel CAM
  2022-09-08  8:08     ` Maxime Ripard
@ 2022-09-08 14:14       ` Laurent Pinchart
  2022-09-08 14:59         ` Maxime Ripard
  0 siblings, 1 reply; 20+ messages in thread
From: Laurent Pinchart @ 2022-09-08 14:14 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Ricardo Ribalda, Linux Media Mailing List, Sakari Ailus,
	Kieran Bingham, Nicolas Dufresne, Benjamin Gaignard,
	Hidenori Kobayashi, Paul Kocialkowski, Michael Olbrich,
	Daniel Scally, Jernej Škrabec, Niklas Söderlund,
	Michael Tretter, Hans Verkuil, Philipp Zabel,
	Mauro Carvalho Chehab, Benjamin MUGNIER, Jacopo Mondi,
	Dave Stevenson

On Thu, Sep 08, 2022 at 10:08:46AM +0200, Maxime Ripard wrote:
> Hi Ricardo,
> 
> On Thu, Sep 08, 2022 at 09:11:11AM +0200, Ricardo Ribalda wrote:
> > > - Still on slide 16, V4L2 as an API is usable without disclosing vendor
> > >   IP. What is not possible is upstreaming a driver. I don't see this as
> > >   significantly different between V4L2 and the new API proposal. I
> > >   expect this to be discussed on Monday.
> > 
> > I am only considering upstream drivers. There is not much to discuss
> > for downstream or closed drivers :)
> 
> Are we really discussing upstream *drivers*? If anything, it looks like
> the Kcam proposal moves most of the drivers out of upstream.

Given that the API proposal sets at a significant lower level than V4L2
in the stack, the concept of "userspace driver" (I meant it in the sense
of GPU support in mesa) plays a bigger role. It would be good to clarify
what is meant by "driver" and maybe use the term "kernel driver" when
only the kernel part is covered, to avoid misunderstandings.

> > > - Slide 31 mentions that entities can send operations internally and
> > >   listen to each other events. I'd like to better understand how that
> > >   will work without any abstraction in the API (as that is one of the
> > >   main design decision behind this new API) when those entities are from
> > >   different vendors, and handled by different drivers that are developed
> > >   independently (for instance, the camera sensor and the CSI-2 receiver,
> > >   or even the CSI-2 receiver and the ISP).
> > 
> > It is still under work.
> > 
> > Hardware, specially for standard buses,  should be resilient (not
> > crash) to format mismatches. Otherwise a mal-functionling sensor or
> > too much noise could crash the system (with or without kcam).
> > 
> > Drivers developed together should know about the rest of the system,
> > so that is not the issue here.
> > 
> > For drivers developed by different vendors for a standard bus, on
> > hardware that is not resilient (that was a mouthful), then we need to
> > prepare a set of read-only standard registers.
> 
> I'm not even sure that read-only registers would be enough. I've
> experienced first-hand DMA controllers that, when the camera has its
> timings completely off, end up completely confused and write way outside
> of its assigned buffer creating big chunks of corrupted memory in the
> system.
> 
> And that was by writing fairly legit values to registers that were meant
> for that, so we wouldn't be able to defend against it even with the
> smartest whitelist.
> 
> And we were in a "good faith" situation. Giving an attacker basically
> programmable access to DMA engines that might not be sitting behind an
> IOMMU seems like a very dangerous idea to me.

Do we need to preassign a range of CVE numbers ? :-)

> > > - Does the bike on slide 32 illustrate the difficult discussions we've
> > >   had in the past and how progress was hindered ? :-)
> > 
> > This is how we do code review at Google when two developers do not
> > want to work together. We take the bike to the rooftop and the two
> > developers that disagree tries to push the other developer to the edge
> > of the building.
> > 
> > The first second, when you see your colleague falling you think that
> > you have won.... then you realise that you are falling with them.
> 
> So the optimal solution would be that both stop pushing, or push the
> other just as hard without bulging? That doesn't seem like a good way to
> end up with a compromise ;)

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Media Summit] ChromeOS Kernel CAM
  2022-09-08 14:14       ` Laurent Pinchart
@ 2022-09-08 14:59         ` Maxime Ripard
  2022-09-08 15:16           ` Laurent Pinchart
  0 siblings, 1 reply; 20+ messages in thread
From: Maxime Ripard @ 2022-09-08 14:59 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Ricardo Ribalda, Linux Media Mailing List, Sakari Ailus,
	Kieran Bingham, Nicolas Dufresne, Benjamin Gaignard,
	Hidenori Kobayashi, Paul Kocialkowski, Michael Olbrich,
	Daniel Scally, Jernej Škrabec, Niklas Söderlund,
	Michael Tretter, Hans Verkuil, Philipp Zabel,
	Mauro Carvalho Chehab, Benjamin MUGNIER, Jacopo Mondi,
	Dave Stevenson

[-- Attachment #1: Type: text/plain, Size: 3794 bytes --]

On Thu, Sep 08, 2022 at 05:14:41PM +0300, Laurent Pinchart wrote:
> On Thu, Sep 08, 2022 at 10:08:46AM +0200, Maxime Ripard wrote:
> > Hi Ricardo,
> > 
> > On Thu, Sep 08, 2022 at 09:11:11AM +0200, Ricardo Ribalda wrote:
> > > > - Still on slide 16, V4L2 as an API is usable without disclosing vendor
> > > >   IP. What is not possible is upstreaming a driver. I don't see this as
> > > >   significantly different between V4L2 and the new API proposal. I
> > > >   expect this to be discussed on Monday.
> > > 
> > > I am only considering upstream drivers. There is not much to discuss
> > > for downstream or closed drivers :)
> > 
> > Are we really discussing upstream *drivers*? If anything, it looks like
> > the Kcam proposal moves most of the drivers out of upstream.
> 
> Given that the API proposal sets at a significant lower level than V4L2
> in the stack, the concept of "userspace driver" (I meant it in the sense
> of GPU support in mesa) plays a bigger role. It would be good to clarify
> what is meant by "driver" and maybe use the term "kernel driver" when
> only the kernel part is covered, to avoid misunderstandings.

I think there's a bit of a misunderstanding about what exactly is in a
DRM driver, and what is in Mesa.

Mesa doesn't program the hardware at all, it's merely a glorified
compiler. It's not more of a driver than GCC is an OS. Most importantly
for our discussion, Mesa doesn't perform any kind of register access (or
register access request), only the (kernel) driver does that.

What would be relevant to the discussion though was the userspace mode
setting, where X11 would have most of the logic to drive the hardware
directly.

That ended up being a mistake, and got superseded by KMS more than a
decade ago because it wasn't working.

> > > > - Slide 31 mentions that entities can send operations internally and
> > > >   listen to each other events. I'd like to better understand how that
> > > >   will work without any abstraction in the API (as that is one of the
> > > >   main design decision behind this new API) when those entities are from
> > > >   different vendors, and handled by different drivers that are developed
> > > >   independently (for instance, the camera sensor and the CSI-2 receiver,
> > > >   or even the CSI-2 receiver and the ISP).
> > > 
> > > It is still under work.
> > > 
> > > Hardware, specially for standard buses,  should be resilient (not
> > > crash) to format mismatches. Otherwise a mal-functionling sensor or
> > > too much noise could crash the system (with or without kcam).
> > > 
> > > Drivers developed together should know about the rest of the system,
> > > so that is not the issue here.
> > > 
> > > For drivers developed by different vendors for a standard bus, on
> > > hardware that is not resilient (that was a mouthful), then we need to
> > > prepare a set of read-only standard registers.
> > 
> > I'm not even sure that read-only registers would be enough. I've
> > experienced first-hand DMA controllers that, when the camera has its
> > timings completely off, end up completely confused and write way outside
> > of its assigned buffer creating big chunks of corrupted memory in the
> > system.
> > 
> > And that was by writing fairly legit values to registers that were meant
> > for that, so we wouldn't be able to defend against it even with the
> > smartest whitelist.
> > 
> > And we were in a "good faith" situation. Giving an attacker basically
> > programmable access to DMA engines that might not be sitting behind an
> > IOMMU seems like a very dangerous idea to me.
> 
> Do we need to preassign a range of CVE numbers ? :-)

We can do that, but I'd rather have some way to defend against that.

Maxime

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Media Summit] ChromeOS Kernel CAM
  2022-09-08 14:59         ` Maxime Ripard
@ 2022-09-08 15:16           ` Laurent Pinchart
  2022-09-08 15:34             ` Maxime Ripard
  0 siblings, 1 reply; 20+ messages in thread
From: Laurent Pinchart @ 2022-09-08 15:16 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Ricardo Ribalda, Linux Media Mailing List, Sakari Ailus,
	Kieran Bingham, Nicolas Dufresne, Benjamin Gaignard,
	Hidenori Kobayashi, Paul Kocialkowski, Michael Olbrich,
	Daniel Scally, Jernej Škrabec, Niklas Söderlund,
	Michael Tretter, Hans Verkuil, Philipp Zabel,
	Mauro Carvalho Chehab, Benjamin MUGNIER, Jacopo Mondi,
	Dave Stevenson

On Thu, Sep 08, 2022 at 04:59:05PM +0200, Maxime Ripard wrote:
> On Thu, Sep 08, 2022 at 05:14:41PM +0300, Laurent Pinchart wrote:
> > On Thu, Sep 08, 2022 at 10:08:46AM +0200, Maxime Ripard wrote:
> > > Hi Ricardo,
> > > 
> > > On Thu, Sep 08, 2022 at 09:11:11AM +0200, Ricardo Ribalda wrote:
> > > > > - Still on slide 16, V4L2 as an API is usable without disclosing vendor
> > > > >   IP. What is not possible is upstreaming a driver. I don't see this as
> > > > >   significantly different between V4L2 and the new API proposal. I
> > > > >   expect this to be discussed on Monday.
> > > > 
> > > > I am only considering upstream drivers. There is not much to discuss
> > > > for downstream or closed drivers :)
> > > 
> > > Are we really discussing upstream *drivers*? If anything, it looks like
> > > the Kcam proposal moves most of the drivers out of upstream.
> > 
> > Given that the API proposal sets at a significant lower level than V4L2
> > in the stack, the concept of "userspace driver" (I meant it in the sense
> > of GPU support in mesa) plays a bigger role. It would be good to clarify
> > what is meant by "driver" and maybe use the term "kernel driver" when
> > only the kernel part is covered, to avoid misunderstandings.
> 
> I think there's a bit of a misunderstanding about what exactly is in a
> DRM driver, and what is in Mesa.
> 
> Mesa doesn't program the hardware at all, it's merely a glorified
> compiler. It's not more of a driver than GCC is an OS. Most importantly
> for our discussion, Mesa doesn't perform any kind of register access (or
> register access request), only the (kernel) driver does that.

Mesa compiles shaders, but also more generally produces command streams
that are passed as blobs to the DRM driver, which then forwards them to
the device with as little processing and validation as possible (when
the device is designed with multi-clients in mind, that processing and
validation can be reduced a lot). Recent ISPs have a similar
architecture, with a set of registers used to communicate with the ISP
firmware, and then most of the hardware registers for the actual image
processing blocks being programmed based from the command stream.
"Command stream" may not be a very good term for ISPs as it's not really
a stream of commands, but conceptually, we're dealing with a blob that
is computed by userspace.

> What would be relevant to the discussion though was the userspace mode
> setting, where X11 would have most of the logic to drive the hardware
> directly.
> 
> That ended up being a mistake, and got superseded by KMS more than a
> decade ago because it wasn't working.

You're absolutely right. I focussed my analysis of the API proposal on
the ISP parameters, but there's more than that, the plan (as I
understand it) is also to handle the programming of the registers not
related to the image processing as such from userspace. I've used the
DRM <-> KMS analogy before, to point out that the graphics world has an
abstract model on the KMS side for pipeline configuration and uses the
lower-level DRM API only to pass command streams, but forgot to mention
X11 UMS. It's certainly a very good point.

> > > > > - Slide 31 mentions that entities can send operations internally and
> > > > >   listen to each other events. I'd like to better understand how that
> > > > >   will work without any abstraction in the API (as that is one of the
> > > > >   main design decision behind this new API) when those entities are from
> > > > >   different vendors, and handled by different drivers that are developed
> > > > >   independently (for instance, the camera sensor and the CSI-2 receiver,
> > > > >   or even the CSI-2 receiver and the ISP).
> > > > 
> > > > It is still under work.
> > > > 
> > > > Hardware, specially for standard buses,  should be resilient (not
> > > > crash) to format mismatches. Otherwise a mal-functionling sensor or
> > > > too much noise could crash the system (with or without kcam).
> > > > 
> > > > Drivers developed together should know about the rest of the system,
> > > > so that is not the issue here.
> > > > 
> > > > For drivers developed by different vendors for a standard bus, on
> > > > hardware that is not resilient (that was a mouthful), then we need to
> > > > prepare a set of read-only standard registers.
> > > 
> > > I'm not even sure that read-only registers would be enough. I've
> > > experienced first-hand DMA controllers that, when the camera has its
> > > timings completely off, end up completely confused and write way outside
> > > of its assigned buffer creating big chunks of corrupted memory in the
> > > system.
> > > 
> > > And that was by writing fairly legit values to registers that were meant
> > > for that, so we wouldn't be able to defend against it even with the
> > > smartest whitelist.
> > > 
> > > And we were in a "good faith" situation. Giving an attacker basically
> > > programmable access to DMA engines that might not be sitting behind an
> > > IOMMU seems like a very dangerous idea to me.
> > 
> > Do we need to preassign a range of CVE numbers ? :-)
> 
> We can do that, but I'd rather have some way to defend against that.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Media Summit] ChromeOS Kernel CAM
  2022-09-08 15:16           ` Laurent Pinchart
@ 2022-09-08 15:34             ` Maxime Ripard
  2022-09-08 18:13               ` Ricardo Ribalda
  0 siblings, 1 reply; 20+ messages in thread
From: Maxime Ripard @ 2022-09-08 15:34 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Ricardo Ribalda, Linux Media Mailing List, Sakari Ailus,
	Kieran Bingham, Nicolas Dufresne, Benjamin Gaignard,
	Hidenori Kobayashi, Paul Kocialkowski, Michael Olbrich,
	Daniel Scally, Jernej Škrabec, Niklas Söderlund,
	Michael Tretter, Hans Verkuil, Philipp Zabel,
	Mauro Carvalho Chehab, Benjamin MUGNIER, Jacopo Mondi,
	Dave Stevenson

[-- Attachment #1: Type: text/plain, Size: 3121 bytes --]

On Thu, Sep 08, 2022 at 06:16:40PM +0300, Laurent Pinchart wrote:
> On Thu, Sep 08, 2022 at 04:59:05PM +0200, Maxime Ripard wrote:
> > On Thu, Sep 08, 2022 at 05:14:41PM +0300, Laurent Pinchart wrote:
> > > On Thu, Sep 08, 2022 at 10:08:46AM +0200, Maxime Ripard wrote:
> > > > Hi Ricardo,
> > > > 
> > > > On Thu, Sep 08, 2022 at 09:11:11AM +0200, Ricardo Ribalda wrote:
> > > > > > - Still on slide 16, V4L2 as an API is usable without disclosing vendor
> > > > > >   IP. What is not possible is upstreaming a driver. I don't see this as
> > > > > >   significantly different between V4L2 and the new API proposal. I
> > > > > >   expect this to be discussed on Monday.
> > > > > 
> > > > > I am only considering upstream drivers. There is not much to discuss
> > > > > for downstream or closed drivers :)
> > > > 
> > > > Are we really discussing upstream *drivers*? If anything, it looks like
> > > > the Kcam proposal moves most of the drivers out of upstream.
> > > 
> > > Given that the API proposal sets at a significant lower level than V4L2
> > > in the stack, the concept of "userspace driver" (I meant it in the sense
> > > of GPU support in mesa) plays a bigger role. It would be good to clarify
> > > what is meant by "driver" and maybe use the term "kernel driver" when
> > > only the kernel part is covered, to avoid misunderstandings.
> > 
> > I think there's a bit of a misunderstanding about what exactly is in a
> > DRM driver, and what is in Mesa.
> > 
> > Mesa doesn't program the hardware at all, it's merely a glorified
> > compiler. It's not more of a driver than GCC is an OS. Most importantly
> > for our discussion, Mesa doesn't perform any kind of register access (or
> > register access request), only the (kernel) driver does that.
> 
> Mesa compiles shaders, but also more generally produces command streams
> that are passed as blobs to the DRM driver, which then forwards them to
> the device with as little processing and validation as possible (when
> the device is designed with multi-clients in mind, that processing and
> validation can be reduced a lot).

That's true, but at no point in time is the CPU ever touches that
command stream blob in the case of DRM...

> Recent ISPs have a similar architecture, with a set of registers used
> to communicate with the ISP firmware, and then most of the hardware
> registers for the actual image processing blocks being programmed
> based from the command stream. "Command stream" may not be a very good
> term for ISPs as it's not really a stream of commands, but
> conceptually, we're dealing with a blob that is computed by userspace.

... while in Kcam, the CPU knows and will interpret that command stream.
Maybe not in all cases, but it's still a significant difference.

If we had to draw a parallel with something else in the kernel, it looks
way more like eBPF or the discussion we had on where to parse the
bitstreams for stateless codecs.

The first one has been severely constrained to avoid the issues we've
raised, and we all know how the second one went.

Maxime

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Media Summit] ChromeOS Kernel CAM
  2022-09-08 15:34             ` Maxime Ripard
@ 2022-09-08 18:13               ` Ricardo Ribalda
  2022-09-08 18:13                 ` Ricardo Ribalda
  2022-09-09  8:00                 ` Maxime Ripard
  0 siblings, 2 replies; 20+ messages in thread
From: Ricardo Ribalda @ 2022-09-08 18:13 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Laurent Pinchart, Linux Media Mailing List, Sakari Ailus,
	Kieran Bingham, Nicolas Dufresne, Benjamin Gaignard,
	Hidenori Kobayashi, Paul Kocialkowski, Michael Olbrich,
	Daniel Scally, Jernej Škrabec, Niklas Söderlund,
	Michael Tretter, Hans Verkuil, Philipp Zabel,
	Mauro Carvalho Chehab, Benjamin MUGNIER, Jacopo Mondi,
	Dave Stevenson

On Thu, 8 Sept 2022 at 17:34, Maxime Ripard <maxime@cerno.tech> wrote:
>
> On Thu, Sep 08, 2022 at 06:16:40PM +0300, Laurent Pinchart wrote:
> > On Thu, Sep 08, 2022 at 04:59:05PM +0200, Maxime Ripard wrote:
> > > On Thu, Sep 08, 2022 at 05:14:41PM +0300, Laurent Pinchart wrote:
> > > > On Thu, Sep 08, 2022 at 10:08:46AM +0200, Maxime Ripard wrote:
> > > > > Hi Ricardo,
> > > > >
> > > > > On Thu, Sep 08, 2022 at 09:11:11AM +0200, Ricardo Ribalda wrote:
> > > > > > > - Still on slide 16, V4L2 as an API is usable without disclosing vendor
> > > > > > >   IP. What is not possible is upstreaming a driver. I don't see this as
> > > > > > >   significantly different between V4L2 and the new API proposal. I
> > > > > > >   expect this to be discussed on Monday.
> > > > > >
> > > > > > I am only considering upstream drivers. There is not much to discuss
> > > > > > for downstream or closed drivers :)
> > > > >
> > > > > Are we really discussing upstream *drivers*? If anything, it looks like
> > > > > the Kcam proposal moves most of the drivers out of upstream.
> > > >
> > > > Given that the API proposal sets at a significant lower level than V4L2
> > > > in the stack, the concept of "userspace driver" (I meant it in the sense
> > > > of GPU support in mesa) plays a bigger role. It would be good to clarify
> > > > what is meant by "driver" and maybe use the term "kernel driver" when
> > > > only the kernel part is covered, to avoid misunderstandings.
> > >
> > > I think there's a bit of a misunderstanding about what exactly is in a
> > > DRM driver, and what is in Mesa.
> > >
> > > Mesa doesn't program the hardware at all, it's merely a glorified
> > > compiler. It's not more of a driver than GCC is an OS. Most importantly
> > > for our discussion, Mesa doesn't perform any kind of register access (or
> > > register access request), only the (kernel) driver does that.
> >
> > Mesa compiles shaders, but also more generally produces command streams
> > that are passed as blobs to the DRM driver, which then forwards them to
> > the device with as little processing and validation as possible (when
> > the device is designed with multi-clients in mind, that processing and
> > validation can be reduced a lot).
>
> That's true, but at no point in time is the CPU ever touches that
> command stream blob in the case of DRM...

As Laurent says, the latest hardware is very similar to GPUs, you pass
a set of commands to a firmware that does the actual R/W to the
hardware.

For hardware that is a register set, the vendor should have a good
idea about what kind of validation should be needed: raw access (deny
list) or more abstracted (allow list).

The most critical part is the DMA, and that will always be abstracted.
Also I doubt that we will have new hardware without an IOMMU, so we
have the same layers of security as today.

>
> > Recent ISPs have a similar architecture, with a set of registers used
> > to communicate with the ISP firmware, and then most of the hardware
> > registers for the actual image processing blocks being programmed
> > based from the command stream. "Command stream" may not be a very good
> > term for ISPs as it's not really a stream of commands, but
> > conceptually, we're dealing with a blob that is computed by userspace.
>
> ... while in Kcam, the CPU knows and will interpret that command stream.
> Maybe not in all cases, but it's still a significant difference.
>
> If we had to draw a parallel with something else in the kernel, it looks
> way more like eBPF or the discussion we had on where to parse the
> bitstreams for stateless codecs.
>
> The first one has been severely constrained to avoid the issues we've
> raised, and we all know how the second one went.

In eBPF, you are moving some user code to the kernel, with an unstable API.

In KCAM, (and in DRM), you let the user build a set of operations,
that you pass to the kernel via a stable API, then it is validated and
scheduled by the kernel.

X11 was much more bizarre, the GPIO iomem was remapped into userspace.




--
Ricardo Ribalda

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Media Summit] ChromeOS Kernel CAM
  2022-09-08 18:13               ` Ricardo Ribalda
@ 2022-09-08 18:13                 ` Ricardo Ribalda
  2022-09-08 19:30                   ` Laurent Pinchart
  2022-09-09  8:00                 ` Maxime Ripard
  1 sibling, 1 reply; 20+ messages in thread
From: Ricardo Ribalda @ 2022-09-08 18:13 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Laurent Pinchart, Linux Media Mailing List, Sakari Ailus,
	Kieran Bingham, Nicolas Dufresne, Benjamin Gaignard,
	Hidenori Kobayashi, Paul Kocialkowski, Michael Olbrich,
	Daniel Scally, Jernej Škrabec, Niklas Söderlund,
	Michael Tretter, Hans Verkuil, Philipp Zabel,
	Mauro Carvalho Chehab, Benjamin MUGNIER, Jacopo Mondi,
	Dave Stevenson

On Thu, 8 Sept 2022 at 20:13, Ricardo Ribalda <ribalda@chromium.org> wrote:
>
> On Thu, 8 Sept 2022 at 17:34, Maxime Ripard <maxime@cerno.tech> wrote:
> >
> > On Thu, Sep 08, 2022 at 06:16:40PM +0300, Laurent Pinchart wrote:
> > > On Thu, Sep 08, 2022 at 04:59:05PM +0200, Maxime Ripard wrote:
> > > > On Thu, Sep 08, 2022 at 05:14:41PM +0300, Laurent Pinchart wrote:
> > > > > On Thu, Sep 08, 2022 at 10:08:46AM +0200, Maxime Ripard wrote:
> > > > > > Hi Ricardo,
> > > > > >
> > > > > > On Thu, Sep 08, 2022 at 09:11:11AM +0200, Ricardo Ribalda wrote:
> > > > > > > > - Still on slide 16, V4L2 as an API is usable without disclosing vendor
> > > > > > > >   IP. What is not possible is upstreaming a driver. I don't see this as
> > > > > > > >   significantly different between V4L2 and the new API proposal. I
> > > > > > > >   expect this to be discussed on Monday.
> > > > > > >
> > > > > > > I am only considering upstream drivers. There is not much to discuss
> > > > > > > for downstream or closed drivers :)
> > > > > >
> > > > > > Are we really discussing upstream *drivers*? If anything, it looks like
> > > > > > the Kcam proposal moves most of the drivers out of upstream.
> > > > >
> > > > > Given that the API proposal sets at a significant lower level than V4L2
> > > > > in the stack, the concept of "userspace driver" (I meant it in the sense
> > > > > of GPU support in mesa) plays a bigger role. It would be good to clarify
> > > > > what is meant by "driver" and maybe use the term "kernel driver" when
> > > > > only the kernel part is covered, to avoid misunderstandings.
> > > >
> > > > I think there's a bit of a misunderstanding about what exactly is in a
> > > > DRM driver, and what is in Mesa.
> > > >
> > > > Mesa doesn't program the hardware at all, it's merely a glorified
> > > > compiler. It's not more of a driver than GCC is an OS. Most importantly
> > > > for our discussion, Mesa doesn't perform any kind of register access (or
> > > > register access request), only the (kernel) driver does that.
> > >
> > > Mesa compiles shaders, but also more generally produces command streams
> > > that are passed as blobs to the DRM driver, which then forwards them to
> > > the device with as little processing and validation as possible (when
> > > the device is designed with multi-clients in mind, that processing and
> > > validation can be reduced a lot).
> >
> > That's true, but at no point in time is the CPU ever touches that
> > command stream blob in the case of DRM...
>
> As Laurent says, the latest hardware is very similar to GPUs, you pass
> a set of commands to a firmware that does the actual R/W to the
> hardware.
>
> For hardware that is a register set, the vendor should have a good
> idea about what kind of validation should be needed: raw access (deny
> list) or more abstracted (allow list).
>
> The most critical part is the DMA, and that will always be abstracted.
> Also I doubt that we will have new hardware without an IOMMU, so we
> have the same layers of security as today.
>
> >
> > > Recent ISPs have a similar architecture, with a set of registers used
> > > to communicate with the ISP firmware, and then most of the hardware
> > > registers for the actual image processing blocks being programmed
> > > based from the command stream. "Command stream" may not be a very good
> > > term for ISPs as it's not really a stream of commands, but
> > > conceptually, we're dealing with a blob that is computed by userspace.
> >
> > ... while in Kcam, the CPU knows and will interpret that command stream.
> > Maybe not in all cases, but it's still a significant difference.
> >
> > If we had to draw a parallel with something else in the kernel, it looks
> > way more like eBPF or the discussion we had on where to parse the
> > bitstreams for stateless codecs.
> >
> > The first one has been severely constrained to avoid the issues we've
> > raised, and we all know how the second one went.
>
> In eBPF, you are moving some user code to the kernel, with an unstable API.
>
> In KCAM, (and in DRM), you let the user build a set of operations,
> that you pass to the kernel via a stable API, then it is validated and
> scheduled by the kernel.
>
> X11 was much more bizarre, the GPIO iomem was remapped into userspace.

s/GPIO/GPU/ ;)
>
>
>
>
> --
> Ricardo Ribalda



-- 
Ricardo Ribalda

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Media Summit] ChromeOS Kernel CAM
  2022-09-08 18:13                 ` Ricardo Ribalda
@ 2022-09-08 19:30                   ` Laurent Pinchart
  2022-09-08 20:04                     ` Ricardo Ribalda
  0 siblings, 1 reply; 20+ messages in thread
From: Laurent Pinchart @ 2022-09-08 19:30 UTC (permalink / raw)
  To: Ricardo Ribalda
  Cc: Maxime Ripard, Linux Media Mailing List, Sakari Ailus,
	Kieran Bingham, Nicolas Dufresne, Benjamin Gaignard,
	Hidenori Kobayashi, Paul Kocialkowski, Michael Olbrich,
	Daniel Scally, Jernej Škrabec, Niklas Söderlund,
	Michael Tretter, Hans Verkuil, Philipp Zabel,
	Mauro Carvalho Chehab, Benjamin MUGNIER, Jacopo Mondi,
	Dave Stevenson

Hi Ricardo,

On Thu, Sep 08, 2022 at 08:13:57PM +0200, Ricardo Ribalda wrote:
> On Thu, 8 Sept 2022 at 20:13, Ricardo Ribalda wrote:
> > On Thu, 8 Sept 2022 at 17:34, Maxime Ripard wrote:
> > > On Thu, Sep 08, 2022 at 06:16:40PM +0300, Laurent Pinchart wrote:
> > > > On Thu, Sep 08, 2022 at 04:59:05PM +0200, Maxime Ripard wrote:
> > > > > On Thu, Sep 08, 2022 at 05:14:41PM +0300, Laurent Pinchart wrote:
> > > > > > On Thu, Sep 08, 2022 at 10:08:46AM +0200, Maxime Ripard wrote:
> > > > > > > Hi Ricardo,
> > > > > > >
> > > > > > > On Thu, Sep 08, 2022 at 09:11:11AM +0200, Ricardo Ribalda wrote:
> > > > > > > > > - Still on slide 16, V4L2 as an API is usable without disclosing vendor
> > > > > > > > >   IP. What is not possible is upstreaming a driver. I don't see this as
> > > > > > > > >   significantly different between V4L2 and the new API proposal. I
> > > > > > > > >   expect this to be discussed on Monday.
> > > > > > > >
> > > > > > > > I am only considering upstream drivers. There is not much to discuss
> > > > > > > > for downstream or closed drivers :)
> > > > > > >
> > > > > > > Are we really discussing upstream *drivers*? If anything, it looks like
> > > > > > > the Kcam proposal moves most of the drivers out of upstream.
> > > > > >
> > > > > > Given that the API proposal sets at a significant lower level than V4L2
> > > > > > in the stack, the concept of "userspace driver" (I meant it in the sense
> > > > > > of GPU support in mesa) plays a bigger role. It would be good to clarify
> > > > > > what is meant by "driver" and maybe use the term "kernel driver" when
> > > > > > only the kernel part is covered, to avoid misunderstandings.
> > > > >
> > > > > I think there's a bit of a misunderstanding about what exactly is in a
> > > > > DRM driver, and what is in Mesa.
> > > > >
> > > > > Mesa doesn't program the hardware at all, it's merely a glorified
> > > > > compiler. It's not more of a driver than GCC is an OS. Most importantly
> > > > > for our discussion, Mesa doesn't perform any kind of register access (or
> > > > > register access request), only the (kernel) driver does that.
> > > >
> > > > Mesa compiles shaders, but also more generally produces command streams
> > > > that are passed as blobs to the DRM driver, which then forwards them to
> > > > the device with as little processing and validation as possible (when
> > > > the device is designed with multi-clients in mind, that processing and
> > > > validation can be reduced a lot).
> > >
> > > That's true, but at no point in time is the CPU ever touches that
> > > command stream blob in the case of DRM...
> >
> > As Laurent says, the latest hardware is very similar to GPUs, you pass
> > a set of commands to a firmware that does the actual R/W to the
> > hardware.

*Some* of the latest hardware. There are new SoCs getting to the market
today with GPUs that are fully programmed from the kernel, and even more
that are fully programmable from the kernel even if the stack provided
by the SoC vendor has a firmware that takes care of programming the ISP.

One thing that isn't clear in your proposal is where the line is drawn.
Passing a blob to the ISP firmware involves some kind of communication
mechanism, which ultimately deals with hardware registers somewhere.
It's not clear if those registers are part of the blob that userspace
passes to the kernel. I'd assume not, but clarifying where the line is
would be useful.

> > For hardware that is a register set, the vendor should have a good
> > idea about what kind of validation should be needed: raw access (deny
> > list) or more abstracted (allow list).
> >
> > The most critical part is the DMA, and that will always be abstracted.
> > Also I doubt that we will have new hardware without an IOMMU, so we
> > have the same layers of security as today.

I know of SoCs in the making that have ISPs and no IOMMU.

> > > > Recent ISPs have a similar architecture, with a set of registers used
> > > > to communicate with the ISP firmware, and then most of the hardware
> > > > registers for the actual image processing blocks being programmed
> > > > based from the command stream. "Command stream" may not be a very good
> > > > term for ISPs as it's not really a stream of commands, but
> > > > conceptually, we're dealing with a blob that is computed by userspace.
> > >
> > > ... while in Kcam, the CPU knows and will interpret that command stream.
> > > Maybe not in all cases, but it's still a significant difference.
> > >
> > > If we had to draw a parallel with something else in the kernel, it looks
> > > way more like eBPF or the discussion we had on where to parse the
> > > bitstreams for stateless codecs.
> > >
> > > The first one has been severely constrained to avoid the issues we've
> > > raised, and we all know how the second one went.
> >
> > In eBPF, you are moving some user code to the kernel, with an unstable API.
> >
> > In KCAM, (and in DRM), you let the user build a set of operations,
> > that you pass to the kernel via a stable API, then it is validated and
> > scheduled by the kernel.
> >
> > X11 was much more bizarre, the GPIO iomem was remapped into userspace.
> 
> s/GPIO/GPU/ ;)

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Media Summit] ChromeOS Kernel CAM
  2022-09-08 19:30                   ` Laurent Pinchart
@ 2022-09-08 20:04                     ` Ricardo Ribalda
  2022-09-08 20:59                       ` Laurent Pinchart
  0 siblings, 1 reply; 20+ messages in thread
From: Ricardo Ribalda @ 2022-09-08 20:04 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Maxime Ripard, Linux Media Mailing List, Sakari Ailus,
	Kieran Bingham, Nicolas Dufresne, Benjamin Gaignard,
	Hidenori Kobayashi, Paul Kocialkowski, Michael Olbrich,
	Daniel Scally, Jernej Škrabec, Niklas Söderlund,
	Michael Tretter, Hans Verkuil, Philipp Zabel,
	Mauro Carvalho Chehab, Benjamin MUGNIER, Jacopo Mondi,
	Dave Stevenson

On Thu, 8 Sept 2022 at 21:31, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Ricardo,
>
> On Thu, Sep 08, 2022 at 08:13:57PM +0200, Ricardo Ribalda wrote:
> > On Thu, 8 Sept 2022 at 20:13, Ricardo Ribalda wrote:
> > > On Thu, 8 Sept 2022 at 17:34, Maxime Ripard wrote:
> > > > On Thu, Sep 08, 2022 at 06:16:40PM +0300, Laurent Pinchart wrote:
> > > > > On Thu, Sep 08, 2022 at 04:59:05PM +0200, Maxime Ripard wrote:
> > > > > > On Thu, Sep 08, 2022 at 05:14:41PM +0300, Laurent Pinchart wrote:
> > > > > > > On Thu, Sep 08, 2022 at 10:08:46AM +0200, Maxime Ripard wrote:
> > > > > > > > Hi Ricardo,
> > > > > > > >
> > > > > > > > On Thu, Sep 08, 2022 at 09:11:11AM +0200, Ricardo Ribalda wrote:
> > > > > > > > > > - Still on slide 16, V4L2 as an API is usable without disclosing vendor
> > > > > > > > > >   IP. What is not possible is upstreaming a driver. I don't see this as
> > > > > > > > > >   significantly different between V4L2 and the new API proposal. I
> > > > > > > > > >   expect this to be discussed on Monday.
> > > > > > > > >
> > > > > > > > > I am only considering upstream drivers. There is not much to discuss
> > > > > > > > > for downstream or closed drivers :)
> > > > > > > >
> > > > > > > > Are we really discussing upstream *drivers*? If anything, it looks like
> > > > > > > > the Kcam proposal moves most of the drivers out of upstream.
> > > > > > >
> > > > > > > Given that the API proposal sets at a significant lower level than V4L2
> > > > > > > in the stack, the concept of "userspace driver" (I meant it in the sense
> > > > > > > of GPU support in mesa) plays a bigger role. It would be good to clarify
> > > > > > > what is meant by "driver" and maybe use the term "kernel driver" when
> > > > > > > only the kernel part is covered, to avoid misunderstandings.
> > > > > >
> > > > > > I think there's a bit of a misunderstanding about what exactly is in a
> > > > > > DRM driver, and what is in Mesa.
> > > > > >
> > > > > > Mesa doesn't program the hardware at all, it's merely a glorified
> > > > > > compiler. It's not more of a driver than GCC is an OS. Most importantly
> > > > > > for our discussion, Mesa doesn't perform any kind of register access (or
> > > > > > register access request), only the (kernel) driver does that.
> > > > >
> > > > > Mesa compiles shaders, but also more generally produces command streams
> > > > > that are passed as blobs to the DRM driver, which then forwards them to
> > > > > the device with as little processing and validation as possible (when
> > > > > the device is designed with multi-clients in mind, that processing and
> > > > > validation can be reduced a lot).
> > > >
> > > > That's true, but at no point in time is the CPU ever touches that
> > > > command stream blob in the case of DRM...
> > >
> > > As Laurent says, the latest hardware is very similar to GPUs, you pass
> > > a set of commands to a firmware that does the actual R/W to the
> > > hardware.
>
> *Some* of the latest hardware. There are new SoCs getting to the market
> today with GPUs that are fully programmed from the kernel, and even more
> that are fully programmable from the kernel even if the stack provided
> by the SoC vendor has a firmware that takes care of programming the ISP.
>
> One thing that isn't clear in your proposal is where the line is drawn.
> Passing a blob to the ISP firmware involves some kind of communication
> mechanism, which ultimately deals with hardware registers somewhere.
> It's not clear if those registers are part of the blob that userspace
> passes to the kernel. I'd assume not, but clarifying where the line is
> would be useful.
>
> > > For hardware that is a register set, the vendor should have a good
> > > idea about what kind of validation should be needed: raw access (deny
> > > list) or more abstracted (allow list).
> > >
> > > The most critical part is the DMA, and that will always be abstracted.
> > > Also I doubt that we will have new hardware without an IOMMU, so we
> > > have the same layers of security as today.
>
> I know of SoCs in the making that have ISPs and no IOMMU.

I guess this is why you meant with reserving CVE ranges :)


>
> > > > > Recent ISPs have a similar architecture, with a set of registers used
> > > > > to communicate with the ISP firmware, and then most of the hardware
> > > > > registers for the actual image processing blocks being programmed
> > > > > based from the command stream. "Command stream" may not be a very good
> > > > > term for ISPs as it's not really a stream of commands, but
> > > > > conceptually, we're dealing with a blob that is computed by userspace.
> > > >
> > > > ... while in Kcam, the CPU knows and will interpret that command stream.
> > > > Maybe not in all cases, but it's still a significant difference.
> > > >
> > > > If we had to draw a parallel with something else in the kernel, it looks
> > > > way more like eBPF or the discussion we had on where to parse the
> > > > bitstreams for stateless codecs.
> > > >
> > > > The first one has been severely constrained to avoid the issues we've
> > > > raised, and we all know how the second one went.
> > >
> > > In eBPF, you are moving some user code to the kernel, with an unstable API.
> > >
> > > In KCAM, (and in DRM), you let the user build a set of operations,
> > > that you pass to the kernel via a stable API, then it is validated and
> > > scheduled by the kernel.
> > >
> > > X11 was much more bizarre, the GPIO iomem was remapped into userspace.
> >
> > s/GPIO/GPU/ ;)
>
> --
> Regards,
>
> Laurent Pinchart

Looking forward to Monday :)

-- 
Ricardo Ribalda

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Media Summit] ChromeOS Kernel CAM
  2022-09-08 20:04                     ` Ricardo Ribalda
@ 2022-09-08 20:59                       ` Laurent Pinchart
  0 siblings, 0 replies; 20+ messages in thread
From: Laurent Pinchart @ 2022-09-08 20:59 UTC (permalink / raw)
  To: Ricardo Ribalda
  Cc: Maxime Ripard, Linux Media Mailing List, Sakari Ailus,
	Kieran Bingham, Nicolas Dufresne, Benjamin Gaignard,
	Hidenori Kobayashi, Paul Kocialkowski, Michael Olbrich,
	Daniel Scally, Jernej Škrabec, Niklas Söderlund,
	Michael Tretter, Hans Verkuil, Philipp Zabel,
	Mauro Carvalho Chehab, Benjamin MUGNIER, Jacopo Mondi,
	Dave Stevenson

On Thu, Sep 08, 2022 at 10:04:54PM +0200, Ricardo Ribalda wrote:
> On Thu, 8 Sept 2022 at 21:31, Laurent Pinchart wrote:
> > On Thu, Sep 08, 2022 at 08:13:57PM +0200, Ricardo Ribalda wrote:
> > > On Thu, 8 Sept 2022 at 20:13, Ricardo Ribalda wrote:
> > > > On Thu, 8 Sept 2022 at 17:34, Maxime Ripard wrote:
> > > > > On Thu, Sep 08, 2022 at 06:16:40PM +0300, Laurent Pinchart wrote:
> > > > > > On Thu, Sep 08, 2022 at 04:59:05PM +0200, Maxime Ripard wrote:
> > > > > > > On Thu, Sep 08, 2022 at 05:14:41PM +0300, Laurent Pinchart wrote:
> > > > > > > > On Thu, Sep 08, 2022 at 10:08:46AM +0200, Maxime Ripard wrote:
> > > > > > > > > Hi Ricardo,
> > > > > > > > >
> > > > > > > > > On Thu, Sep 08, 2022 at 09:11:11AM +0200, Ricardo Ribalda wrote:
> > > > > > > > > > > - Still on slide 16, V4L2 as an API is usable without disclosing vendor
> > > > > > > > > > >   IP. What is not possible is upstreaming a driver. I don't see this as
> > > > > > > > > > >   significantly different between V4L2 and the new API proposal. I
> > > > > > > > > > >   expect this to be discussed on Monday.
> > > > > > > > > >
> > > > > > > > > > I am only considering upstream drivers. There is not much to discuss
> > > > > > > > > > for downstream or closed drivers :)
> > > > > > > > >
> > > > > > > > > Are we really discussing upstream *drivers*? If anything, it looks like
> > > > > > > > > the Kcam proposal moves most of the drivers out of upstream.
> > > > > > > >
> > > > > > > > Given that the API proposal sets at a significant lower level than V4L2
> > > > > > > > in the stack, the concept of "userspace driver" (I meant it in the sense
> > > > > > > > of GPU support in mesa) plays a bigger role. It would be good to clarify
> > > > > > > > what is meant by "driver" and maybe use the term "kernel driver" when
> > > > > > > > only the kernel part is covered, to avoid misunderstandings.
> > > > > > >
> > > > > > > I think there's a bit of a misunderstanding about what exactly is in a
> > > > > > > DRM driver, and what is in Mesa.
> > > > > > >
> > > > > > > Mesa doesn't program the hardware at all, it's merely a glorified
> > > > > > > compiler. It's not more of a driver than GCC is an OS. Most importantly
> > > > > > > for our discussion, Mesa doesn't perform any kind of register access (or
> > > > > > > register access request), only the (kernel) driver does that.
> > > > > >
> > > > > > Mesa compiles shaders, but also more generally produces command streams
> > > > > > that are passed as blobs to the DRM driver, which then forwards them to
> > > > > > the device with as little processing and validation as possible (when
> > > > > > the device is designed with multi-clients in mind, that processing and
> > > > > > validation can be reduced a lot).
> > > > >
> > > > > That's true, but at no point in time is the CPU ever touches that
> > > > > command stream blob in the case of DRM...
> > > >
> > > > As Laurent says, the latest hardware is very similar to GPUs, you pass
> > > > a set of commands to a firmware that does the actual R/W to the
> > > > hardware.
> >
> > *Some* of the latest hardware. There are new SoCs getting to the market
> > today with GPUs that are fully programmed from the kernel, and even more
> > that are fully programmable from the kernel even if the stack provided
> > by the SoC vendor has a firmware that takes care of programming the ISP.
> >
> > One thing that isn't clear in your proposal is where the line is drawn.
> > Passing a blob to the ISP firmware involves some kind of communication
> > mechanism, which ultimately deals with hardware registers somewhere.
> > It's not clear if those registers are part of the blob that userspace
> > passes to the kernel. I'd assume not, but clarifying where the line is
> > would be useful.
> >
> > > > For hardware that is a register set, the vendor should have a good
> > > > idea about what kind of validation should be needed: raw access (deny
> > > > list) or more abstracted (allow list).
> > > >
> > > > The most critical part is the DMA, and that will always be abstracted.
> > > > Also I doubt that we will have new hardware without an IOMMU, so we
> > > > have the same layers of security as today.
> >
> > I know of SoCs in the making that have ISPs and no IOMMU.
> 
> I guess this is why you meant with reserving CVE ranges :)

I recall very vividly a painful debugging session a long time ago, on an
old SoC that had an IOMMU. Under certain circumstances, the DMA engine,
when crossing a page boundary, would write the next two bytes to the
next physical page instead of the next virtual page. That was clearly a
hardware bug, and hardware bugs happen all the time, on both old and new
hardware. The kernel driver made sure that the particular circumstances
in which this could happen would never occur. It was fairly easy as it
only involved validating the combination of image width and pixel
format. Validating hardware registers would have been more complex.

TL;DR, even with an IOMMU, validation is will be needed. Let's also not
forget that we also need to ensure multiple clients will not have access
to each other's memory on recent SoCs where ISPs are meant to be used by
multiple clients (but without support for per-client address spaces at
the ISP hardware level, such as what could be provided by ASID and
similar mechanisms).

> > > > > > Recent ISPs have a similar architecture, with a set of registers used
> > > > > > to communicate with the ISP firmware, and then most of the hardware
> > > > > > registers for the actual image processing blocks being programmed
> > > > > > based from the command stream. "Command stream" may not be a very good
> > > > > > term for ISPs as it's not really a stream of commands, but
> > > > > > conceptually, we're dealing with a blob that is computed by userspace.
> > > > >
> > > > > ... while in Kcam, the CPU knows and will interpret that command stream.
> > > > > Maybe not in all cases, but it's still a significant difference.
> > > > >
> > > > > If we had to draw a parallel with something else in the kernel, it looks
> > > > > way more like eBPF or the discussion we had on where to parse the
> > > > > bitstreams for stateless codecs.
> > > > >
> > > > > The first one has been severely constrained to avoid the issues we've
> > > > > raised, and we all know how the second one went.
> > > >
> > > > In eBPF, you are moving some user code to the kernel, with an unstable API.
> > > >
> > > > In KCAM, (and in DRM), you let the user build a set of operations,
> > > > that you pass to the kernel via a stable API, then it is validated and
> > > > scheduled by the kernel.
> > > >
> > > > X11 was much more bizarre, the GPIO iomem was remapped into userspace.
> > >
> > > s/GPIO/GPU/ ;)
> 
> Looking forward to Monday :)

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Media Summit] ChromeOS Kernel CAM
  2022-09-08 18:13               ` Ricardo Ribalda
  2022-09-08 18:13                 ` Ricardo Ribalda
@ 2022-09-09  8:00                 ` Maxime Ripard
  2022-09-09  8:39                   ` Ricardo Ribalda
  1 sibling, 1 reply; 20+ messages in thread
From: Maxime Ripard @ 2022-09-09  8:00 UTC (permalink / raw)
  To: Ricardo Ribalda
  Cc: Laurent Pinchart, Linux Media Mailing List, Sakari Ailus,
	Kieran Bingham, Nicolas Dufresne, Benjamin Gaignard,
	Hidenori Kobayashi, Paul Kocialkowski, Michael Olbrich,
	Daniel Scally, Jernej Škrabec, Niklas Söderlund,
	Michael Tretter, Hans Verkuil, Philipp Zabel,
	Mauro Carvalho Chehab, Benjamin MUGNIER, Jacopo Mondi,
	Dave Stevenson

On Thu, Sep 08, 2022 at 08:13:17PM +0200, Ricardo Ribalda wrote:
> On Thu, 8 Sept 2022 at 17:34, Maxime Ripard <maxime@cerno.tech> wrote:
> >
> > On Thu, Sep 08, 2022 at 06:16:40PM +0300, Laurent Pinchart wrote:
> > > On Thu, Sep 08, 2022 at 04:59:05PM +0200, Maxime Ripard wrote:
> > > > On Thu, Sep 08, 2022 at 05:14:41PM +0300, Laurent Pinchart wrote:
> > > > > On Thu, Sep 08, 2022 at 10:08:46AM +0200, Maxime Ripard wrote:
> > > > > > Hi Ricardo,
> > > > > >
> > > > > > On Thu, Sep 08, 2022 at 09:11:11AM +0200, Ricardo Ribalda wrote:
> > > > > > > > - Still on slide 16, V4L2 as an API is usable without disclosing vendor
> > > > > > > >   IP. What is not possible is upstreaming a driver. I don't see this as
> > > > > > > >   significantly different between V4L2 and the new API proposal. I
> > > > > > > >   expect this to be discussed on Monday.
> > > > > > >
> > > > > > > I am only considering upstream drivers. There is not much to discuss
> > > > > > > for downstream or closed drivers :)
> > > > > >
> > > > > > Are we really discussing upstream *drivers*? If anything, it looks like
> > > > > > the Kcam proposal moves most of the drivers out of upstream.
> > > > >
> > > > > Given that the API proposal sets at a significant lower level than V4L2
> > > > > in the stack, the concept of "userspace driver" (I meant it in the sense
> > > > > of GPU support in mesa) plays a bigger role. It would be good to clarify
> > > > > what is meant by "driver" and maybe use the term "kernel driver" when
> > > > > only the kernel part is covered, to avoid misunderstandings.
> > > >
> > > > I think there's a bit of a misunderstanding about what exactly is in a
> > > > DRM driver, and what is in Mesa.
> > > >
> > > > Mesa doesn't program the hardware at all, it's merely a glorified
> > > > compiler. It's not more of a driver than GCC is an OS. Most importantly
> > > > for our discussion, Mesa doesn't perform any kind of register access (or
> > > > register access request), only the (kernel) driver does that.
> > >
> > > Mesa compiles shaders, but also more generally produces command streams
> > > that are passed as blobs to the DRM driver, which then forwards them to
> > > the device with as little processing and validation as possible (when
> > > the device is designed with multi-clients in mind, that processing and
> > > validation can be reduced a lot).
> >
> > That's true, but at no point in time is the CPU ever touches that
> > command stream blob in the case of DRM...
> 
> As Laurent says, the latest hardware is very similar to GPUs, you pass
> a set of commands to a firmware that does the actual R/W to the
> hardware.

For the latest, most powerful, hardware, maybe. I can show you plenty of
other ISP we'll need to support that aren't programmed that way, and in
that case we would end up interpreting whatever is being passed to KCam
on the CPU.

Which is totally different to what DRM/Mesa is doing on *any* hardware.

Another constraint that Mesa has is that there is standards user-space
API that all the applications target when it comes to graphics (OpenGL,
Vulkan, Direct3D, etc.) and you need to support pretty much all of them.
So in that sense, Mesa is a transpiler between that API and the GPU ISA.
We're not in this case either with Kcam.

> For hardware that is a register set, the vendor should have a good
> idea about what kind of validation should be needed: raw access (deny
> list) or more abstracted (allow list).

This would be similar to what is going on with regmap caches. And they
are a pain to deal with because that information is far from being
available for all the devices, and then most drivers don't implement it
either.

Also, if we have to have a whitelist in the kernel, then we need to
introduce and upstream some kind of driver for hardware enablement.
Doesn't that completely defeat the purpose of Kcam?

> The most critical part is the DMA, and that will always be abstracted.

Where do you draw the line then? What will have a driver in the kernel,
and what won't?

And again, the issue I was telling you about was about a configuration
mismatch (following a bogus documentation) between the DMA and the
sensor. If the sensor is part of the userspace and the DMA in the
kernel, we very much can still have that issue.

> Also I doubt that we will have new hardware without an IOMMU, so we
> have the same layers of security as today.

Maybe not for the kind of devices that end up on chromebooks, but
there's definitely hardware being designed today that have an ISP but no
IOMMU.

> > > Recent ISPs have a similar architecture, with a set of registers used
> > > to communicate with the ISP firmware, and then most of the hardware
> > > registers for the actual image processing blocks being programmed
> > > based from the command stream. "Command stream" may not be a very good
> > > term for ISPs as it's not really a stream of commands, but
> > > conceptually, we're dealing with a blob that is computed by userspace.
> >
> > ... while in Kcam, the CPU knows and will interpret that command stream.
> > Maybe not in all cases, but it's still a significant difference.
> >
> > If we had to draw a parallel with something else in the kernel, it looks
> > way more like eBPF or the discussion we had on where to parse the
> > bitstreams for stateless codecs.
> >
> > The first one has been severely constrained to avoid the issues we've
> > raised, and we all know how the second one went.
> 
> In eBPF, you are moving some user code to the kernel, with an unstable API.
>
> In KCAM, (and in DRM), you let the user build a set of operations,
> that you pass to the kernel via a stable API, then it is validated and
> scheduled by the kernel.

You won't be able to have a stable API with that design either. If only
because of that whitelist you were mentioning. Let's say we have a
register that turns out, after the facts, to not be available. If the
userspace ever used to set it at some point, you're screwed. Indeed,
either you move it out of the whitelist, and then you break userspace,
or you don't add it to the whitelist and end up allowing an insecure or
dangerous situation.

And you can't say you would just ignore a register that isn't part of
the whitelist, because then you would enforce a configuration that isn't
the one the user-space asked for, which is even worse.

> X11 was much more bizarre, the GPIO iomem was remapped into userspace.

Yes, but that wasn't the only thing bad with it. I mean, it doesn't
really matter who exactly does the register access eventually. In UMS,
X11 was doing it itself through a mapping of its own, in KCam the kernel
will do it on behalf of the userspace. But we still end up in both cases
with:

  * The entire logic is in userspace
  * Realistically speaking, that logic can only run as root
  * With a poor configuration, the userspace can completely crash the
    system
  * If the userspace crashes, you can end up with a configuration you
    can't really recover from

*All* of those issues are still there with Kcam, even though the actual
 memory mapping isn't in userspace.

Maxime

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Media Summit] ChromeOS Kernel CAM
  2022-09-09  8:00                 ` Maxime Ripard
@ 2022-09-09  8:39                   ` Ricardo Ribalda
  2022-09-09  9:06                     ` Maxime Ripard
  2022-09-09  9:11                     ` Laurent Pinchart
  0 siblings, 2 replies; 20+ messages in thread
From: Ricardo Ribalda @ 2022-09-09  8:39 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Laurent Pinchart, Linux Media Mailing List, Sakari Ailus,
	Kieran Bingham, Nicolas Dufresne, Benjamin Gaignard,
	Hidenori Kobayashi, Paul Kocialkowski, Michael Olbrich,
	Daniel Scally, Jernej Škrabec, Niklas Söderlund,
	Michael Tretter, Hans Verkuil, Philipp Zabel,
	Mauro Carvalho Chehab, Benjamin MUGNIER, Jacopo Mondi,
	Dave Stevenson

Hi Maxime

On Fri, 9 Sept 2022 at 10:00, Maxime Ripard <maxime@cerno.tech> wrote:
>
> On Thu, Sep 08, 2022 at 08:13:17PM +0200, Ricardo Ribalda wrote:
> > On Thu, 8 Sept 2022 at 17:34, Maxime Ripard <maxime@cerno.tech> wrote:
> > >
> > > On Thu, Sep 08, 2022 at 06:16:40PM +0300, Laurent Pinchart wrote:
> > > > On Thu, Sep 08, 2022 at 04:59:05PM +0200, Maxime Ripard wrote:
> > > > > On Thu, Sep 08, 2022 at 05:14:41PM +0300, Laurent Pinchart wrote:
> > > > > > On Thu, Sep 08, 2022 at 10:08:46AM +0200, Maxime Ripard wrote:
> > > > > > > Hi Ricardo,
> > > > > > >
> > > > > > > On Thu, Sep 08, 2022 at 09:11:11AM +0200, Ricardo Ribalda wrote:
> > > > > > > > > - Still on slide 16, V4L2 as an API is usable without disclosing vendor
> > > > > > > > >   IP. What is not possible is upstreaming a driver. I don't see this as
> > > > > > > > >   significantly different between V4L2 and the new API proposal. I
> > > > > > > > >   expect this to be discussed on Monday.
> > > > > > > >
> > > > > > > > I am only considering upstream drivers. There is not much to discuss
> > > > > > > > for downstream or closed drivers :)
> > > > > > >
> > > > > > > Are we really discussing upstream *drivers*? If anything, it looks like
> > > > > > > the Kcam proposal moves most of the drivers out of upstream.
> > > > > >
> > > > > > Given that the API proposal sets at a significant lower level than V4L2
> > > > > > in the stack, the concept of "userspace driver" (I meant it in the sense
> > > > > > of GPU support in mesa) plays a bigger role. It would be good to clarify
> > > > > > what is meant by "driver" and maybe use the term "kernel driver" when
> > > > > > only the kernel part is covered, to avoid misunderstandings.
> > > > >
> > > > > I think there's a bit of a misunderstanding about what exactly is in a
> > > > > DRM driver, and what is in Mesa.
> > > > >
> > > > > Mesa doesn't program the hardware at all, it's merely a glorified
> > > > > compiler. It's not more of a driver than GCC is an OS. Most importantly
> > > > > for our discussion, Mesa doesn't perform any kind of register access (or
> > > > > register access request), only the (kernel) driver does that.
> > > >
> > > > Mesa compiles shaders, but also more generally produces command streams
> > > > that are passed as blobs to the DRM driver, which then forwards them to
> > > > the device with as little processing and validation as possible (when
> > > > the device is designed with multi-clients in mind, that processing and
> > > > validation can be reduced a lot).
> > >
> > > That's true, but at no point in time is the CPU ever touches that
> > > command stream blob in the case of DRM...
> >
> > As Laurent says, the latest hardware is very similar to GPUs, you pass
> > a set of commands to a firmware that does the actual R/W to the
> > hardware.
>
> For the latest, most powerful, hardware, maybe. I can show you plenty of
> other ISP we'll need to support that aren't programmed that way, and in
> that case we would end up interpreting whatever is being passed to KCam
> on the CPU.

Kcam is not meant to replace V4L2, if a hardware is better modeled in
V4L2, they can use it.

>
> Which is totally different to what DRM/Mesa is doing on *any* hardware.
>
> Another constraint that Mesa has is that there is standards user-space
> API that all the applications target when it comes to graphics (OpenGL,
> Vulkan, Direct3D, etc.) and you need to support pretty much all of them.
> So in that sense, Mesa is a transpiler between that API and the GPU ISA.
> We're not in this case either with Kcam.

We also have APIs for cameras: V4L2, Android HAL, libcamera,
one-of-the-many-industrial-APIs

The userspace stack will transpile between that API and the ISP command buffers.


>
> > For hardware that is a register set, the vendor should have a good
> > idea about what kind of validation should be needed: raw access (deny
> > list) or more abstracted (allow list).
>
> This would be similar to what is going on with regmap caches. And they
> are a pain to deal with because that information is far from being
> available for all the devices, and then most drivers don't implement it
> either.
>
> Also, if we have to have a whitelist in the kernel, then we need to
> introduce and upstream some kind of driver for hardware enablement.
> Doesn't that completely defeat the purpose of Kcam?

The allowlist model that I mention is not about filtering what
registers can be written and what not. It is about abstracting them
completely if you do not trust the hardware:

Lets say that you only have 4 verified modes (like we have on many
sensors), then you expose a single register with 4 valid values:
0,1,2,3. The driver will convert that single register write into N
writes to registers.



>
> > The most critical part is the DMA, and that will always be abstracted.
>
> Where do you draw the line then? What will have a driver in the kernel,
> and what won't?

If there is memory access: abstraction
If the hardware is not trusted/documented:abstraction
If a specific configuration is know to be invalid and leaves the
system in an invalid state:filtering
everything else: raw access (+validation)


>
> And again, the issue I was telling you about was about a configuration
> mismatch (following a bogus documentation) between the DMA and the
> sensor. If the sensor is part of the userspace and the DMA in the
> kernel, we very much can still have that issue.

With internal operations you can achieve cooperation between the entities.

>
> > Also I doubt that we will have new hardware without an IOMMU, so we
> > have the same layers of security as today.
>
> Maybe not for the kind of devices that end up on chromebooks, but
> there's definitely hardware being designed today that have an ISP but no
> IOMMU.

For the non-iommu hardware, you will have the same security as today:
driver validation.

>
> > > > Recent ISPs have a similar architecture, with a set of registers used
> > > > to communicate with the ISP firmware, and then most of the hardware
> > > > registers for the actual image processing blocks being programmed
> > > > based from the command stream. "Command stream" may not be a very good
> > > > term for ISPs as it's not really a stream of commands, but
> > > > conceptually, we're dealing with a blob that is computed by userspace.
> > >
> > > ... while in Kcam, the CPU knows and will interpret that command stream.
> > > Maybe not in all cases, but it's still a significant difference.
> > >
> > > If we had to draw a parallel with something else in the kernel, it looks
> > > way more like eBPF or the discussion we had on where to parse the
> > > bitstreams for stateless codecs.
> > >
> > > The first one has been severely constrained to avoid the issues we've
> > > raised, and we all know how the second one went.
> >
> > In eBPF, you are moving some user code to the kernel, with an unstable API.
> >
> > In KCAM, (and in DRM), you let the user build a set of operations,
> > that you pass to the kernel via a stable API, then it is validated and
> > scheduled by the kernel.
>
> You won't be able to have a stable API with that design either. If only
> because of that whitelist you were mentioning. Let's say we have a
> register that turns out, after the facts, to not be available. If the
> userspace ever used to set it at some point, you're screwed. Indeed,
> either you move it out of the whitelist, and then you break userspace,
> or you don't add it to the whitelist and end up allowing an insecure or
> dangerous situation.

See above for our description of allowlist.

Also, using the drm model as reference. kernel version, libdrm and
mesa (and even llvm) are very coupled. Using a wrong version can lead
to unexpected results or even GPU hangs.

What to do when we fix bugs that affect functionality is something
that we need to decide on case to case cases. The same way we do today
when hardware does not support a control value and we discover it 10
versions later.

>
> And you can't say you would just ignore a register that isn't part of
> the whitelist, because then you would enforce a configuration that isn't
> the one the user-space asked for, which is even worse.
>
> > X11 was much more bizarre, the GPIO iomem was remapped into userspace.
>
> Yes, but that wasn't the only thing bad with it. I mean, it doesn't
> really matter who exactly does the register access eventually. In UMS,
> X11 was doing it itself through a mapping of its own, in KCam the kernel
> will do it on behalf of the userspace. But we still end up in both cases
> with:
>
>   * The entire logic is in userspace

We can argue if this is an issue or not. I think it is not

>   * Realistically speaking, that logic can only run as root

Do not agree.

>   * With a poor configuration, the userspace can completely crash the
>     system
>   * If the userspace crashes, you can end up with a configuration you
>     can't really recover from

A Kcam driver can give you broken images, but never crash the system
or leave it in an unrecoverable state. That is the main guarantee that
we expect from the drivers.

>
> *All* of those issues are still there with Kcam, even though the actual
>  memory mapping isn't in userspace


.
>
> Maxime



-- 
Ricardo Ribalda

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Media Summit] ChromeOS Kernel CAM
  2022-09-09  8:39                   ` Ricardo Ribalda
@ 2022-09-09  9:06                     ` Maxime Ripard
  2022-09-09 10:00                       ` Ricardo Ribalda
  2022-09-09  9:11                     ` Laurent Pinchart
  1 sibling, 1 reply; 20+ messages in thread
From: Maxime Ripard @ 2022-09-09  9:06 UTC (permalink / raw)
  To: Ricardo Ribalda
  Cc: Laurent Pinchart, Linux Media Mailing List, Sakari Ailus,
	Kieran Bingham, Nicolas Dufresne, Benjamin Gaignard,
	Hidenori Kobayashi, Paul Kocialkowski, Michael Olbrich,
	Daniel Scally, Jernej Škrabec, Niklas Söderlund,
	Michael Tretter, Hans Verkuil, Philipp Zabel,
	Mauro Carvalho Chehab, Benjamin MUGNIER, Jacopo Mondi,
	Dave Stevenson

On Fri, Sep 09, 2022 at 10:39:36AM +0200, Ricardo Ribalda wrote:
> Hi Maxime
> 
> On Fri, 9 Sept 2022 at 10:00, Maxime Ripard <maxime@cerno.tech> wrote:
> >
> > On Thu, Sep 08, 2022 at 08:13:17PM +0200, Ricardo Ribalda wrote:
> > > On Thu, 8 Sept 2022 at 17:34, Maxime Ripard <maxime@cerno.tech> wrote:
> > > >
> > > > On Thu, Sep 08, 2022 at 06:16:40PM +0300, Laurent Pinchart wrote:
> > > > > On Thu, Sep 08, 2022 at 04:59:05PM +0200, Maxime Ripard wrote:
> > > > > > On Thu, Sep 08, 2022 at 05:14:41PM +0300, Laurent Pinchart wrote:
> > > > > > > On Thu, Sep 08, 2022 at 10:08:46AM +0200, Maxime Ripard wrote:
> > > > > > > > Hi Ricardo,
> > > > > > > >
> > > > > > > > On Thu, Sep 08, 2022 at 09:11:11AM +0200, Ricardo Ribalda wrote:
> > > > > > > > > > - Still on slide 16, V4L2 as an API is usable without disclosing vendor
> > > > > > > > > >   IP. What is not possible is upstreaming a driver. I don't see this as
> > > > > > > > > >   significantly different between V4L2 and the new API proposal. I
> > > > > > > > > >   expect this to be discussed on Monday.
> > > > > > > > >
> > > > > > > > > I am only considering upstream drivers. There is not much to discuss
> > > > > > > > > for downstream or closed drivers :)
> > > > > > > >
> > > > > > > > Are we really discussing upstream *drivers*? If anything, it looks like
> > > > > > > > the Kcam proposal moves most of the drivers out of upstream.
> > > > > > >
> > > > > > > Given that the API proposal sets at a significant lower level than V4L2
> > > > > > > in the stack, the concept of "userspace driver" (I meant it in the sense
> > > > > > > of GPU support in mesa) plays a bigger role. It would be good to clarify
> > > > > > > what is meant by "driver" and maybe use the term "kernel driver" when
> > > > > > > only the kernel part is covered, to avoid misunderstandings.
> > > > > >
> > > > > > I think there's a bit of a misunderstanding about what exactly is in a
> > > > > > DRM driver, and what is in Mesa.
> > > > > >
> > > > > > Mesa doesn't program the hardware at all, it's merely a glorified
> > > > > > compiler. It's not more of a driver than GCC is an OS. Most importantly
> > > > > > for our discussion, Mesa doesn't perform any kind of register access (or
> > > > > > register access request), only the (kernel) driver does that.
> > > > >
> > > > > Mesa compiles shaders, but also more generally produces command streams
> > > > > that are passed as blobs to the DRM driver, which then forwards them to
> > > > > the device with as little processing and validation as possible (when
> > > > > the device is designed with multi-clients in mind, that processing and
> > > > > validation can be reduced a lot).
> > > >
> > > > That's true, but at no point in time is the CPU ever touches that
> > > > command stream blob in the case of DRM...
> > >
> > > As Laurent says, the latest hardware is very similar to GPUs, you pass
> > > a set of commands to a firmware that does the actual R/W to the
> > > hardware.
> >
> > For the latest, most powerful, hardware, maybe. I can show you plenty of
> > other ISP we'll need to support that aren't programmed that way, and in
> > that case we would end up interpreting whatever is being passed to KCam
> > on the CPU.
> 
> Kcam is not meant to replace V4L2, if a hardware is better modeled in
> V4L2, they can use it.

I'm not sure that alone is going to fly. Having to support the same
device in multiple frameworks based on who is using it exactly is very
frowned upon: it's a waste of development, review and maintenance time.

If we aim for something, it's to supersede v4l2, or to extend v4l2.

> > Which is totally different to what DRM/Mesa is doing on *any* hardware.
> >
> > Another constraint that Mesa has is that there is standards user-space
> > API that all the applications target when it comes to graphics (OpenGL,
> > Vulkan, Direct3D, etc.) and you need to support pretty much all of them.
> > So in that sense, Mesa is a transpiler between that API and the GPU ISA.
> > We're not in this case either with Kcam.
> 
> We also have APIs for cameras: V4L2, Android HAL, libcamera,
> one-of-the-many-industrial-APIs

The fact that you mention that there is many industrial APIs kind of
prove my point: none of them are anywhere near industry standards like
OpenGL or Vulkan can be. And you never mentioned that you wanted to
support all of them as a goal for kcam?

> The userspace stack will transpile between that API and the ISP
> command buffers.

And if there's no ISA for the ISP, then it's just going to create some
kind of bytecode that the kernel will execute.

> > > For hardware that is a register set, the vendor should have a good
> > > idea about what kind of validation should be needed: raw access (deny
> > > list) or more abstracted (allow list).
> >
> > This would be similar to what is going on with regmap caches. And they
> > are a pain to deal with because that information is far from being
> > available for all the devices, and then most drivers don't implement it
> > either.
> >
> > Also, if we have to have a whitelist in the kernel, then we need to
> > introduce and upstream some kind of driver for hardware enablement.
> > Doesn't that completely defeat the purpose of Kcam?
> 
> The allowlist model that I mention is not about filtering what
> registers can be written and what not. It is about abstracting them
> completely if you do not trust the hardware:
> 
> Lets say that you only have 4 verified modes (like we have on many
> sensors), then you expose a single register with 4 valid values:
> 0,1,2,3. The driver will convert that single register write into N
> writes to registers.

I'm not sure I get how that is different to what we have today with v4l2
controls? I'm pretty sure that's exactly how we discover and change
modes today with v4l2.

> > > The most critical part is the DMA, and that will always be abstracted.
> >
> > Where do you draw the line then? What will have a driver in the kernel,
> > and what won't?
> 
> If there is memory access: abstraction
> If the hardware is not trusted/documented:abstraction
> If a specific configuration is know to be invalid and leaves the
> system in an invalid state:filtering
> everything else: raw access (+validation)

I mean, the ISP you mentioned at least has to access the command buffer
somehow, and surely that counts as a memory access?

And so, what happens if the ISP is not entirely documented? You start
with a kernel driver, and then once it is documented you move it to Kcam
breaking all users in the process?

> > And again, the issue I was telling you about was about a configuration
> > mismatch (following a bogus documentation) between the DMA and the
> > sensor. If the sensor is part of the userspace and the DMA in the
> > kernel, we very much can still have that issue.
> 
> With internal operations you can achieve cooperation between the entities.

Again, looks like what we currenty have with v4l2 to me.

> > > Also I doubt that we will have new hardware without an IOMMU, so we
> > > have the same layers of security as today.
> >
> > Maybe not for the kind of devices that end up on chromebooks, but
> > there's definitely hardware being designed today that have an ISP but no
> > IOMMU.
> 
> For the non-iommu hardware, you will have the same security as today:
> driver validation.

I mean, I'm all for it. But the stated goal of Kcam is to reduce the
driver logic so that most of it is in userspace, but most of your
answers to challenges so far has been "but we'll have a driver for that"

If a driver is the solution, why do we need the Kcam architecture in the
first place?

> > > > > Recent ISPs have a similar architecture, with a set of registers used
> > > > > to communicate with the ISP firmware, and then most of the hardware
> > > > > registers for the actual image processing blocks being programmed
> > > > > based from the command stream. "Command stream" may not be a very good
> > > > > term for ISPs as it's not really a stream of commands, but
> > > > > conceptually, we're dealing with a blob that is computed by userspace.
> > > >
> > > > ... while in Kcam, the CPU knows and will interpret that command stream.
> > > > Maybe not in all cases, but it's still a significant difference.
> > > >
> > > > If we had to draw a parallel with something else in the kernel, it looks
> > > > way more like eBPF or the discussion we had on where to parse the
> > > > bitstreams for stateless codecs.
> > > >
> > > > The first one has been severely constrained to avoid the issues we've
> > > > raised, and we all know how the second one went.
> > >
> > > In eBPF, you are moving some user code to the kernel, with an unstable API.
> > >
> > > In KCAM, (and in DRM), you let the user build a set of operations,
> > > that you pass to the kernel via a stable API, then it is validated and
> > > scheduled by the kernel.
> >
> > You won't be able to have a stable API with that design either. If only
> > because of that whitelist you were mentioning. Let's say we have a
> > register that turns out, after the facts, to not be available. If the
> > userspace ever used to set it at some point, you're screwed. Indeed,
> > either you move it out of the whitelist, and then you break userspace,
> > or you don't add it to the whitelist and end up allowing an insecure or
> > dangerous situation.
> 
> See above for our description of allowlist.
> 
> Also, using the drm model as reference. kernel version, libdrm and
> mesa (and even llvm) are very coupled. Using a wrong version can lead
> to unexpected results or even GPU hangs.

Right.

And those are considered regressions:
https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements

"""
The Linux kernel’s “no regression” policy holds in practice only for
open-source userspace of the DRM subsystem. DRM developers are perfectly
fine if closed-source blob drivers in userspace use the same uAPI as the
open drivers, but they must do so in the exact same way as the open
drivers. Creative (ab)use of the interfaces will, and in the past
routinely has, lead to breakage.
"""

> What to do when we fix bugs that affect functionality is something
> that we need to decide on case to case cases. The same way we do today
> when hardware does not support a control value and we discover it 10
> versions later.

Indeed, but we need to have some idea on what that process is going to
look like in practice. If we put ourselves is a corner and don't allow
for some bug resolutions in the first place, then we won't be able to
fix them when we'll encounter them.

> > And you can't say you would just ignore a register that isn't part of
> > the whitelist, because then you would enforce a configuration that isn't
> > the one the user-space asked for, which is even worse.
> >
> > > X11 was much more bizarre, the GPIO iomem was remapped into userspace.
> >
> > Yes, but that wasn't the only thing bad with it. I mean, it doesn't
> > really matter who exactly does the register access eventually. In UMS,
> > X11 was doing it itself through a mapping of its own, in KCam the kernel
> > will do it on behalf of the userspace. But we still end up in both cases
> > with:
> >
> >   * The entire logic is in userspace
> 
> We can argue if this is an issue or not. I think it is not
> 
> >   * Realistically speaking, that logic can only run as root
> 
> Do not agree.

How so? Are you going to allow any javascript enabled website to poke
into Kcam?

> >   * With a poor configuration, the userspace can completely crash the
> >     system
> >   * If the userspace crashes, you can end up with a configuration you
> >     can't really recover from
> 
> A Kcam driver can give you broken images, but never crash the system
> or leave it in an unrecoverable state. That is the main guarantee that
> we expect from the drivers.

That's wishful thinking. If your application crashes halfway through the
configuration, you're left in a weird state you know nothing about now.
That's impossible to recover from.

Maxime

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Media Summit] ChromeOS Kernel CAM
  2022-09-09  8:39                   ` Ricardo Ribalda
  2022-09-09  9:06                     ` Maxime Ripard
@ 2022-09-09  9:11                     ` Laurent Pinchart
  1 sibling, 0 replies; 20+ messages in thread
From: Laurent Pinchart @ 2022-09-09  9:11 UTC (permalink / raw)
  To: Ricardo Ribalda
  Cc: Maxime Ripard, Linux Media Mailing List, Sakari Ailus,
	Kieran Bingham, Nicolas Dufresne, Benjamin Gaignard,
	Hidenori Kobayashi, Paul Kocialkowski, Michael Olbrich,
	Daniel Scally, Jernej Škrabec, Niklas Söderlund,
	Michael Tretter, Hans Verkuil, Philipp Zabel,
	Mauro Carvalho Chehab, Benjamin MUGNIER, Jacopo Mondi,
	Dave Stevenson

Hi Ricardo,

On Fri, Sep 09, 2022 at 10:39:36AM +0200, Ricardo Ribalda wrote:
> On Fri, 9 Sept 2022 at 10:00, Maxime Ripard wrote:
> > On Thu, Sep 08, 2022 at 08:13:17PM +0200, Ricardo Ribalda wrote:
> > > On Thu, 8 Sept 2022 at 17:34, Maxime Ripard wrote:
> > > > On Thu, Sep 08, 2022 at 06:16:40PM +0300, Laurent Pinchart wrote:
> > > > > On Thu, Sep 08, 2022 at 04:59:05PM +0200, Maxime Ripard wrote:
> > > > > > On Thu, Sep 08, 2022 at 05:14:41PM +0300, Laurent Pinchart wrote:
> > > > > > > On Thu, Sep 08, 2022 at 10:08:46AM +0200, Maxime Ripard wrote:
> > > > > > > > Hi Ricardo,
> > > > > > > >
> > > > > > > > On Thu, Sep 08, 2022 at 09:11:11AM +0200, Ricardo Ribalda wrote:
> > > > > > > > > > - Still on slide 16, V4L2 as an API is usable without disclosing vendor
> > > > > > > > > >   IP. What is not possible is upstreaming a driver. I don't see this as
> > > > > > > > > >   significantly different between V4L2 and the new API proposal. I
> > > > > > > > > >   expect this to be discussed on Monday.
> > > > > > > > >
> > > > > > > > > I am only considering upstream drivers. There is not much to discuss
> > > > > > > > > for downstream or closed drivers :)
> > > > > > > >
> > > > > > > > Are we really discussing upstream *drivers*? If anything, it looks like
> > > > > > > > the Kcam proposal moves most of the drivers out of upstream.
> > > > > > >
> > > > > > > Given that the API proposal sets at a significant lower level than V4L2
> > > > > > > in the stack, the concept of "userspace driver" (I meant it in the sense
> > > > > > > of GPU support in mesa) plays a bigger role. It would be good to clarify
> > > > > > > what is meant by "driver" and maybe use the term "kernel driver" when
> > > > > > > only the kernel part is covered, to avoid misunderstandings.
> > > > > >
> > > > > > I think there's a bit of a misunderstanding about what exactly is in a
> > > > > > DRM driver, and what is in Mesa.
> > > > > >
> > > > > > Mesa doesn't program the hardware at all, it's merely a glorified
> > > > > > compiler. It's not more of a driver than GCC is an OS. Most importantly
> > > > > > for our discussion, Mesa doesn't perform any kind of register access (or
> > > > > > register access request), only the (kernel) driver does that.
> > > > >
> > > > > Mesa compiles shaders, but also more generally produces command streams
> > > > > that are passed as blobs to the DRM driver, which then forwards them to
> > > > > the device with as little processing and validation as possible (when
> > > > > the device is designed with multi-clients in mind, that processing and
> > > > > validation can be reduced a lot).
> > > >
> > > > That's true, but at no point in time is the CPU ever touches that
> > > > command stream blob in the case of DRM...
> > >
> > > As Laurent says, the latest hardware is very similar to GPUs, you pass
> > > a set of commands to a firmware that does the actual R/W to the
> > > hardware.
> >
> > For the latest, most powerful, hardware, maybe. I can show you plenty of
> > other ISP we'll need to support that aren't programmed that way, and in
> > that case we would end up interpreting whatever is being passed to KCam
> > on the CPU.
> 
> Kcam is not meant to replace V4L2, if a hardware is better modeled in
> V4L2, they can use it.
> 
> > Which is totally different to what DRM/Mesa is doing on *any* hardware.
> >
> > Another constraint that Mesa has is that there is standards user-space
> > API that all the applications target when it comes to graphics (OpenGL,
> > Vulkan, Direct3D, etc.) and you need to support pretty much all of them.
> > So in that sense, Mesa is a transpiler between that API and the GPU ISA.
> > We're not in this case either with Kcam.
> 
> We also have APIs for cameras: V4L2, Android HAL, libcamera,
> one-of-the-many-industrial-APIs
> 
> The userspace stack will transpile between that API and the ISP command buffers.
> 
> > > For hardware that is a register set, the vendor should have a good
> > > idea about what kind of validation should be needed: raw access (deny
> > > list) or more abstracted (allow list).
> >
> > This would be similar to what is going on with regmap caches. And they
> > are a pain to deal with because that information is far from being
> > available for all the devices, and then most drivers don't implement it
> > either.
> >
> > Also, if we have to have a whitelist in the kernel, then we need to
> > introduce and upstream some kind of driver for hardware enablement.
> > Doesn't that completely defeat the purpose of Kcam?
> 
> The allowlist model that I mention is not about filtering what
> registers can be written and what not. It is about abstracting them
> completely if you do not trust the hardware:
> 
> Lets say that you only have 4 verified modes (like we have on many
> sensors), then you expose a single register with 4 valid values:
> 0,1,2,3. The driver will convert that single register write into N
> writes to registers.

Continuing my quest to try to get everybody to understand the proposal
the same way: "register" is a very bad term for this. It's widely
understood to mean hardware registers by the target audience of the API,
so we should use a different term for the abstract/synthetic parameters
that the API exposes to userspace.

> > > The most critical part is the DMA, and that will always be abstracted.
> >
> > Where do you draw the line then? What will have a driver in the kernel,
> > and what won't?
> 
> If there is memory access: abstraction
> If the hardware is not trusted/documented:abstraction
> If a specific configuration is know to be invalid and leaves the
> system in an invalid state:filtering
> everything else: raw access (+validation)
> 
> > And again, the issue I was telling you about was about a configuration
> > mismatch (following a bogus documentation) between the DMA and the
> > sensor. If the sensor is part of the userspace and the DMA in the
> > kernel, we very much can still have that issue.
> 
> With internal operations you can achieve cooperation between the entities.
> 
> > > Also I doubt that we will have new hardware without an IOMMU, so we
> > > have the same layers of security as today.
> >
> > Maybe not for the kind of devices that end up on chromebooks, but
> > there's definitely hardware being designed today that have an ISP but no
> > IOMMU.
> 
> For the non-iommu hardware, you will have the same security as today:
> driver validation.
> 
> > > > > Recent ISPs have a similar architecture, with a set of registers used
> > > > > to communicate with the ISP firmware, and then most of the hardware
> > > > > registers for the actual image processing blocks being programmed
> > > > > based from the command stream. "Command stream" may not be a very good
> > > > > term for ISPs as it's not really a stream of commands, but
> > > > > conceptually, we're dealing with a blob that is computed by userspace.
> > > >
> > > > ... while in Kcam, the CPU knows and will interpret that command stream.
> > > > Maybe not in all cases, but it's still a significant difference.
> > > >
> > > > If we had to draw a parallel with something else in the kernel, it looks
> > > > way more like eBPF or the discussion we had on where to parse the
> > > > bitstreams for stateless codecs.
> > > >
> > > > The first one has been severely constrained to avoid the issues we've
> > > > raised, and we all know how the second one went.
> > >
> > > In eBPF, you are moving some user code to the kernel, with an unstable API.
> > >
> > > In KCAM, (and in DRM), you let the user build a set of operations,
> > > that you pass to the kernel via a stable API, then it is validated and
> > > scheduled by the kernel.
> >
> > You won't be able to have a stable API with that design either. If only
> > because of that whitelist you were mentioning. Let's say we have a
> > register that turns out, after the facts, to not be available. If the
> > userspace ever used to set it at some point, you're screwed. Indeed,
> > either you move it out of the whitelist, and then you break userspace,
> > or you don't add it to the whitelist and end up allowing an insecure or
> > dangerous situation.
> 
> See above for our description of allowlist.
> 
> Also, using the drm model as reference. kernel version, libdrm and
> mesa (and even llvm) are very coupled. Using a wrong version can lead
> to unexpected results or even GPU hangs.
> 
> What to do when we fix bugs that affect functionality is something
> that we need to decide on case to case cases. The same way we do today
> when hardware does not support a control value and we discover it 10
> versions later.
> 
> > And you can't say you would just ignore a register that isn't part of
> > the whitelist, because then you would enforce a configuration that isn't
> > the one the user-space asked for, which is even worse.
> >
> > > X11 was much more bizarre, the GPIO iomem was remapped into userspace.
> >
> > Yes, but that wasn't the only thing bad with it. I mean, it doesn't
> > really matter who exactly does the register access eventually. In UMS,
> > X11 was doing it itself through a mapping of its own, in KCam the kernel
> > will do it on behalf of the userspace. But we still end up in both cases
> > with:
> >
> >   * The entire logic is in userspace
> 
> We can argue if this is an issue or not. I think it is not
> 
> >   * Realistically speaking, that logic can only run as root
> 
> Do not agree.
> 
> >   * With a poor configuration, the userspace can completely crash the
> >     system
> >   * If the userspace crashes, you can end up with a configuration you
> >     can't really recover from
> 
> A Kcam driver can give you broken images, but never crash the system
> or leave it in an unrecoverable state. That is the main guarantee that
> we expect from the drivers.
> 
> > *All* of those issues are still there with Kcam, even though the actual
> >  memory mapping isn't in userspace

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Media Summit] ChromeOS Kernel CAM
  2022-09-09  9:06                     ` Maxime Ripard
@ 2022-09-09 10:00                       ` Ricardo Ribalda
  2022-09-09 11:58                         ` Maxime Ripard
  0 siblings, 1 reply; 20+ messages in thread
From: Ricardo Ribalda @ 2022-09-09 10:00 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Laurent Pinchart, Linux Media Mailing List, Sakari Ailus,
	Kieran Bingham, Nicolas Dufresne, Benjamin Gaignard,
	Hidenori Kobayashi, Paul Kocialkowski, Michael Olbrich,
	Daniel Scally, Jernej Škrabec, Niklas Söderlund,
	Michael Tretter, Hans Verkuil, Philipp Zabel,
	Mauro Carvalho Chehab, Benjamin MUGNIER, Jacopo Mondi,
	Dave Stevenson

Hi

On Fri, 9 Sept 2022 at 11:06, Maxime Ripard <maxime@cerno.tech> wrote:
>
> On Fri, Sep 09, 2022 at 10:39:36AM +0200, Ricardo Ribalda wrote:
> > Hi Maxime
> >
> > On Fri, 9 Sept 2022 at 10:00, Maxime Ripard <maxime@cerno.tech> wrote:
> > >
> > > On Thu, Sep 08, 2022 at 08:13:17PM +0200, Ricardo Ribalda wrote:
> > > > On Thu, 8 Sept 2022 at 17:34, Maxime Ripard <maxime@cerno.tech> wrote:
> > > > >
> > > > > On Thu, Sep 08, 2022 at 06:16:40PM +0300, Laurent Pinchart wrote:
> > > > > > On Thu, Sep 08, 2022 at 04:59:05PM +0200, Maxime Ripard wrote:
> > > > > > > On Thu, Sep 08, 2022 at 05:14:41PM +0300, Laurent Pinchart wrote:
> > > > > > > > On Thu, Sep 08, 2022 at 10:08:46AM +0200, Maxime Ripard wrote:
> > > > > > > > > Hi Ricardo,
> > > > > > > > >
> > > > > > > > > On Thu, Sep 08, 2022 at 09:11:11AM +0200, Ricardo Ribalda wrote:
> > > > > > > > > > > - Still on slide 16, V4L2 as an API is usable without disclosing vendor
> > > > > > > > > > >   IP. What is not possible is upstreaming a driver. I don't see this as
> > > > > > > > > > >   significantly different between V4L2 and the new API proposal. I
> > > > > > > > > > >   expect this to be discussed on Monday.
> > > > > > > > > >
> > > > > > > > > > I am only considering upstream drivers. There is not much to discuss
> > > > > > > > > > for downstream or closed drivers :)
> > > > > > > > >
> > > > > > > > > Are we really discussing upstream *drivers*? If anything, it looks like
> > > > > > > > > the Kcam proposal moves most of the drivers out of upstream.
> > > > > > > >
> > > > > > > > Given that the API proposal sets at a significant lower level than V4L2
> > > > > > > > in the stack, the concept of "userspace driver" (I meant it in the sense
> > > > > > > > of GPU support in mesa) plays a bigger role. It would be good to clarify
> > > > > > > > what is meant by "driver" and maybe use the term "kernel driver" when
> > > > > > > > only the kernel part is covered, to avoid misunderstandings.
> > > > > > >
> > > > > > > I think there's a bit of a misunderstanding about what exactly is in a
> > > > > > > DRM driver, and what is in Mesa.
> > > > > > >
> > > > > > > Mesa doesn't program the hardware at all, it's merely a glorified
> > > > > > > compiler. It's not more of a driver than GCC is an OS. Most importantly
> > > > > > > for our discussion, Mesa doesn't perform any kind of register access (or
> > > > > > > register access request), only the (kernel) driver does that.
> > > > > >
> > > > > > Mesa compiles shaders, but also more generally produces command streams
> > > > > > that are passed as blobs to the DRM driver, which then forwards them to
> > > > > > the device with as little processing and validation as possible (when
> > > > > > the device is designed with multi-clients in mind, that processing and
> > > > > > validation can be reduced a lot).
> > > > >
> > > > > That's true, but at no point in time is the CPU ever touches that
> > > > > command stream blob in the case of DRM...
> > > >
> > > > As Laurent says, the latest hardware is very similar to GPUs, you pass
> > > > a set of commands to a firmware that does the actual R/W to the
> > > > hardware.
> > >
> > > For the latest, most powerful, hardware, maybe. I can show you plenty of
> > > other ISP we'll need to support that aren't programmed that way, and in
> > > that case we would end up interpreting whatever is being passed to KCam
> > > on the CPU.
> >
> > Kcam is not meant to replace V4L2, if a hardware is better modeled in
> > V4L2, they can use it.
>
> I'm not sure that alone is going to fly. Having to support the same
> device in multiple frameworks based on who is using it exactly is very
> frowned upon: it's a waste of development, review and maintenance time.
>
> If we aim for something, it's to supersede v4l2, or to extend v4l2.

You do not support the same device in multiple frameworks. You pick
the driver that fits your purpose better.

Like Industrial IO vs other subsystems.

>
> > > Which is totally different to what DRM/Mesa is doing on *any* hardware.
> > >
> > > Another constraint that Mesa has is that there is standards user-space
> > > API that all the applications target when it comes to graphics (OpenGL,
> > > Vulkan, Direct3D, etc.) and you need to support pretty much all of them.
> > > So in that sense, Mesa is a transpiler between that API and the GPU ISA.
> > > We're not in this case either with Kcam.
> >
> > We also have APIs for cameras: V4L2, Android HAL, libcamera,
> > one-of-the-many-industrial-APIs
>
> The fact that you mention that there is many industrial APIs kind of
> prove my point: none of them are anywhere near industry standards like
> OpenGL or Vulkan can be. And you never mentioned that you wanted to
> support all of them as a goal for kcam?

The industry is already working for a camera API standard under the
Khronos umbrella:

https://www.khronos.org/camera/#:~:text=The%20Khronos%20Camera%20API%20Working,open%20standard%20camera%20system%20API.

We want to support any camera API on top of kcam, to avoid effort
duplication from vendors.

>
> > The userspace stack will transpile between that API and the ISP
> > command buffers.
>
> And if there's no ISA for the ISP, then it's just going to create some
> kind of bytecode that the kernel will execute.

>
> > > > For hardware that is a register set, the vendor should have a good
> > > > idea about what kind of validation should be needed: raw access (deny
> > > > list) or more abstracted (allow list).
> > >
> > > This would be similar to what is going on with regmap caches. And they
> > > are a pain to deal with because that information is far from being
> > > available for all the devices, and then most drivers don't implement it
> > > either.
> > >
> > > Also, if we have to have a whitelist in the kernel, then we need to
> > > introduce and upstream some kind of driver for hardware enablement.
> > > Doesn't that completely defeat the purpose of Kcam?
> >
> > The allowlist model that I mention is not about filtering what
> > registers can be written and what not. It is about abstracting them
> > completely if you do not trust the hardware:
> >
> > Lets say that you only have 4 verified modes (like we have on many
> > sensors), then you expose a single register with 4 valid values:
> > 0,1,2,3. The driver will convert that single register write into N
> > writes to registers.
>
> I'm not sure I get how that is different to what we have today with v4l2
> controls? I'm pretty sure that's exactly how we discover and change
> modes today with v4l2.
>
> > > > The most critical part is the DMA, and that will always be abstracted.
> > >
> > > Where do you draw the line then? What will have a driver in the kernel,
> > > and what won't?
> >
> > If there is memory access: abstraction
> > If the hardware is not trusted/documented:abstraction
> > If a specific configuration is know to be invalid and leaves the
> > system in an invalid state:filtering
> > everything else: raw access (+validation)
>
> I mean, the ISP you mentioned at least has to access the command buffer
> somehow, and surely that counts as a memory access?
>
> And so, what happens if the ISP is not entirely documented? You start
> with a kernel driver, and then once it is documented you move it to Kcam
> breaking all users in the process?

I think there is a misunderstanding here, All the hardware needs a
kernel driver.
The functionality of that driver depends on the trust/documentation on
the platform:

- simple validation
- filtering
- "modes"



>
> > > And again, the issue I was telling you about was about a configuration
> > > mismatch (following a bogus documentation) between the DMA and the
> > > sensor. If the sensor is part of the userspace and the DMA in the
> > > kernel, we very much can still have that issue.
> >
> > With internal operations you can achieve cooperation between the entities.
>
> Again, looks like what we currenty have with v4l2 to me.

In v4l2, the API is inside the kernel.  Kcam  is API agnostic, there
is no concept of streams, formats, controls...

>
> > > > Also I doubt that we will have new hardware without an IOMMU, so we
> > > > have the same layers of security as today.
> > >
> > > Maybe not for the kind of devices that end up on chromebooks, but
> > > there's definitely hardware being designed today that have an ISP but no
> > > IOMMU.
> >
> > For the non-iommu hardware, you will have the same security as today:
> > driver validation.
>
> I mean, I'm all for it. But the stated goal of Kcam is to reduce the
> driver logic so that most of it is in userspace, but most of your
> answers to challenges so far has been "but we'll have a driver for that"
>
> If a driver is the solution, why do we need the Kcam architecture in the
> first place?

To create a platform where you can build any API.

>
> > > > > > Recent ISPs have a similar architecture, with a set of registers used
> > > > > > to communicate with the ISP firmware, and then most of the hardware
> > > > > > registers for the actual image processing blocks being programmed
> > > > > > based from the command stream. "Command stream" may not be a very good
> > > > > > term for ISPs as it's not really a stream of commands, but
> > > > > > conceptually, we're dealing with a blob that is computed by userspace.
> > > > >
> > > > > ... while in Kcam, the CPU knows and will interpret that command stream.
> > > > > Maybe not in all cases, but it's still a significant difference.
> > > > >
> > > > > If we had to draw a parallel with something else in the kernel, it looks
> > > > > way more like eBPF or the discussion we had on where to parse the
> > > > > bitstreams for stateless codecs.
> > > > >
> > > > > The first one has been severely constrained to avoid the issues we've
> > > > > raised, and we all know how the second one went.
> > > >
> > > > In eBPF, you are moving some user code to the kernel, with an unstable API.
> > > >
> > > > In KCAM, (and in DRM), you let the user build a set of operations,
> > > > that you pass to the kernel via a stable API, then it is validated and
> > > > scheduled by the kernel.
> > >
> > > You won't be able to have a stable API with that design either. If only
> > > because of that whitelist you were mentioning. Let's say we have a
> > > register that turns out, after the facts, to not be available. If the
> > > userspace ever used to set it at some point, you're screwed. Indeed,
> > > either you move it out of the whitelist, and then you break userspace,
> > > or you don't add it to the whitelist and end up allowing an insecure or
> > > dangerous situation.
> >
> > See above for our description of allowlist.
> >
> > Also, using the drm model as reference. kernel version, libdrm and
> > mesa (and even llvm) are very coupled. Using a wrong version can lead
> > to unexpected results or even GPU hangs.
>
> Right.
>
> And those are considered regressions:
> https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements
>
> """
> The Linux kernel’s “no regression” policy holds in practice only for
> open-source userspace of the DRM subsystem. DRM developers are perfectly
> fine if closed-source blob drivers in userspace use the same uAPI as the
> open drivers, but they must do so in the exact same way as the open
> drivers. Creative (ab)use of the interfaces will, and in the past
> routinely has, lead to breakage.
> """

There are a lot of hypothetical scenarios: erratas, big documentation
drop from a vendor, drivers obtained via reverse engineering....

There is no one-size-fits-all solution, and every case needs to be
considered individually when they come.

As any other kernel subsystems we need to live under the
never-break-the-userspace. Obey gravity, it is the law ;)


>
> > What to do when we fix bugs that affect functionality is something
> > that we need to decide on case to case cases. The same way we do today
> > when hardware does not support a control value and we discover it 10
> > versions later.
>
> Indeed, but we need to have some idea on what that process is going to
> look like in practice. If we put ourselves is a corner and don't allow
> for some bug resolutions in the first place, then we won't be able to
> fix them when we'll encounter them.

We can definately look into establishing some guidelines once we have
a more clear vision of kcam and its extent.


>
> > > And you can't say you would just ignore a register that isn't part of
> > > the whitelist, because then you would enforce a configuration that isn't
> > > the one the user-space asked for, which is even worse.
> > >
> > > > X11 was much more bizarre, the GPIO iomem was remapped into userspace.
> > >
> > > Yes, but that wasn't the only thing bad with it. I mean, it doesn't
> > > really matter who exactly does the register access eventually. In UMS,
> > > X11 was doing it itself through a mapping of its own, in KCam the kernel
> > > will do it on behalf of the userspace. But we still end up in both cases
> > > with:
> > >
> > >   * The entire logic is in userspace
> >
> > We can argue if this is an issue or not. I think it is not
> >
> > >   * Realistically speaking, that logic can only run as root
> >
> > Do not agree.
>
> How so? Are you going to allow any javascript enabled website to poke
> into Kcam?

Can you access the GPU ISA directly as an external app on the browser?
Same answer here.

The browser will typically speak with a system service to access the
camera: pipewire, gstreamer....

>
> > >   * With a poor configuration, the userspace can completely crash the
> > >     system
> > >   * If the userspace crashes, you can end up with a configuration you
> > >     can't really recover from
> >
> > A Kcam driver can give you broken images, but never crash the system
> > or leave it in an unrecoverable state. That is the main guarantee that
> > we expect from the drivers.
>
> That's wishful thinking. If your application crashes halfway through the
> configuration, you're left in a weird state you know nothing about now.
> That's impossible to recover from.

Today (almost) any kernel driver makes no assumption of the initial
state of the hardware. You might have been interrupted at any state
and rebooting the hardware does not reinit the internal state of the
peripherals.

This is the same, a kcam app needs to init the hardware before it can
make use of it.


>
> Maxime



-- 
Ricardo Ribalda

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Media Summit] ChromeOS Kernel CAM
  2022-09-09 10:00                       ` Ricardo Ribalda
@ 2022-09-09 11:58                         ` Maxime Ripard
  2022-09-09 12:40                           ` Ricardo Ribalda
  0 siblings, 1 reply; 20+ messages in thread
From: Maxime Ripard @ 2022-09-09 11:58 UTC (permalink / raw)
  To: Ricardo Ribalda
  Cc: Laurent Pinchart, Linux Media Mailing List, Sakari Ailus,
	Kieran Bingham, Nicolas Dufresne, Benjamin Gaignard,
	Hidenori Kobayashi, Paul Kocialkowski, Michael Olbrich,
	Daniel Scally, Jernej Škrabec, Niklas Söderlund,
	Michael Tretter, Hans Verkuil, Philipp Zabel,
	Mauro Carvalho Chehab, Benjamin MUGNIER, Jacopo Mondi,
	Dave Stevenson

[-- Attachment #1: Type: text/plain, Size: 14526 bytes --]

On Fri, Sep 09, 2022 at 12:00:34PM +0200, Ricardo Ribalda wrote:
> On Fri, 9 Sept 2022 at 11:06, Maxime Ripard <maxime@cerno.tech> wrote:
> > On Fri, Sep 09, 2022 at 10:39:36AM +0200, Ricardo Ribalda wrote:
> > > On Fri, 9 Sept 2022 at 10:00, Maxime Ripard <maxime@cerno.tech> wrote:
> > > >
> > > > On Thu, Sep 08, 2022 at 08:13:17PM +0200, Ricardo Ribalda wrote:
> > > > > On Thu, 8 Sept 2022 at 17:34, Maxime Ripard <maxime@cerno.tech> wrote:
> > > > > >
> > > > > > On Thu, Sep 08, 2022 at 06:16:40PM +0300, Laurent Pinchart wrote:
> > > > > > > On Thu, Sep 08, 2022 at 04:59:05PM +0200, Maxime Ripard wrote:
> > > > > > > > On Thu, Sep 08, 2022 at 05:14:41PM +0300, Laurent Pinchart wrote:
> > > > > > > > > On Thu, Sep 08, 2022 at 10:08:46AM +0200, Maxime Ripard wrote:
> > > > > > > > > > Hi Ricardo,
> > > > > > > > > >
> > > > > > > > > > On Thu, Sep 08, 2022 at 09:11:11AM +0200, Ricardo Ribalda wrote:
> > > > > > > > > > > > - Still on slide 16, V4L2 as an API is usable without disclosing vendor
> > > > > > > > > > > >   IP. What is not possible is upstreaming a driver. I don't see this as
> > > > > > > > > > > >   significantly different between V4L2 and the new API proposal. I
> > > > > > > > > > > >   expect this to be discussed on Monday.
> > > > > > > > > > >
> > > > > > > > > > > I am only considering upstream drivers. There is not much to discuss
> > > > > > > > > > > for downstream or closed drivers :)
> > > > > > > > > >
> > > > > > > > > > Are we really discussing upstream *drivers*? If anything, it looks like
> > > > > > > > > > the Kcam proposal moves most of the drivers out of upstream.
> > > > > > > > >
> > > > > > > > > Given that the API proposal sets at a significant lower level than V4L2
> > > > > > > > > in the stack, the concept of "userspace driver" (I meant it in the sense
> > > > > > > > > of GPU support in mesa) plays a bigger role. It would be good to clarify
> > > > > > > > > what is meant by "driver" and maybe use the term "kernel driver" when
> > > > > > > > > only the kernel part is covered, to avoid misunderstandings.
> > > > > > > >
> > > > > > > > I think there's a bit of a misunderstanding about what exactly is in a
> > > > > > > > DRM driver, and what is in Mesa.
> > > > > > > >
> > > > > > > > Mesa doesn't program the hardware at all, it's merely a glorified
> > > > > > > > compiler. It's not more of a driver than GCC is an OS. Most importantly
> > > > > > > > for our discussion, Mesa doesn't perform any kind of register access (or
> > > > > > > > register access request), only the (kernel) driver does that.
> > > > > > >
> > > > > > > Mesa compiles shaders, but also more generally produces command streams
> > > > > > > that are passed as blobs to the DRM driver, which then forwards them to
> > > > > > > the device with as little processing and validation as possible (when
> > > > > > > the device is designed with multi-clients in mind, that processing and
> > > > > > > validation can be reduced a lot).
> > > > > >
> > > > > > That's true, but at no point in time is the CPU ever touches that
> > > > > > command stream blob in the case of DRM...
> > > > >
> > > > > As Laurent says, the latest hardware is very similar to GPUs, you pass
> > > > > a set of commands to a firmware that does the actual R/W to the
> > > > > hardware.
> > > >
> > > > For the latest, most powerful, hardware, maybe. I can show you plenty of
> > > > other ISP we'll need to support that aren't programmed that way, and in
> > > > that case we would end up interpreting whatever is being passed to KCam
> > > > on the CPU.
> > >
> > > Kcam is not meant to replace V4L2, if a hardware is better modeled in
> > > V4L2, they can use it.
> >
> > I'm not sure that alone is going to fly. Having to support the same
> > device in multiple frameworks based on who is using it exactly is very
> > frowned upon: it's a waste of development, review and maintenance time.
> >
> > If we aim for something, it's to supersede v4l2, or to extend v4l2.
> 
> You do not support the same device in multiple frameworks. You pick
> the driver that fits your purpose better.
> 
> Like Industrial IO vs other subsystems.

Again, it's not clear to me where do we draw the line then. Let's say we
have three different systems, one with a DMA and an ov5640, one with a
DMA, an ISP and an ov5640, and let's be a bit exotic and re-use the
first setup, with an external ISP in the mix.

Which setup is going to use which framework, and where the drivers are
going to be?


> > > > > The most critical part is the DMA, and that will always be abstracted.
> > > >
> > > > Where do you draw the line then? What will have a driver in the kernel,
> > > > and what won't?
> > >
> > > If there is memory access: abstraction
> > > If the hardware is not trusted/documented:abstraction
> > > If a specific configuration is know to be invalid and leaves the
> > > system in an invalid state:filtering
> > > everything else: raw access (+validation)
> >
> > I mean, the ISP you mentioned at least has to access the command buffer
> > somehow, and surely that counts as a memory access?
> >
> > And so, what happens if the ISP is not entirely documented? You start
> > with a kernel driver, and then once it is documented you move it to Kcam
> > breaking all users in the process?
> 
> I think there is a misunderstanding here, All the hardware needs a
> kernel driver.
> The functionality of that driver depends on the trust/documentation on
> the platform:
> 
> - simple validation
> - filtering
> - "modes"

I think you really need to show us with a proof of concept what Kcam
exactly is then. Because in your slide 22, you state:


"""
Kcam follows a DRM-like model where the
kernel provides basic functionality:
- Scheduling
- Discovery

Everything else is provided by an userspace library (hopefully libcamera)
"""

What is "everything else" in that case ?

> 
> >
> > > > And again, the issue I was telling you about was about a configuration
> > > > mismatch (following a bogus documentation) between the DMA and the
> > > > sensor. If the sensor is part of the userspace and the DMA in the
> > > > kernel, we very much can still have that issue.
> > >
> > > With internal operations you can achieve cooperation between the entities.
> >
> > Again, looks like what we currenty have with v4l2 to me.
> 
> In v4l2, the API is inside the kernel.  Kcam  is API agnostic, there
> is no concept of streams, formats, controls...

With the "modes abstraction" you mentioned before, we're getting really
close to a concept of controls and formats.

> > > > > Also I doubt that we will have new hardware without an IOMMU, so we
> > > > > have the same layers of security as today.
> > > >
> > > > Maybe not for the kind of devices that end up on chromebooks, but
> > > > there's definitely hardware being designed today that have an ISP but no
> > > > IOMMU.
> > >
> > > For the non-iommu hardware, you will have the same security as today:
> > > driver validation.
> >
> > I mean, I'm all for it. But the stated goal of Kcam is to reduce the
> > driver logic so that most of it is in userspace, but most of your
> > answers to challenges so far has been "but we'll have a driver for that"
> >
> > If a driver is the solution, why do we need the Kcam architecture in the
> > first place?
> 
> To create a platform where you can build any API.

This answers what you want to achieve at the user-space level, but not
really why v4l2 doesn't fit the bill. Libcamera already provides support
for multiple API on top of v4l2

> > > > > > > Recent ISPs have a similar architecture, with a set of registers used
> > > > > > > to communicate with the ISP firmware, and then most of the hardware
> > > > > > > registers for the actual image processing blocks being programmed
> > > > > > > based from the command stream. "Command stream" may not be a very good
> > > > > > > term for ISPs as it's not really a stream of commands, but
> > > > > > > conceptually, we're dealing with a blob that is computed by userspace.
> > > > > >
> > > > > > ... while in Kcam, the CPU knows and will interpret that command stream.
> > > > > > Maybe not in all cases, but it's still a significant difference.
> > > > > >
> > > > > > If we had to draw a parallel with something else in the kernel, it looks
> > > > > > way more like eBPF or the discussion we had on where to parse the
> > > > > > bitstreams for stateless codecs.
> > > > > >
> > > > > > The first one has been severely constrained to avoid the issues we've
> > > > > > raised, and we all know how the second one went.
> > > > >
> > > > > In eBPF, you are moving some user code to the kernel, with an unstable API.
> > > > >
> > > > > In KCAM, (and in DRM), you let the user build a set of operations,
> > > > > that you pass to the kernel via a stable API, then it is validated and
> > > > > scheduled by the kernel.
> > > >
> > > > You won't be able to have a stable API with that design either. If only
> > > > because of that whitelist you were mentioning. Let's say we have a
> > > > register that turns out, after the facts, to not be available. If the
> > > > userspace ever used to set it at some point, you're screwed. Indeed,
> > > > either you move it out of the whitelist, and then you break userspace,
> > > > or you don't add it to the whitelist and end up allowing an insecure or
> > > > dangerous situation.
> > >
> > > See above for our description of allowlist.
> > >
> > > Also, using the drm model as reference. kernel version, libdrm and
> > > mesa (and even llvm) are very coupled. Using a wrong version can lead
> > > to unexpected results or even GPU hangs.
> >
> > Right.
> >
> > And those are considered regressions:
> > https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements
> >
> > """
> > The Linux kernel’s “no regression” policy holds in practice only for
> > open-source userspace of the DRM subsystem. DRM developers are perfectly
> > fine if closed-source blob drivers in userspace use the same uAPI as the
> > open drivers, but they must do so in the exact same way as the open
> > drivers. Creative (ab)use of the interfaces will, and in the past
> > routinely has, lead to breakage.
> > """
> 
> There are a lot of hypothetical scenarios: erratas, big documentation
> drop from a vendor, drivers obtained via reverse engineering....

Those wouldn't be regressions.

> There is no one-size-fits-all solution, and every case needs to be
> considered individually when they come.
> 
> As any other kernel subsystems we need to live under the
> never-break-the-userspace. Obey gravity, it is the law ;)
> 
> 
> >
> > > What to do when we fix bugs that affect functionality is something
> > > that we need to decide on case to case cases. The same way we do today
> > > when hardware does not support a control value and we discover it 10
> > > versions later.
> >
> > Indeed, but we need to have some idea on what that process is going to
> > look like in practice. If we put ourselves is a corner and don't allow
> > for some bug resolutions in the first place, then we won't be able to
> > fix them when we'll encounter them.
> 
> We can definately look into establishing some guidelines once we have
> a more clear vision of kcam and its extent.
> 
> 
> >
> > > > And you can't say you would just ignore a register that isn't part of
> > > > the whitelist, because then you would enforce a configuration that isn't
> > > > the one the user-space asked for, which is even worse.
> > > >
> > > > > X11 was much more bizarre, the GPIO iomem was remapped into userspace.
> > > >
> > > > Yes, but that wasn't the only thing bad with it. I mean, it doesn't
> > > > really matter who exactly does the register access eventually. In UMS,
> > > > X11 was doing it itself through a mapping of its own, in KCam the kernel
> > > > will do it on behalf of the userspace. But we still end up in both cases
> > > > with:
> > > >
> > > >   * The entire logic is in userspace
> > >
> > > We can argue if this is an issue or not. I think it is not
> > >
> > > >   * Realistically speaking, that logic can only run as root
> > >
> > > Do not agree.
> >
> > How so? Are you going to allow any javascript enabled website to poke
> > into Kcam?
> 
> Can you access the GPU ISA directly as an external app on the browser?

I guess?

> Same answer here.

Not really, and I'm sorry but you keep eluding this, and this comes back
in pretty much every mail: the CPU will never interpret the GPU ISA in
kernel space. Kcam might. This is a very big difference.

> The browser will typically speak with a system service to access the
> camera: pipewire, gstreamer....
> 
> >
> > > >   * With a poor configuration, the userspace can completely crash the
> > > >     system
> > > >   * If the userspace crashes, you can end up with a configuration you
> > > >     can't really recover from
> > >
> > > A Kcam driver can give you broken images, but never crash the system
> > > or leave it in an unrecoverable state. That is the main guarantee that
> > > we expect from the drivers.
> >
> > That's wishful thinking. If your application crashes halfway through the
> > configuration, you're left in a weird state you know nothing about now.
> > That's impossible to recover from.
> 
> Today (almost) any kernel driver makes no assumption of the initial
> state of the hardware. You might have been interrupted at any state
> and rebooting the hardware does not reinit the internal state of the
> peripherals.
> 
> This is the same, a kcam app needs to init the hardware before it can
> make use of it.

I guess it's getting obvious by now that Kcam doesn't seem to be a
solution that is getting a consensus, possibly because it's so early
that we don't really have an idea of how it would look like with a real
device.

I'm not sure this can be easily solved in the time slot we have on
Monday, so maybe we can turn this around in order to have some progress:
could we maybe use that time slot to discuss the problem, and outline
the attributes an ideal solution would have to make sure everyone is
roughly on the same page.

And then, if Kcam can be adapted to change those attributes, awesome :)

Maxime

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Media Summit] ChromeOS Kernel CAM
  2022-09-09 11:58                         ` Maxime Ripard
@ 2022-09-09 12:40                           ` Ricardo Ribalda
  0 siblings, 0 replies; 20+ messages in thread
From: Ricardo Ribalda @ 2022-09-09 12:40 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Laurent Pinchart, Linux Media Mailing List, Sakari Ailus,
	Kieran Bingham, Nicolas Dufresne, Benjamin Gaignard,
	Hidenori Kobayashi, Paul Kocialkowski, Michael Olbrich,
	Daniel Scally, Jernej Škrabec, Niklas Söderlund,
	Michael Tretter, Hans Verkuil, Philipp Zabel,
	Mauro Carvalho Chehab, Benjamin MUGNIER, Jacopo Mondi,
	Dave Stevenson

Hi

On Fri, Sep 9, 2022 at 1:58 PM Maxime Ripard <maxime@cerno.tech> wrote:
>
> On Fri, Sep 09, 2022 at 12:00:34PM +0200, Ricardo Ribalda wrote:
> > On Fri, 9 Sept 2022 at 11:06, Maxime Ripard <maxime@cerno.tech> wrote:
> > > On Fri, Sep 09, 2022 at 10:39:36AM +0200, Ricardo Ribalda wrote:
> > > > On Fri, 9 Sept 2022 at 10:00, Maxime Ripard <maxime@cerno.tech> wrote:
> > > > >
> > > > > On Thu, Sep 08, 2022 at 08:13:17PM +0200, Ricardo Ribalda wrote:
> > > > > > On Thu, 8 Sept 2022 at 17:34, Maxime Ripard <maxime@cerno.tech> wrote:
> > > > > > >
> > > > > > > On Thu, Sep 08, 2022 at 06:16:40PM +0300, Laurent Pinchart wrote:
> > > > > > > > On Thu, Sep 08, 2022 at 04:59:05PM +0200, Maxime Ripard wrote:
> > > > > > > > > On Thu, Sep 08, 2022 at 05:14:41PM +0300, Laurent Pinchart wrote:
> > > > > > > > > > On Thu, Sep 08, 2022 at 10:08:46AM +0200, Maxime Ripard wrote:
> > > > > > > > > > > Hi Ricardo,
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Sep 08, 2022 at 09:11:11AM +0200, Ricardo Ribalda wrote:
> > > > > > > > > > > > > - Still on slide 16, V4L2 as an API is usable without disclosing vendor
> > > > > > > > > > > > >   IP. What is not possible is upstreaming a driver. I don't see this as
> > > > > > > > > > > > >   significantly different between V4L2 and the new API proposal. I
> > > > > > > > > > > > >   expect this to be discussed on Monday.
> > > > > > > > > > > >
> > > > > > > > > > > > I am only considering upstream drivers. There is not much to discuss
> > > > > > > > > > > > for downstream or closed drivers :)
> > > > > > > > > > >
> > > > > > > > > > > Are we really discussing upstream *drivers*? If anything, it looks like
> > > > > > > > > > > the Kcam proposal moves most of the drivers out of upstream.
> > > > > > > > > >
> > > > > > > > > > Given that the API proposal sets at a significant lower level than V4L2
> > > > > > > > > > in the stack, the concept of "userspace driver" (I meant it in the sense
> > > > > > > > > > of GPU support in mesa) plays a bigger role. It would be good to clarify
> > > > > > > > > > what is meant by "driver" and maybe use the term "kernel driver" when
> > > > > > > > > > only the kernel part is covered, to avoid misunderstandings.
> > > > > > > > >
> > > > > > > > > I think there's a bit of a misunderstanding about what exactly is in a
> > > > > > > > > DRM driver, and what is in Mesa.
> > > > > > > > >
> > > > > > > > > Mesa doesn't program the hardware at all, it's merely a glorified
> > > > > > > > > compiler. It's not more of a driver than GCC is an OS. Most importantly
> > > > > > > > > for our discussion, Mesa doesn't perform any kind of register access (or
> > > > > > > > > register access request), only the (kernel) driver does that.
> > > > > > > >
> > > > > > > > Mesa compiles shaders, but also more generally produces command streams
> > > > > > > > that are passed as blobs to the DRM driver, which then forwards them to
> > > > > > > > the device with as little processing and validation as possible (when
> > > > > > > > the device is designed with multi-clients in mind, that processing and
> > > > > > > > validation can be reduced a lot).
> > > > > > >
> > > > > > > That's true, but at no point in time is the CPU ever touches that
> > > > > > > command stream blob in the case of DRM...
> > > > > >
> > > > > > As Laurent says, the latest hardware is very similar to GPUs, you pass
> > > > > > a set of commands to a firmware that does the actual R/W to the
> > > > > > hardware.
> > > > >
> > > > > For the latest, most powerful, hardware, maybe. I can show you plenty of
> > > > > other ISP we'll need to support that aren't programmed that way, and in
> > > > > that case we would end up interpreting whatever is being passed to KCam
> > > > > on the CPU.
> > > >
> > > > Kcam is not meant to replace V4L2, if a hardware is better modeled in
> > > > V4L2, they can use it.
> > >
> > > I'm not sure that alone is going to fly. Having to support the same
> > > device in multiple frameworks based on who is using it exactly is very
> > > frowned upon: it's a waste of development, review and maintenance time.
> > >
> > > If we aim for something, it's to supersede v4l2, or to extend v4l2.
> >
> > You do not support the same device in multiple frameworks. You pick
> > the driver that fits your purpose better.
> >
> > Like Industrial IO vs other subsystems.
>
> Again, it's not clear to me where do we draw the line then. Let's say we
> have three different systems, one with a DMA and an ov5640, one with a
> DMA, an ISP and an ov5640, and let's be a bit exotic and re-use the
> first setup, with an external ISP in the mix.
>
> Which setup is going to use which framework, and where the drivers are
> going to be?
>
>
> > > > > > The most critical part is the DMA, and that will always be abstracted.
> > > > >
> > > > > Where do you draw the line then? What will have a driver in the kernel,
> > > > > and what won't?
> > > >
> > > > If there is memory access: abstraction
> > > > If the hardware is not trusted/documented:abstraction
> > > > If a specific configuration is know to be invalid and leaves the
> > > > system in an invalid state:filtering
> > > > everything else: raw access (+validation)
> > >
> > > I mean, the ISP you mentioned at least has to access the command buffer
> > > somehow, and surely that counts as a memory access?
> > >
> > > And so, what happens if the ISP is not entirely documented? You start
> > > with a kernel driver, and then once it is documented you move it to Kcam
> > > breaking all users in the process?
> >
> > I think there is a misunderstanding here, All the hardware needs a
> > kernel driver.
> > The functionality of that driver depends on the trust/documentation on
> > the platform:
> >
> > - simple validation
> > - filtering
> > - "modes"
>
> I think you really need to show us with a proof of concept what Kcam
> exactly is then. Because in your slide 22, you state:
>
>
> """
> Kcam follows a DRM-like model where the
> kernel provides basic functionality:
> - Scheduling
> - Discovery
>
> Everything else is provided by an userspace library (hopefully libcamera)
> """
>
> What is "everything else" in that case ?

One thing is the kcam framework and the other is the kcam drivers

kcam framework:
  - Scheduling and discovery

Kcam drivers:
  - Access to hardware via whatever abstraction level, power management,



>
> >
> > >
> > > > > And again, the issue I was telling you about was about a configuration
> > > > > mismatch (following a bogus documentation) between the DMA and the
> > > > > sensor. If the sensor is part of the userspace and the DMA in the
> > > > > kernel, we very much can still have that issue.
> > > >
> > > > With internal operations you can achieve cooperation between the entities.
> > >
> > > Again, looks like what we currenty have with v4l2 to me.
> >
> > In v4l2, the API is inside the kernel.  Kcam  is API agnostic, there
> > is no concept of streams, formats, controls...
>
> With the "modes abstraction" you mentioned before, we're getting really
> close to a concept of controls and formats.
>
> > > > > > Also I doubt that we will have new hardware without an IOMMU, so we
> > > > > > have the same layers of security as today.
> > > > >
> > > > > Maybe not for the kind of devices that end up on chromebooks, but
> > > > > there's definitely hardware being designed today that have an ISP but no
> > > > > IOMMU.
> > > >
> > > > For the non-iommu hardware, you will have the same security as today:
> > > > driver validation.
> > >
> > > I mean, I'm all for it. But the stated goal of Kcam is to reduce the
> > > driver logic so that most of it is in userspace, but most of your
> > > answers to challenges so far has been "but we'll have a driver for that"
> > >
> > > If a driver is the solution, why do we need the Kcam architecture in the
> > > first place?
> >
> > To create a platform where you can build any API.
>
> This answers what you want to achieve at the user-space level, but not
> really why v4l2 doesn't fit the bill. Libcamera already provides support
> for multiple API on top of v4l2

Of course you can try to build an API on top of another API, but that
comes at a cost, and requires retrofitting extra functionality.
Two examples; Look at the request API and how many drivers out of
staging are using it. Look at the number of android devices that are
using v4l2 to implement HAL3.


>
> > > > > > > > Recent ISPs have a similar architecture, with a set of registers used
> > > > > > > > to communicate with the ISP firmware, and then most of the hardware
> > > > > > > > registers for the actual image processing blocks being programmed
> > > > > > > > based from the command stream. "Command stream" may not be a very good
> > > > > > > > term for ISPs as it's not really a stream of commands, but
> > > > > > > > conceptually, we're dealing with a blob that is computed by userspace.
> > > > > > >
> > > > > > > ... while in Kcam, the CPU knows and will interpret that command stream.
> > > > > > > Maybe not in all cases, but it's still a significant difference.
> > > > > > >
> > > > > > > If we had to draw a parallel with something else in the kernel, it looks
> > > > > > > way more like eBPF or the discussion we had on where to parse the
> > > > > > > bitstreams for stateless codecs.
> > > > > > >
> > > > > > > The first one has been severely constrained to avoid the issues we've
> > > > > > > raised, and we all know how the second one went.
> > > > > >
> > > > > > In eBPF, you are moving some user code to the kernel, with an unstable API.
> > > > > >
> > > > > > In KCAM, (and in DRM), you let the user build a set of operations,
> > > > > > that you pass to the kernel via a stable API, then it is validated and
> > > > > > scheduled by the kernel.
> > > > >
> > > > > You won't be able to have a stable API with that design either. If only
> > > > > because of that whitelist you were mentioning. Let's say we have a
> > > > > register that turns out, after the facts, to not be available. If the
> > > > > userspace ever used to set it at some point, you're screwed. Indeed,
> > > > > either you move it out of the whitelist, and then you break userspace,
> > > > > or you don't add it to the whitelist and end up allowing an insecure or
> > > > > dangerous situation.
> > > >
> > > > See above for our description of allowlist.
> > > >
> > > > Also, using the drm model as reference. kernel version, libdrm and
> > > > mesa (and even llvm) are very coupled. Using a wrong version can lead
> > > > to unexpected results or even GPU hangs.
> > >
> > > Right.
> > >
> > > And those are considered regressions:
> > > https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements
> > >
> > > """
> > > The Linux kernel’s “no regression” policy holds in practice only for
> > > open-source userspace of the DRM subsystem. DRM developers are perfectly
> > > fine if closed-source blob drivers in userspace use the same uAPI as the
> > > open drivers, but they must do so in the exact same way as the open
> > > drivers. Creative (ab)use of the interfaces will, and in the past
> > > routinely has, lead to breakage.
> > > """
> >
> > There are a lot of hypothetical scenarios: erratas, big documentation
> > drop from a vendor, drivers obtained via reverse engineering....
>
> Those wouldn't be regressions.
>
> > There is no one-size-fits-all solution, and every case needs to be
> > considered individually when they come.
> >
> > As any other kernel subsystems we need to live under the
> > never-break-the-userspace. Obey gravity, it is the law ;)
> >
> >
> > >
> > > > What to do when we fix bugs that affect functionality is something
> > > > that we need to decide on case to case cases. The same way we do today
> > > > when hardware does not support a control value and we discover it 10
> > > > versions later.
> > >
> > > Indeed, but we need to have some idea on what that process is going to
> > > look like in practice. If we put ourselves is a corner and don't allow
> > > for some bug resolutions in the first place, then we won't be able to
> > > fix them when we'll encounter them.
> >
> > We can definately look into establishing some guidelines once we have
> > a more clear vision of kcam and its extent.
> >
> >
> > >
> > > > > And you can't say you would just ignore a register that isn't part of
> > > > > the whitelist, because then you would enforce a configuration that isn't
> > > > > the one the user-space asked for, which is even worse.
> > > > >
> > > > > > X11 was much more bizarre, the GPIO iomem was remapped into userspace.
> > > > >
> > > > > Yes, but that wasn't the only thing bad with it. I mean, it doesn't
> > > > > really matter who exactly does the register access eventually. In UMS,
> > > > > X11 was doing it itself through a mapping of its own, in KCam the kernel
> > > > > will do it on behalf of the userspace. But we still end up in both cases
> > > > > with:
> > > > >
> > > > >   * The entire logic is in userspace
> > > >
> > > > We can argue if this is an issue or not. I think it is not
> > > >
> > > > >   * Realistically speaking, that logic can only run as root
> > > >
> > > > Do not agree.
> > >
> > > How so? Are you going to allow any javascript enabled website to poke
> > > into Kcam?
> >
> > Can you access the GPU ISA directly as an external app on the browser?
>
> I guess?

Please change browser if yours has a javascript engine with access to
/dev/dri :)

>
> > Same answer here.
>
> Not really, and I'm sorry but you keep eluding this, and this comes back
> in pretty much every mail: the CPU will never interpret the GPU ISA in
> kernel space. Kcam might. This is a very big difference.
>
> > The browser will typically speak with a system service to access the
> > camera: pipewire, gstreamer....
> >
> > >
> > > > >   * With a poor configuration, the userspace can completely crash the
> > > > >     system
> > > > >   * If the userspace crashes, you can end up with a configuration you
> > > > >     can't really recover from
> > > >
> > > > A Kcam driver can give you broken images, but never crash the system
> > > > or leave it in an unrecoverable state. That is the main guarantee that
> > > > we expect from the drivers.
> > >
> > > That's wishful thinking. If your application crashes halfway through the
> > > configuration, you're left in a weird state you know nothing about now.
> > > That's impossible to recover from.
> >
> > Today (almost) any kernel driver makes no assumption of the initial
> > state of the hardware. You might have been interrupted at any state
> > and rebooting the hardware does not reinit the internal state of the
> > peripherals.
> >
> > This is the same, a kcam app needs to init the hardware before it can
> > make use of it.
>
> I guess it's getting obvious by now that Kcam doesn't seem to be a
> solution that is getting a consensus, possibly because it's so early
> that we don't really have an idea of how it would look like with a real
> device.

Difficult to get a consensus before the summit has even taken place :),

>
> I'm not sure this can be easily solved in the time slot we have on
> Monday, so maybe we can turn this around in order to have some progress:
> could we maybe use that time slot to discuss the problem, and outline
> the attributes an ideal solution would have to make sure everyone is
> roughly on the same page.

This is why the presentation only has 5 slides around the kcam
"internals" and the rest is explaining the problem that ChromeOS (and
the rest of the industry) is facing. :)

We can always fix the code and send 10000 revisions, what we have to
agree on are the principals, openness etc. And that is what we expect
to get out of Monday.

>
> And then, if Kcam can be adapted to change those attributes, awesome :)
>
> Maxime

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-09-09 12:41 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-09-07  7:55 [Media Summit] ChromeOS Kernel CAM Ricardo Ribalda
2022-09-07 10:50 ` Laurent Pinchart
2022-09-08  7:11   ` Ricardo Ribalda
2022-09-08  8:08     ` Maxime Ripard
2022-09-08 14:14       ` Laurent Pinchart
2022-09-08 14:59         ` Maxime Ripard
2022-09-08 15:16           ` Laurent Pinchart
2022-09-08 15:34             ` Maxime Ripard
2022-09-08 18:13               ` Ricardo Ribalda
2022-09-08 18:13                 ` Ricardo Ribalda
2022-09-08 19:30                   ` Laurent Pinchart
2022-09-08 20:04                     ` Ricardo Ribalda
2022-09-08 20:59                       ` Laurent Pinchart
2022-09-09  8:00                 ` Maxime Ripard
2022-09-09  8:39                   ` Ricardo Ribalda
2022-09-09  9:06                     ` Maxime Ripard
2022-09-09 10:00                       ` Ricardo Ribalda
2022-09-09 11:58                         ` Maxime Ripard
2022-09-09 12:40                           ` Ricardo Ribalda
2022-09-09  9:11                     ` Laurent Pinchart

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox