From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Riana Tauro <riana.tauro@intel.com>
Cc: Zack McKevitt <zachary.mckevitt@oss.qualcomm.com>,
Jakub Kicinski <kuba@kernel.org>,
<intel-xe@lists.freedesktop.org>,
<dri-devel@lists.freedesktop.org>,
<aravind.iddamsetty@linux.intel.com>, <anshuman.gupta@intel.com>,
<joonas.lahtinen@linux.intel.com>, <lukas@wunner.de>,
<simona.vetter@ffwll.ch>, <airlied@gmail.com>,
<pratik.bari@intel.com>, <joshua.santosh.ranjan@intel.com>,
<ashwin.kumar.kulkarni@intel.com>, <shubham.kumar@intel.com>,
Lijo Lazar <lijo.lazar@amd.com>,
Hawking Zhang <Hawking.Zhang@amd.com>,
"David S. Miller" <davem@davemloft.net>,
Paolo Abeni <pabeni@redhat.com>,
Eric Dumazet <edumazet@google.com>, <netdev@vger.kernel.org>,
Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Subject: Re: [PATCH v3 1/4] drm/ras: Introduce the DRM RAS infrastructure over generic netlink
Date: Fri, 16 Jan 2026 15:26:38 -0500 [thread overview]
Message-ID: <aWqe_ulADqRZBQj_@intel.com> (raw)
In-Reply-To: <3297a59b-a788-43aa-945d-e89592c9ba8d@intel.com>
On Fri, Jan 16, 2026 at 11:26:36AM +0530, Riana Tauro wrote:
>
>
> On 1/16/2026 5:09 AM, Zack McKevitt wrote:
> >
> >
> > On 1/13/2026 1:20 AM, Riana Tauro wrote:
> > > > > > > diff --git
> > > > > > > a/Documentation/netlink/specs/drm_ras.yaml b/
> > > > > > > Documentation/netlink/specs/drm_ras.yaml
> > > > > > > new file mode 100644
> > > > > > > index 000000000000..be0e379c5bc9
> > > > > > > --- /dev/null
> > > > > > > +++ b/Documentation/netlink/specs/drm_ras.yaml
> > > > > > > @@ -0,0 +1,130 @@
> > > > > > > +# SPDX-License-Identifier: ((GPL-2.0 WITH
> > > > > > > Linux-syscall-note) OR BSD-3-Clause)
> > > > > > > +---
> > > > > > > +name: drm-ras
> > > > > > > +protocol: genetlink
> > > > > > > +uapi-header: drm/drm_ras.h
> > > > > > > +
> > > > > > > +doc: >-
> > > > > > > + DRM RAS (Reliability, Availability,
> > > > > > > Serviceability) over Generic Netlink.
> > > > > > > + Provides a standardized mechanism for DRM drivers
> > > > > > > to register "nodes"
> > > > > > > + representing hardware/software components capable
> > > > > > > of reporting error counters.
> > > > > > > + Userspace tools can query the list of nodes or
> > > > > > > individual error counters
> > > > > > > + via the Generic Netlink interface.
> > > > > > > +
> > > > > > > +definitions:
> > > > > > > + -
> > > > > > > + type: enum
> > > > > > > + name: node-type
> > > > > > > + value-start: 1
> > > > > > > + entries: [error-counter]
> > > > > > > + doc: >-
> > > > > > > + Type of the node. Currently, only error-counter nodes are
> > > > > > > + supported, which expose reliability
> > > > > > > counters for a hardware/software
> > > > > > > + component.
> > > > > > > +
> > > > > > > +attribute-sets:
> > > > > > > + -
> > > > > > > + name: node-attrs
> > > > > > > + attributes:
> > > > > > > + -
> > > > > > > + name: node-id
> > > > > > > + type: u32
> > > > > > > + doc: >-
> > > > > > > + Unique identifier for the node.
> > > > > > > + Assigned dynamically by the DRM RAS
> > > > > > > core upon registration.
> > > > > > > + -
> > > > > > > + name: device-name
> > > > > > > + type: string
> > > > > > > + doc: >-
> > > > > > > + Device name chosen by the driver at registration.
> > > > > > > + Can be a PCI BDF, UUID, or module name if unique.
> > > > > > > + -
> > > > > > > + name: node-name
> > > > > > > + type: string
> > > > > > > + doc: >-
> > > > > > > + Node name chosen by the driver at registration.
> > > > > > > + Can be an IP block name, or any name
> > > > > > > that identifies the
> > > > > > > + RAS node inside the device.
> > > > > > > + -
> > > > > > > + name: node-type
> > > > > > > + type: u32
> > > > > > > + doc: Type of this node, identifying its function.
> > > > > > > + enum: node-type
> > > > > > > + -
> > > > > > > + name: error-counter-attrs
> > > > > > > + attributes:
> > > > > > > + -
> > > > > > > + name: node-id
> > > > > > > + type: u32
> > > > > > > + doc: Node ID targeted by this error counter operation.
> > > > > > > + -
> > > > > > > + name: error-id
> > > > > > > + type: u32
> > > > > > > + doc: Unique identifier for a specific error
> > > > > > > counter within an node.
> > > > > > > + -
> > > > > > > + name: error-name
> > > > > > > + type: string
> > > > > > > + doc: Name of the error.
> > > > > > > + -
> > > > > > > + name: error-value
> > > > > > > + type: u32
> > > > > > > + doc: Current value of the requested error counter.
> > > > > > > +
> > > > > > > +operations:
> > > > > > > + list:
> > > > > > > + -
> > > > > > > + name: list-nodes
> > > > > > > + doc: >-
> > > > > > > + Retrieve the full list of currently
> > > > > > > registered DRM RAS nodes.
> > > > > > > + Each node includes its dynamically
> > > > > > > assigned ID, name, and type.
> > > > > > > + **Important:** User space must call this
> > > > > > > operation first to obtain
> > > > > > > + the node IDs. These IDs are required for all subsequent
> > > > > > > + operations on nodes, such as querying error counters.
> > > > >
> > > > > I am curious about security implications of this design.
> > > >
> > > > hmm... very good point you are raising here.
> > > >
> > > > This current design relies entirely in the CAP_NET_ADMIN.
> > > > No driver would have the flexibility to choose anything differently.
> > > > Please notice that the flag admin-perm is hardcoded in this yaml file.
> > > >
> > > > > If the complete
> > > > > list of RAS nodes is visible for any process on the system
> > > > > (and one wants to
> > > > > avoid requiring CAP_NET_ADMIN), there should be some way to enforce
> > > > > permission checks when performing these operations if desired.
> > > >
> > > > Right now, there's no way that the driver would choose not avoid
> > > > requiring
> > > > CAP_NET_ADMIN...
> > > >
> > > > Only way would be the admin to give the cap_net_admin to the tool with:
> > > >
> > > > $ sudo setcap cap_net_admin+ep /bin/drm_ras_tool
> > > >
> > > > but not ideal and not granular anyway...
> > > >
> > > > >
> > > > > For example, this might be implemented in the driver's definition of
> > > > > callback functions like query_error_counter; some drivers
> > > > > may want to ensure
> > > > > that the process can in fact open the file descriptor
> > > > > corresponding to the
> > > > > queried device before serving a netlink request. Is it
> > > > > enough for a driver
> > > > > to simply return -EPERM in this case? Any driver that doesnt
> > > > > wish to protect
> > > > > its RAS nodes need not implement checks in their callbacks.
> > > >
> > > > Fair enough. If we want to give the option to the drivers, then we need:
> > > >
> > > > 1. to first remove all the admin-perm flags below and leave the
> > > > driver to
> > > > pick up their policy on when to return something or -EPERM.
> > > > 2. Document this security responsibility and list a few possibilities.
> > > > 3. In our Xe case here I believe the easiest option is to use
> > > > something like:
> > > >
> > > > struct scm_creds *creds = NETLINK_CREDS(cb->skb);
> > > > if (!gid_eq(creds->gid, GLOBAL_ROOT_GID))
> > > > return -EPERM
> > >
> > > The driver currently does not have access to the callback or the
> > > skbuffer. Sending these details as param to driver won't be right as
> > > drm_ras needs to handle all the netlink buffers.
> > >
> > > How about using pre_doit & start calls? If driver has a pre callback
> > > , it's the responsibility of the driver to check permissions/any-pre
> > > conditions, else the CAP_NET_ADMIN permission will be checked.
> > >
> > > @Zack / @Rodrigo thoughts?
> > > @Zack Will this work for your usecase?
> > >
> > > yaml
> > > + dump:
> > > + pre: drm-ras-nl-pre-list-nodes
> > >
> > >
> > > drm_ras.c :
> > >
> > > + if (node->pre_list_nodes)
> > > + return node->pre_list_nodes(node);
> > > +
> > > + return check_permissions(cb->skb); //Checks creds
> > >
> > > Thanks
> > > Riana
> > >
> >
> > I agree that a pre_doit is probably the best solution for this.
> >
> > Not entirely sure what a driver specific implementation would look like
> > yet, but I think that as long as the driver callback has a way to access
> > the 'current' task_struct pointer corresponding to the user process then
> > this approach seems like the best option from the netlink side.
> >
> > Since this is mostly a concern for our specific use case, perhaps we can
> > incorporate this functionality in our change down the road when we
> > expand the interface for telemetry?
Yes, as it can be changed transparently, let's do that...
>
>
> Yeah using pre_doit we can allow driver to make decisions based on
> the private data or driver specific permissions. However we will need to
> check the CAP_NET_ADMIN when no driver callback is implemented in the
> netlink layer as a default .
>
> Thank you, you can incorporate the changes when you add telemetry nodes.
>
> For now, I will retain the admin-perm in flags.
Cool then, when they come with their case we remove it and force in the
pre_doit as well.
ack..
>
> I will address the rest of the review comments and send out a new revision
> shortly.
>
> Thank you
> Riana
>
>
> >
> > Let me know what you think.
> >
> > Zack
> >
> > > >
> > > > or something like that?!
> > > >
> > > > perhaps drivers could implement some form of cookie or pre-
> > > > authorization with
> > > > ioctls or sysfs, and then store in the priv?
> > > >
> > > > Thoughts?
> > > > Other options?
> > > >
> > > > >
> > > > > I dont see any such permissions checks in your driver
> > > > > implementation which
> > > > > is understandable given that it may not be necessary for
> > > > > your use cases.
> > > > > However, this would be a concern for our driver if we were
> > > > > to adopt this
> > > > > interface.
> > > >
> > > > yeap, this case was entirely with admin-perm, so not needed at all...
> > > > But I see your point and this is really not giving any flexibility to
> > > > other drivers.
> > > >
> >
> > > > > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Zack
> > > > >
> > >
> >
>
next prev parent reply other threads:[~2026-01-16 20:26 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-05 8:39 [PATCH v3 0/4] Introduce DRM_RAS using generic netlink for RAS Riana Tauro
2025-12-05 8:39 ` [PATCH v3 1/4] drm/ras: Introduce the DRM RAS infrastructure over generic netlink Riana Tauro
2025-12-09 21:35 ` Rodrigo Vivi
2026-01-08 22:36 ` Zack McKevitt
2026-01-09 20:57 ` Rodrigo Vivi
2026-01-13 8:20 ` Riana Tauro
2026-01-15 23:39 ` Zack McKevitt
2026-01-16 5:56 ` Riana Tauro
2026-01-16 20:26 ` Rodrigo Vivi [this message]
2025-12-05 8:39 ` [PATCH v3 2/4] drm/xe/xe_drm_ras: Add support for drm ras Riana Tauro
2025-12-09 8:22 ` Raag Jadav
2026-01-09 8:08 ` Riana Tauro
2026-01-09 14:13 ` Rodrigo Vivi
2026-01-09 15:58 ` Raag Jadav
2026-01-12 6:13 ` Riana Tauro
2026-01-12 10:27 ` Raag Jadav
2025-12-09 21:57 ` Rodrigo Vivi
2026-01-07 9:48 ` Aravind Iddamsetty
2025-12-05 8:39 ` [PATCH v3 3/4] drm/xe/xe_hw_error: Add support for GT hardware errors Riana Tauro
2025-12-10 18:18 ` Raag Jadav
2026-01-12 3:41 ` Riana Tauro
2026-01-12 10:02 ` Raag Jadav
2025-12-05 8:39 ` [PATCH v3 4/4] drm/xe/xe_hw_error: Add support for PVC SOC errors Riana Tauro
2025-12-15 10:52 ` Raag Jadav
2026-01-12 4:45 ` Riana Tauro
2026-01-12 10:06 ` Raag Jadav
2025-12-05 9:40 ` ✗ CI.checkpatch: warning for Introduce DRM_RAS using generic netlink for RAS (rev3) Patchwork
2025-12-05 9:41 ` ✓ CI.KUnit: success " Patchwork
2025-12-05 9:56 ` ✗ CI.checksparse: warning " Patchwork
2025-12-05 11:27 ` ✗ Xe.CI.Full: failure " Patchwork
2025-12-09 21:56 ` [PATCH v3 0/4] Introduce DRM_RAS using generic netlink for RAS Alex Deucher
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aWqe_ulADqRZBQj_@intel.com \
--to=rodrigo.vivi@intel.com \
--cc=Hawking.Zhang@amd.com \
--cc=airlied@gmail.com \
--cc=anshuman.gupta@intel.com \
--cc=aravind.iddamsetty@linux.intel.com \
--cc=ashwin.kumar.kulkarni@intel.com \
--cc=davem@davemloft.net \
--cc=dri-devel@lists.freedesktop.org \
--cc=edumazet@google.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=jeff.hugo@oss.qualcomm.com \
--cc=joonas.lahtinen@linux.intel.com \
--cc=joshua.santosh.ranjan@intel.com \
--cc=kuba@kernel.org \
--cc=lijo.lazar@amd.com \
--cc=lukas@wunner.de \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=pratik.bari@intel.com \
--cc=riana.tauro@intel.com \
--cc=shubham.kumar@intel.com \
--cc=simona.vetter@ffwll.ch \
--cc=zachary.mckevitt@oss.qualcomm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.