From: Dan Williams <dan.j.williams@intel.com>
To: Jakub Kicinski <kuba@kernel.org>, David Ahern <dsahern@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
Jonathan Corbet <corbet@lwn.net>,
"Itay Avraham" <itayavr@nvidia.com>,
Leon Romanovsky <leon@kernel.org>, <linux-doc@vger.kernel.org>,
<linux-rdma@vger.kernel.org>, <netdev@vger.kernel.org>,
Paolo Abeni <pabeni@redhat.com>,
Saeed Mahameed <saeedm@nvidia.com>,
Tariq Toukan <tariqt@nvidia.com>,
Andy Gospodarek <andrew.gospodarek@broadcom.com>,
Aron Silverton <aron.silverton@oracle.com>,
Dan Williams <dan.j.williams@intel.com>,
Christoph Hellwig <hch@infradead.org>,
Jiri Pirko <jiri@nvidia.com>, Leonid Bloch <lbloch@nvidia.com>,
Leon Romanovsky <leonro@nvidia.com>, <linux-cxl@vger.kernel.org>,
<patches@lists.linux.dev>
Subject: Re: [PATCH 0/8] Introduce fwctl subystem
Date: Tue, 4 Jun 2024 16:56:57 -0700 [thread overview]
Message-ID: <665fa9c9e69de_4a4e62941e@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <20240604070451.79cfb280@kernel.org>
Jakub Kicinski wrote:
[..]
> I don't begrudge anyone building proprietary options, but leave
> upstream out of it.
So I am of 2 minds here. In general, how is upstream benefited by
requiring every vendor command to be wrapped by a Linux command?
Mind you, I am coming at this from the perspective of being a maintainer
of a subsystem that does *not* allow unrestricted vendor commands. Since
day one, the CXL subsystem has matched netdev's general sentiment and
been more restrictive than NVMe. It places all vendor commands and even
all yet-to-be-Linux-wrapped-standard-commands behind a
CONFIG_CXL_MEM_RAW_COMMANDS option. That default-off option, when
enabled, allows any command to be sent but it taints the kernel with a
WARN(). CXL devices theoretically allow direct manipulation of system
memory without IOMMU protection which is in contrast to NVMe which would
need to work harder to violate kernel-lockdown protections.
The expectation that I laid out here [1] is based on the observation
that a significant portion of the vendor commands these devices support
are for pre-release hardware qualification and debug flows. The
recommendation to device vendors was "if you need wide distribution of
kernels that allow unrestricted vendor passthrough, work with Linux
distributions to enable this option in debug kernels, run those debug
kernels for your pre-release hardware flows, ignore the warnings".
3 years on from that recommendation it seems no vendor has even needed
that level of distribution help. I.e. checking a few distro kernels
(Fedora, openSUSE) shows no uptake for CONFIG_CXL_MEM_RAW_COMMANDS=y in
their debug builds. I can only assume that locally compiled custom
kernel binaries are filling the need.
So all seems quiet with current restriction for CXL endpoint vendor
commands, but this stance was recently challenged in this thread [2] by
CXL switch vendors with an assertion that fabric switch configuration
has need for more and varied vendor flows than endpoint configuration.
While I am not clear on the veracity of that claim, it at least
challenged me to do the thought experiment of "what would it look like
to relax the CXL command restriction?". Maybe we can come up with a
community answer to the "so you want to build a
userpace-to-device-firmware tunnel?" to at least get all the various
concerns documented in one place, and provide guidance for how device
vendors should navigate this space across subsystems. Between NVMe
"allow all the things", CXL "allow all the things only after tainting
the kernel", and the "never allow vendor passthrough" position (I am
sure there are other nuanced positions) it at least seems useful to
document the concerns. Here is a start for that guidance from the CXL
perspective:
* Integrity: Subsystem has a responsibility to meet kernel-lockdown
expectations:
Distros and system owners need to be assured that root's ability to
modify the running kernel image are mitigated. For CXL there are 2 ways
to do this, require Linux wrapper commands for all the low level
commands (status quo), or a new trust the device to publish which
commands have user data effects in something CXL calls the "Command
Effects Log". In that "trust Command Effects" scenario the kernel still
has no idea what the command is actually doing, but it can at least
assert that the device does not claim that the command changes the
contents of system-memory. Now, you might say, "the device can just
lie", but that betrays a conceit of the kernel restriction. A device
could lie that a Linux wrapped command when passed certain payloads does
not in turn proxy to a restricted command. So at some point there is
almost always an out-of-tree way to get around the kernel restriction,
so the question is are we better off giving a blessed path or force
vendors into ugly out-of-tree workarounds?
* Introspection / validation: Subsystem community needs to be able to
audit behavior after the fact.
To me this means even if the kernel is letting a command through based
on the stated Command Effect of "Configuration Change after Cold Reset"
upstream community has a need to be able to read the vendor
specification for that command. I.e. commands might be vendor-specific,
but never vendor-private. I see this as similar to the requirement for
open source userspace for sophisticated accelerators.
* Collaboration: open standards support open driver maintenance.
Without standards we end up with awkward situations like Confidential
Computing where every vendor races to implement the same functionality
in arbitrarily different and vendor specific ways.
For CXL devices, and I believe the devices fwctl is targeting, there
are a whole class of commands for vendor specific configuration and
debug. Commands that the kernel really need not worry about.
Some subsystems may want to allow high-performance science experiments
like what NVMe allows, but it seems worth asking the question if
standardizing device configuration and debug is really the best use of
upstream's limited time?
One of the release valves in the CXL space is openly specified
commands with opaque payloads, like "Read Vendor Debug Log". That is
clear what it does, likely a payload the kernel need never worry
about, and the "Command Effects" is empty. However, going forward there
is a new class of commands called "Set/Get Feature" that allow a wide
range of vendor toggles to be deployed which will need an upstream
response for the driver policy to vendor-specific "Features".
So if fwctl, or something like it, can strike a balance of enforcing
integrity and introspection while encouraging collaboration on the
aspects that are worth upstream collaboration, I think that is a
conversation worth having.
[1]: http://lore.kernel.org/r/CAPcyv4gDShAYih5iWabKg_eTHhuHm54vEAei8ZkcmHnPp3B0cw@mail.gmail.com/
[2]: http://lore.kernel.org/r/20240321174423.00007e0d@Huawei.com
next prev parent reply other threads:[~2024-06-04 23:57 UTC|newest]
Thread overview: 73+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-03 15:53 [PATCH 0/8] Introduce fwctl subystem Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 1/8] fwctl: Add basic structure for a class subsystem with a cdev Jason Gunthorpe
2024-06-04 9:32 ` Leon Romanovsky
2024-06-04 15:50 ` Jason Gunthorpe
2024-06-04 17:05 ` Jonathan Cameron
2024-06-04 18:52 ` Jason Gunthorpe
2024-06-05 11:08 ` Jonathan Cameron
2024-06-04 16:42 ` Randy Dunlap
2024-06-04 16:44 ` Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 2/8] fwctl: Basic ioctl dispatch for the character device Jason Gunthorpe
2024-06-04 12:16 ` Zhu Yanjun
2024-06-04 12:22 ` Leon Romanovsky
2024-06-04 16:50 ` Jonathan Cameron
2024-06-04 16:58 ` Jason Gunthorpe
2024-06-05 11:07 ` Jonathan Cameron
2024-06-05 18:27 ` Jason Gunthorpe
2024-06-06 13:34 ` Jonathan Cameron
2024-06-06 15:37 ` Randy Dunlap
2024-06-05 15:42 ` Przemek Kitszel
2024-06-05 15:49 ` Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 3/8] fwctl: FWCTL_INFO to return basic information about the device Jason Gunthorpe
2024-06-13 23:32 ` Dave Jiang
2024-06-13 23:40 ` Jason Gunthorpe
2024-06-14 16:37 ` Dave Jiang
2024-06-03 15:53 ` [PATCH 4/8] taint: Add TAINT_FWCTL Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 5/8] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 6/8] fwctl: Add documentation Jason Gunthorpe
2024-06-05 2:31 ` Randy Dunlap
2024-06-05 16:03 ` Jason Gunthorpe
2024-06-05 20:14 ` Randy Dunlap
2024-06-03 15:53 ` [PATCH 7/8] fwctl/mlx5: Support for communicating with mlx5 fw Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 8/8] mlx5: Create an auxiliary device for fwctl_mlx5 Jason Gunthorpe
2024-06-03 18:42 ` [PATCH 0/8] Introduce fwctl subystem Jakub Kicinski
2024-06-04 3:01 ` David Ahern
2024-06-04 14:04 ` Jakub Kicinski
2024-06-04 21:28 ` Saeed Mahameed
2024-06-04 22:32 ` Jakub Kicinski
2024-06-05 14:50 ` Jason Gunthorpe
2024-06-05 15:41 ` Jakub Kicinski
2024-06-04 23:56 ` Dan Williams [this message]
2024-06-05 3:05 ` Jakub Kicinski
2024-06-05 11:19 ` Jonathan Cameron
2024-06-05 13:59 ` Jason Gunthorpe
2024-06-06 2:35 ` David Ahern
2024-06-06 14:18 ` Jakub Kicinski
2024-06-06 14:48 ` Jason Gunthorpe
2024-06-06 15:05 ` Jakub Kicinski
2024-06-06 17:47 ` David Ahern
2024-06-07 6:48 ` Jiri Pirko
2024-06-07 14:50 ` David Ahern
2024-06-07 15:14 ` Jason Gunthorpe
2024-06-07 15:50 ` Jiri Pirko
2024-06-07 17:24 ` Jason Gunthorpe
2024-06-07 7:34 ` Jiri Pirko
2024-06-07 12:49 ` Andrew Lunn
2024-06-07 13:34 ` Jiri Pirko
2024-06-08 1:43 ` Jakub Kicinski
2024-06-06 4:56 ` Dan Williams
2024-06-06 8:50 ` Leon Romanovsky
2024-06-06 22:11 ` Dan Williams
2024-06-07 0:02 ` Jason Gunthorpe
2024-06-07 13:12 ` Leon Romanovsky
2024-06-06 14:41 ` Jason Gunthorpe
2024-06-06 14:58 ` Jakub Kicinski
2024-06-06 17:24 ` Dan Williams
2024-06-07 0:25 ` Jason Gunthorpe
2024-06-07 10:47 ` Przemek Kitszel
2024-06-11 15:36 ` Daniel Vetter
2024-06-11 16:17 ` Jason Gunthorpe
2024-06-11 16:54 ` Daniel Vetter
2024-06-06 1:58 ` David Ahern
2024-06-05 3:11 ` Jakub Kicinski
2024-06-05 12:06 ` Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=665fa9c9e69de_4a4e62941e@dwillia2-xfh.jf.intel.com.notmuch \
--to=dan.j.williams@intel.com \
--cc=andrew.gospodarek@broadcom.com \
--cc=aron.silverton@oracle.com \
--cc=corbet@lwn.net \
--cc=dsahern@kernel.org \
--cc=hch@infradead.org \
--cc=itayavr@nvidia.com \
--cc=jgg@nvidia.com \
--cc=jiri@nvidia.com \
--cc=kuba@kernel.org \
--cc=lbloch@nvidia.com \
--cc=leon@kernel.org \
--cc=leonro@nvidia.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=patches@lists.linux.dev \
--cc=saeedm@nvidia.com \
--cc=tariqt@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox