netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: David Ahern <dsahern@kernel.org>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Jonathan Corbet <corbet@lwn.net>,
	"Itay Avraham" <itayavr@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>, <linux-doc@vger.kernel.org>,
	<linux-rdma@vger.kernel.org>, <netdev@vger.kernel.org>,
	Paolo Abeni <pabeni@redhat.com>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Tariq Toukan <tariqt@nvidia.com>,
	Andy Gospodarek <andrew.gospodarek@broadcom.com>,
	Aron Silverton <aron.silverton@oracle.com>,
	Christoph Hellwig <hch@infradead.org>,
	Jiri Pirko <jiri@nvidia.com>, Leonid Bloch <lbloch@nvidia.com>,
	Leon Romanovsky <leonro@nvidia.com>, <linux-cxl@vger.kernel.org>,
	<patches@lists.linux.dev>
Subject: Re: [PATCH 0/8] Introduce fwctl subystem
Date: Tue, 4 Jun 2024 20:05:56 -0700	[thread overview]
Message-ID: <20240604200556.398afd07@kernel.org> (raw)
In-Reply-To: <665fa9c9e69de_4a4e62941e@dwillia2-xfh.jf.intel.com.notmuch>

On Tue, 4 Jun 2024 16:56:57 -0700 Dan Williams wrote:
> Jakub Kicinski wrote:
> [..]
> > I don't begrudge anyone building proprietary options, but leave
> > upstream out of it.  
> 
> So I am of 2 minds here. In general, how is upstream benefited by
> requiring every vendor command to be wrapped by a Linux command?
> [...]

Thanks for sharing the CXL experience and your perspective.
Also for trying to frame the discussion in a useful way,
although I have little faith that it will help :( Fingers crossed?

> * Integrity: Subsystem has a responsibility to meet kernel-lockdown
>   expectations:
> 
>   Distros and system owners need to be assured that root's ability to
>   modify the running kernel image are mitigated. For CXL there are 2 ways
>   to do this, require Linux wrapper commands for all the low level
>   commands (status quo), or a new trust the device to publish which
>   commands have user data effects in something CXL calls the "Command
>   Effects Log". In that "trust Command Effects" scenario the kernel still
>   has no idea what the command is actually doing, but it can at least
>   assert that the device does not claim that the command changes the
>   contents of system-memory. Now, you might say, "the device can just
>   lie", but that betrays a conceit of the kernel restriction. A device
>   could lie that a Linux wrapped command when passed certain payloads does
>   not in turn proxy to a restricted command. So at some point there is
>   almost always an out-of-tree way to get around the kernel restriction,
>   so the question is are we better off giving a blessed path or force
>   vendors into ugly out-of-tree workarounds?

The integrity thing is a double edge sword, so I don't have much to say
here. If we take a few wrong turns we'll wrap the vendor commands with
crypto and then the vendor can control which commands you get to run ;)
Obviously I'm joking, and not saying that the intent of the current
series! But its about as realistic as "this will only be used for truly
vendor specific things".

> * Introspection / validation: Subsystem community needs to be able to
>   audit behavior after the fact.
> 
>   To me this means even if the kernel is letting a command through based
>   on the stated Command Effect of "Configuration Change after Cold Reset"
>   upstream community has a need to be able to read the vendor
>   specification for that command. I.e. commands might be vendor-specific,
>   but never vendor-private. I see this as similar to the requirement for
>   open source userspace for sophisticated accelerators.

That sounds pretty CXL specific, and IIUC unrealistic.
You assume you have some specification to consult, while this discussion
has been going for over a year now, and I can't get the vendors to share
what those turntables they so desperately need to tweak are.

> * Collaboration: open standards support open driver maintenance.
> 
>   Without standards we end up with awkward situations like Confidential
>   Computing where every vendor races to implement the same functionality
>   in arbitrarily different and vendor specific ways.
> 
>   For CXL devices, and I believe the devices fwctl is targeting, there
>   are a whole class of commands for vendor specific configuration and
>   debug. Commands that the kernel really need not worry about.
> 
>   Some subsystems may want to allow high-performance science experiments
>   like what NVMe allows, but it seems worth asking the question if
>   standardizing device configuration and debug is really the best use of
>   upstream's limited time?

No, but it's not about science experiments, really. It's about
production features. The effort of implementing something properly
upstream is high. I cost time and money to get the right caliber of
people and let them go thru the revisions. I lack confidence that
merging fwctl will not negatively impact motivation for companies to
pay off our accrued technical debt. While all they need is "this simple
little feature". And before competition wins the customer. It's a race
to the bottom.

>   One of the release valves in the CXL space is openly specified
>   commands with opaque payloads, like "Read Vendor Debug Log". That is
>   clear what it does, likely a payload the kernel need never worry
>   about, and the "Command Effects" is empty. However, going forward there
>   is a new class of commands called "Set/Get Feature" that allow a wide
>   range of vendor toggles to be deployed which will need an upstream
>   response for the driver policy to vendor-specific "Features".
>
> So if fwctl, or something like it, can strike a balance of enforcing
> integrity and introspection while encouraging collaboration on the
> aspects that are worth upstream collaboration, I think that is a
> conversation worth having.

I presume you were trying to underscore that the decision is unavoidably
a trade off, which is true. But I don't follow the exact formulation.
Is fwctl helping integrity or collaboration? If we assume use of vendor
tools is unavoidable, then I guess integrity? I really can't see how it
helps collaboration when everyone ships their custom tool set.

Back to the tradeoff. For networking, which is a _very_ mature subsystem
with a ton of standards the need to do "vendor specific things" is
marginal. The downside of the loss of an "upstream advantage" is obvious.
We need to take such decisions on subsystem by subsystem basis.
You should be able to draw the lines differently for CXL than how we
draw them for TCP/IP.

On the technical level the discussion can't go very far, because I'd
like to hear actual user problems. But I can't even get a list of those
infamous thousands of knobs :|

  reply	other threads:[~2024-06-05  3:05 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-03 15:53 [PATCH 0/8] Introduce fwctl subystem Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 1/8] fwctl: Add basic structure for a class subsystem with a cdev Jason Gunthorpe
2024-06-04  9:32   ` Leon Romanovsky
2024-06-04 15:50     ` Jason Gunthorpe
2024-06-04 17:05       ` Jonathan Cameron
2024-06-04 18:52         ` Jason Gunthorpe
2024-06-05 11:08           ` Jonathan Cameron
2024-06-04 16:42   ` Randy Dunlap
2024-06-04 16:44     ` Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 2/8] fwctl: Basic ioctl dispatch for the character device Jason Gunthorpe
2024-06-04 12:16   ` Zhu Yanjun
2024-06-04 12:22     ` Leon Romanovsky
2024-06-04 16:50       ` Jonathan Cameron
2024-06-04 16:58         ` Jason Gunthorpe
2024-06-05 11:07           ` Jonathan Cameron
2024-06-05 18:27             ` Jason Gunthorpe
2024-06-06 13:34               ` Jonathan Cameron
2024-06-06 15:37                 ` Randy Dunlap
2024-06-05 15:42   ` Przemek Kitszel
2024-06-05 15:49     ` Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 3/8] fwctl: FWCTL_INFO to return basic information about the device Jason Gunthorpe
2024-06-13 23:32   ` Dave Jiang
2024-06-13 23:40     ` Jason Gunthorpe
2024-06-14 16:37       ` Dave Jiang
2024-06-03 15:53 ` [PATCH 4/8] taint: Add TAINT_FWCTL Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 5/8] fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 6/8] fwctl: Add documentation Jason Gunthorpe
2024-06-05  2:31   ` Randy Dunlap
2024-06-05 16:03     ` Jason Gunthorpe
2024-06-05 20:14       ` Randy Dunlap
2024-06-03 15:53 ` [PATCH 7/8] fwctl/mlx5: Support for communicating with mlx5 fw Jason Gunthorpe
2024-06-03 15:53 ` [PATCH 8/8] mlx5: Create an auxiliary device for fwctl_mlx5 Jason Gunthorpe
2024-06-03 18:42 ` [PATCH 0/8] Introduce fwctl subystem Jakub Kicinski
2024-06-04  3:01   ` David Ahern
2024-06-04 14:04     ` Jakub Kicinski
2024-06-04 21:28       ` Saeed Mahameed
2024-06-04 22:32         ` Jakub Kicinski
2024-06-05 14:50           ` Jason Gunthorpe
2024-06-05 15:41             ` Jakub Kicinski
2024-06-04 23:56       ` Dan Williams
2024-06-05  3:05         ` Jakub Kicinski [this message]
2024-06-05 11:19         ` Jonathan Cameron
2024-06-05 13:59         ` Jason Gunthorpe
2024-06-06  2:35           ` David Ahern
2024-06-06 14:18             ` Jakub Kicinski
2024-06-06 14:48               ` Jason Gunthorpe
2024-06-06 15:05                 ` Jakub Kicinski
2024-06-06 17:47                   ` David Ahern
2024-06-07  6:48                     ` Jiri Pirko
2024-06-07 14:50                       ` David Ahern
2024-06-07 15:14                         ` Jason Gunthorpe
2024-06-07 15:50                           ` Jiri Pirko
2024-06-07 17:24                             ` Jason Gunthorpe
2024-06-07  7:34               ` Jiri Pirko
2024-06-07 12:49                 ` Andrew Lunn
2024-06-07 13:34                   ` Jiri Pirko
2024-06-08  1:43                     ` Jakub Kicinski
2024-06-06  4:56           ` Dan Williams
2024-06-06  8:50             ` Leon Romanovsky
2024-06-06 22:11               ` Dan Williams
2024-06-07  0:02                 ` Jason Gunthorpe
2024-06-07 13:12                 ` Leon Romanovsky
2024-06-06 14:41             ` Jason Gunthorpe
2024-06-06 14:58               ` Jakub Kicinski
2024-06-06 17:24               ` Dan Williams
2024-06-07  0:25                 ` Jason Gunthorpe
2024-06-07 10:47                   ` Przemek Kitszel
2024-06-11 15:36           ` Daniel Vetter
2024-06-11 16:17             ` Jason Gunthorpe
2024-06-11 16:54               ` Daniel Vetter
2024-06-06  1:58       ` David Ahern
2024-06-05  3:11 ` Jakub Kicinski
2024-06-05 12:06   ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240604200556.398afd07@kernel.org \
    --to=kuba@kernel.org \
    --cc=andrew.gospodarek@broadcom.com \
    --cc=aron.silverton@oracle.com \
    --cc=corbet@lwn.net \
    --cc=dan.j.williams@intel.com \
    --cc=dsahern@kernel.org \
    --cc=hch@infradead.org \
    --cc=itayavr@nvidia.com \
    --cc=jgg@nvidia.com \
    --cc=jiri@nvidia.com \
    --cc=lbloch@nvidia.com \
    --cc=leon@kernel.org \
    --cc=leonro@nvidia.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=patches@lists.linux.dev \
    --cc=saeedm@nvidia.com \
    --cc=tariqt@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).