Linux Documentation
 help / color / mirror / Atom feed
From: Tzung-Bi Shih <tzungbi@kernel.org>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Benson Leung <bleung@chromium.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J . Wysocki" <rafael@kernel.org>,
	Danilo Krummrich <dakr@kernel.org>,
	Jonathan Corbet <corbet@lwn.net>, Shuah Khan <shuah@kernel.org>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	chrome-platform@lists.linux.dev, linux-kselftest@vger.kernel.org,
	Laurent Pinchart <laurent.pinchart@ideasonboard.com>,
	Bartosz Golaszewski <brgl@bgdev.pl>,
	Wolfram Sang <wsa+renesas@sang-engineering.com>,
	Simona Vetter <simona.vetter@ffwll.ch>,
	Dan Williams <dan.j.williams@intel.com>
Subject: Re: [PATCH v5 5/7] revocable: Add fops replacement
Date: Sat, 18 Oct 2025 00:07:58 +0800	[thread overview]
Message-ID: <aPJp3hP44n96Rug9@tzungbi-laptop> (raw)
In-Reply-To: <20251017134916.GK3901471@nvidia.com>

On Fri, Oct 17, 2025 at 10:49:16AM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 17, 2025 at 02:36:58AM +0000, Tzung-Bi Shih wrote:
> > Imagining the following example:
> > 
> > /* res1 and res2 are provided by hot-pluggable devices. */
> > struct filp_priv {
> >     void *res1;
> >     void *res2;
> > };
> > 
> > /* In .open() fops */
> > priv = kzalloc(sizeof(struct filp_priv), ...);
> > priv->res1 = ...;
> > priv->res2 = ...;
> > filp->private_data = priv;
> > 
> > /* In .read() fops */
> > priv = filp->private_data;
> > priv->res1    // could result UAF if the device has gone
> > priv->res2    // could result UAF if the device has gone
> > 
> > 
> > How does the bool * work for the example?
> 
> You are thinking about it completely wrong, you are trying to keep the
> driver running conccurrently after it's remove returns - but that
> isn't how Linux drivers are designed.
> 
> We have a whole family of synchronous fencing APIs that drivers call
> in their remove() callback to shut down their concurrency. Think of
> things like free_irq(), cancel_work_sync(), timer_shutdown_sync(),
> sysfs_remove_files(). All of these guarentee the concurrent callbacks
> are fenced before returning.
> 
> The only issue with cros_ec is this:
> 
> static void cros_ec_chardev_remove(struct platform_device *pdev)
> {
>         struct miscdevice *misc = dev_get_drvdata(&pdev->dev);
> 
>         misc_deregister(misc);
> }
> 
> It doesn't fence the cdevs! Misc is a hard API to use because it
> doesn't have a misc_deregister_sync() variation!
> 
> Dan/Laurent's point and proposal was that mis_deregister() does not
> work like this! It is an anomaly that driver authors typically over
> look.
> 
> So the proposal was to add some way to get a:
>   misc_deregister_sync()
> 
> What gives the fence. Under your proposal it would lock the SRCU and
> change the bool. After it returns no cdev related threads are running
> in fops touching res1/res2. I think your proposal to replace the fops
> and that related machinery is smart and has a chance to succeed.
> 
> From this perspective your example is malformed. Resources should not
> become revoked concurrently *while a driver is bound*. The driver
> should be unbound, call misc_deregister_sync()/etc, and return from
> remove() guaranteeing it no longer touches any resources.

Imagining:
- Driver X provides the res1.
- Driver Y provides the res2.
- Driver Z provides the chardev /dev/zzz.  The file operations use res1
  and res2.
- A userspace program opened /dev/zzz.

In the approach, what is .remove() of driver X supposed to do when driver X
is unbinding (e.g. due to device unplug)?

If it ends up call misc_deregister_sync(), should the synchronous function
wait for the userspace program to close the FD?


The design behind revocable: the driver X waits via synchronize_srcu(), and
then it is free to go.  The subsequent accesses to res1 will get NULL, and
should fail gracefully.

> For this specific cros_ec driver it's "res" is this:
> 
>         struct cros_ec_dev *ec = dev_get_drvdata(pdev->dev.parent);
>         struct cros_ec_platform *ec_platform = dev_get_platdata(ec->dev);

In fact, no, the "res" we are concerning is struct cros_ec_device, e.g. [1].
(I knew the naming cros_ec_device vs. cros_ec_dev is somehow easy to confuse.)

[1] https://elixir.bootlin.com/linux/v6.17/source/drivers/platform/chrome/cros_ec_spi.c#L752

> This is already properly lifetime controlled!
> 
> It *HAS* to be, and even your patches are assuming it by blindly
> reaching into the parent's memory!
> 
> +	misc->rps[0] = ec->ec_dev->revocable_provider;
> 
> If the parent driver has been racily unbound at this point the
> ec->ec_dev is already a UAF!

Not really, it uses the fact that the caller is from probe().  I think the
driver can't be unbound when it is still in probe().
(Probe protocol device -> register the MFD device ->
 add cros-ec-chardev device and probe.)

> For cros it is safe because the cros_ec driver is a child of a MFD and
> the MFD logic ensures that the children are unbound as part of
> destroying the parent. So 'ec' is guarenteed valid from probe() to
> remove() return.
> 
> IHMO auto-revoke is a terrible idea, if you go down that path then why
> is misc special? You need to auto-revoke irqs, timers, work queues,
> etc too? That's a mess.

To be clear, I'm using misc as an example which is also the real crash we
observed.  If the file operations use other resources provided by a
hot-pluggable device, it'd need to use revocable APIs to prevent the UAFs.

  reply	other threads:[~2025-10-17 16:08 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-16  5:41 [PATCH v5 0/7] platform/chrome: Fix a possible UAF via revocable Tzung-Bi Shih
2025-10-16  5:41 ` [PATCH v5 1/7] revocable: Revocable resource management Tzung-Bi Shih
2025-10-16  5:41 ` [PATCH v5 2/7] revocable: Add Kunit test cases Tzung-Bi Shih
2025-10-16  5:42 ` [PATCH v5 3/7] selftests: revocable: Add kselftest cases Tzung-Bi Shih
2025-10-16  5:42 ` [PATCH v5 4/7] platform/chrome: Protect cros_ec_device lifecycle with revocable Tzung-Bi Shih
2025-10-16  5:42 ` [PATCH v5 5/7] revocable: Add fops replacement Tzung-Bi Shih
2025-10-16 12:31   ` Jason Gunthorpe
2025-10-17  2:36     ` Tzung-Bi Shih
2025-10-17 13:49       ` Jason Gunthorpe
2025-10-17 16:07         ` Tzung-Bi Shih [this message]
2025-10-17 16:21           ` Jason Gunthorpe
2025-10-19 15:08             ` Tzung-Bi Shih
2025-10-20 11:57               ` Jason Gunthorpe
2025-10-21  4:49                 ` Tzung-Bi Shih
2025-10-21 12:15                   ` Jason Gunthorpe
2025-10-23 14:22                     ` Tzung-Bi Shih
2025-10-23 14:51                       ` Jason Gunthorpe
2025-10-23 15:04                         ` Greg Kroah-Hartman
2025-10-23 15:57                           ` Jason Gunthorpe
2025-10-23 16:20                             ` Danilo Krummrich
2025-10-23 16:48                               ` Jason Gunthorpe
2025-10-23 18:30                                 ` Danilo Krummrich
2025-12-11  3:23                           ` Laurent Pinchart
2025-12-11  3:47                             ` Wolfram Sang
2025-12-11  8:05                               ` Laurent Pinchart
2025-12-11  8:36                                 ` Wolfram Sang
2025-12-11 13:43                                   ` Laurent Pinchart
2025-12-11 14:46                                     ` Tzung-Bi Shih
2025-12-12  8:32                                       ` Tzung-Bi Shih
2025-11-07  4:11                     ` Tzung-Bi Shih
2025-11-07 14:12                       ` Jason Gunthorpe
2025-10-17 16:29           ` Danilo Krummrich
2025-10-17 16:37             ` Jason Gunthorpe
2025-10-17 18:19               ` Danilo Krummrich
2025-10-17 18:44                 ` Jason Gunthorpe
2025-10-17 21:41                   ` Danilo Krummrich
2025-10-17 22:56                     ` Jason Gunthorpe
2025-10-23 15:32                       ` Danilo Krummrich
2025-10-16 18:38   ` Randy Dunlap
2025-10-17  2:41     ` Tzung-Bi Shih
2025-10-16  5:42 ` [PATCH v5 6/7] char: misc: Leverage revocable " Tzung-Bi Shih
2025-10-16  5:42 ` [PATCH v5 7/7] platform/chrome: cros_ec_chardev: Secure cros_ec_device via revocable Tzung-Bi Shih

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aPJp3hP44n96Rug9@tzungbi-laptop \
    --to=tzungbi@kernel.org \
    --cc=bleung@chromium.org \
    --cc=brgl@bgdev.pl \
    --cc=chrome-platform@lists.linux.dev \
    --cc=corbet@lwn.net \
    --cc=dakr@kernel.org \
    --cc=dan.j.williams@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jgg@nvidia.com \
    --cc=laurent.pinchart@ideasonboard.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=rafael@kernel.org \
    --cc=shuah@kernel.org \
    --cc=simona.vetter@ffwll.ch \
    --cc=wsa+renesas@sang-engineering.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox