rust-for-linux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: John Hubbard <jhubbard@nvidia.com>,
	Greg KH <gregkh@linuxfoundation.org>,
	Danilo Krummrich <dakr@kernel.org>,
	Joel Fernandes <joelagnelf@nvidia.com>,
	Alexandre Courbot <acourbot@nvidia.com>,
	Dave Airlie <airlied@gmail.com>, Gary Guo <gary@garyguo.net>,
	Joel Fernandes <joel@joelfernandes.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	Ben Skeggs <bskeggs@nvidia.com>,
	linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	paulmck@kernel.org
Subject: Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
Date: Wed, 5 Mar 2025 11:10:12 -0400	[thread overview]
Message-ID: <20250305151012.GW133783@nvidia.com> (raw)
In-Reply-To: <Z8f9mgD4LUJN_dWw@phenom.ffwll.local>

On Wed, Mar 05, 2025 at 08:30:34AM +0100, Simona Vetter wrote:
> - developers who want to quickly test new driver versions without full
>   reboot. They're often preferring convenience over correctness, like with
>   the removal of module refcounting that's strictly needed but means they
>   first have to unbind drivers in sysfs before they can unload the driver.
> 
>   Another one is that this use-case prefers that the hw is cleanly shut
>   down, so that you can actually load the new driver from a well-known
>   state. And it's entirely ok if this all fails occasionally, it's just
>   for development and testing.

I've never catered to this because if you do this one:

> - hotunplug as an actual use-case. Bugs are not ok. The hw can go away at
>   any moment. And it might happen while you're holding console_lock. You
>   generally do not remove the actual module here, which is why for the
>   actual production use-case getting that part right isn't really
>   required. But getting the lifetimes of all the various
>   structs/objects/resources perfectly right is required.

Fully and properly then developers are happy too..

And we were always able to do this one..

> So the "stuck on console_lock" is the 2nd case, not the first. Module
> unload doesn't even come into play on that one.

I don't see reliable hot unplug if the driver can get stuck on a
lock :|

> > Assuming all your interrupt triggered sleeps have gained a shootdown
> > mechanism.
> 
> Hence why I want revocable to only be rcu, not srcu.

Sorry, I was not clear. You also have to make the PCI interrupt(s)
revocable. Just like the MMIO it cannot leak past the remove() as a
matter of driver-model correctness.

So, you end up disabling the interrupt while the driver is still
running and any sleeps in the driver that are waiting for an interrupt
still need to be shot down.

Further, I just remembered, (Danilo please notice!) there is another
related issue here that DMA mappings *may not* outlive remove()
either. netdev had a bug related to this recently and it was all
agreed that it is not allowed. The kernel can crash in a couple of
different ways if you try to do this.

https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/#m0c7dda0fb5981240879c5ca489176987d688844c

 > a device with no driver bound should not be passed to the DMA API,
 > much less a dead device that's already been removed from its parent
 > bus.

So now we have a driver design that must have:
 1) Revocable MMIO
 2) Revocable Interrupts
 3) Revocable DMA mappings
 4) Revocable HW DMA - the HW MUST stop doing DMA before the DMA API
    is shut down. Failure is a correctness/UAF/security issue

Somehow the driver has to implement this, not get confused or lock up,
all while Rust doesn't help you guarentee much of any of the important
properties related to #2/#3/#4. And worse all this manual recvocable
stuff is special and unique to hot-unplug. So it will all be untested
and broken.

Looks really hard to me. *Especially* the wild DMA thing.

This has clearly been missed here as with the current suggestion to
just revoke MMIO means the driver can't actually go out and shutdown
it's HW DMA after-the-fact since the MMIO is gone. Thus you are pretty
much guaranteed to fail #4, by design, which is a serious issue.

I'm sorry it has taken so many emails to reach this, I did know it,
but didn't put the pieces coherently together till just now :\

Compare that to how RDMA works, where we do a DMA shutdown by
destroying all the objects just the same as if the user closed a
FD. The normal destruction paths fence the HW DMA and we end up in
remove with cleanly shutdown HW and no DMA API open. The core code
manages all of this. Simple, correct, no buggy hotplug only paths.

> Yeah agreed. I might really badly regret this all. But I'm not sold that
> switching to message passing design is really going to be better, while
> it's definitely going to be a huge amount of work.

Yeah, I'd think from where DRM is now continuing trying to address the
sleeps is more tractable and achievable than a message passing
redesign..

> > If the C API handles module refcounting internally then rust is fine
> > so long as it enforces THIS_MODULE.
> 
> You could do contrived stuff and pass function pointers around, so that
> THIS_MODULE doesn't actually match up with the function pointer.

Ah.. I guess rust would have to validate the function pointers and the
THIS_MODULE are consistent at runtime time before handing them off to
C to prevent this. Seems like a reasonable thing to put under some
CONFIG_DEBUG, also seems a bit hard to implement perhaps..

> > If the C API requires cancel then rust is fine so long as the binding
> > guarantees cancel before module unload.
> 
> Yeah this is again where I think rust needs a bit more, because the
> compiler can't always nicely proof this for you in all the "obvious"
> cases.

But in the discussion about the hrtimer it was asserted that Rust can :)

I believe it could be, so long as rust bindings are pretty restricted
and everything rolls up and cancels when things are destroyed. Nothing
should be able to leak out as a principle of the all the binding
designs.

Seems like a hard design to enforce across all bindings, eg workqeue
is already outside of it. Seems like something that should be written
down in a binding design document..

Jason

  reply	other threads:[~2025-03-05 15:10 UTC|newest]

Thread overview: 104+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-17 14:04 [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation Alexandre Courbot
2025-02-17 14:04 ` [PATCH RFC 1/3] rust: add useful ops for u64 Alexandre Courbot
2025-02-17 20:47   ` Sergio González Collado
2025-02-17 21:10   ` Daniel Almeida
2025-02-18 13:16     ` Alexandre Courbot
2025-02-18 20:51       ` Timur Tabi
2025-02-19  1:21         ` Alexandre Courbot
2025-02-19  3:24           ` John Hubbard
2025-02-19 12:51             ` Alexandre Courbot
2025-02-19 20:22               ` John Hubbard
2025-02-19 20:23                 ` Dave Airlie
2025-02-19 23:13                   ` Daniel Almeida
2025-02-20  0:14                     ` John Hubbard
2025-02-21 11:35                       ` Alexandre Courbot
2025-02-21 12:31                         ` Danilo Krummrich
2025-02-19 20:11           ` Sergio González Collado
2025-02-18 10:07   ` Dirk Behme
2025-02-18 13:07     ` Alexandre Courbot
2025-02-20  6:23       ` Dirk Behme
2025-02-17 14:04 ` [PATCH RFC 2/3] rust: make ETIMEDOUT error available Alexandre Courbot
2025-02-17 21:15   ` Daniel Almeida
2025-02-17 14:04 ` [PATCH RFC 3/3] gpu: nova-core: add basic timer device Alexandre Courbot
2025-02-17 15:48 ` [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation Simona Vetter
2025-02-18  8:07   ` Greg KH
2025-02-18 13:23     ` Alexandre Courbot
2025-02-17 21:33 ` Danilo Krummrich
2025-02-18  1:46   ` Dave Airlie
2025-02-18 10:26     ` Danilo Krummrich
2025-02-19 12:58       ` Simona Vetter
2025-02-24  1:40     ` Alexandre Courbot
2025-02-24 12:07       ` Danilo Krummrich
2025-02-24 12:11         ` Danilo Krummrich
2025-02-24 18:45           ` Joel Fernandes
2025-02-24 23:44             ` Danilo Krummrich
2025-02-25 15:52               ` Joel Fernandes
2025-02-25 16:09                 ` Danilo Krummrich
2025-02-25 21:02                   ` Joel Fernandes
2025-02-25 22:02                     ` Danilo Krummrich
2025-02-25 22:42                       ` Dave Airlie
2025-02-25 22:57                     ` Jason Gunthorpe
2025-02-25 23:26                       ` Danilo Krummrich
2025-02-25 23:45                       ` Danilo Krummrich
2025-02-26  0:49                         ` Jason Gunthorpe
2025-02-26  1:16                           ` Danilo Krummrich
2025-02-26 17:21                             ` Jason Gunthorpe
2025-02-26 21:31                               ` Danilo Krummrich
2025-02-26 23:47                                 ` Jason Gunthorpe
2025-02-27  0:41                                   ` Boqun Feng
2025-02-27 14:46                                     ` Jason Gunthorpe
2025-02-27 15:18                                       ` Boqun Feng
2025-02-27 16:17                                         ` Jason Gunthorpe
2025-02-27 16:55                                           ` Boqun Feng
2025-02-27 17:32                                             ` Danilo Krummrich
2025-02-27 19:23                                               ` Jason Gunthorpe
2025-02-27 21:25                                                 ` Boqun Feng
2025-02-27 22:00                                                   ` Jason Gunthorpe
2025-02-27 22:40                                                     ` Danilo Krummrich
2025-02-28 18:55                                                       ` Jason Gunthorpe
2025-03-03 19:36                                                         ` Danilo Krummrich
2025-03-03 21:50                                                           ` Jason Gunthorpe
2025-03-04  9:57                                                             ` Danilo Krummrich
2025-02-27  1:02                                   ` Greg KH
2025-02-27  1:34                                     ` John Hubbard
2025-02-27 21:42                                       ` Dave Airlie
2025-02-27 23:06                                         ` John Hubbard
2025-02-28  4:10                                           ` Dave Airlie
2025-02-28 18:50                                             ` Jason Gunthorpe
2025-02-28 10:52                                       ` Simona Vetter
2025-02-28 18:40                                         ` Jason Gunthorpe
2025-03-04 16:10                                           ` Simona Vetter
2025-03-04 16:42                                             ` Jason Gunthorpe
2025-03-05  7:30                                               ` Simona Vetter
2025-03-05 15:10                                                 ` Jason Gunthorpe [this message]
2025-03-06 10:42                                                   ` Simona Vetter
2025-03-06 15:32                                                     ` Jason Gunthorpe
2025-03-07 10:28                                                       ` Simona Vetter
2025-03-07 12:32                                                         ` Jason Gunthorpe
2025-03-07 13:09                                                           ` Simona Vetter
2025-03-07 14:55                                                             ` Jason Gunthorpe
2025-03-13 14:32                                                               ` Simona Vetter
2025-03-19 17:21                                                                 ` Jason Gunthorpe
2025-03-21 10:35                                                                   ` Simona Vetter
2025-03-21 12:04                                                                     ` Jason Gunthorpe
2025-03-21 12:12                                                                       ` Danilo Krummrich
2025-03-21 17:49                                                                         ` Jason Gunthorpe
2025-03-21 18:54                                                                           ` Danilo Krummrich
2025-03-07 14:00                                                           ` Greg KH
2025-03-07 14:46                                                             ` Jason Gunthorpe
2025-03-07 15:19                                                               ` Greg KH
2025-03-07 15:25                                                                 ` Jason Gunthorpe
2025-02-27 14:23                                     ` Jason Gunthorpe
2025-02-27 11:32                                   ` Danilo Krummrich
2025-02-27 15:07                                     ` Jason Gunthorpe
2025-02-27 16:51                                       ` Danilo Krummrich
2025-02-25 14:11         ` Alexandre Courbot
2025-02-25 15:06           ` Danilo Krummrich
2025-02-25 15:23             ` Alexandre Courbot
2025-02-25 15:53               ` Danilo Krummrich
2025-02-27 21:37           ` Dave Airlie
2025-02-28  1:49             ` Timur Tabi
2025-02-28  2:24               ` Dave Airlie
2025-02-18 13:35   ` Alexandre Courbot
2025-02-18  1:42 ` Dave Airlie
2025-02-18 13:47   ` Alexandre Courbot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250305151012.GW133783@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=acourbot@nvidia.com \
    --cc=airlied@gmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=bskeggs@nvidia.com \
    --cc=dakr@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=gary@garyguo.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=jhubbard@nvidia.com \
    --cc=joel@joelfernandes.org \
    --cc=joelagnelf@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nouveau@lists.freedesktop.org \
    --cc=paulmck@kernel.org \
    --cc=rust-for-linux@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).