public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: John Hubbard <jhubbard@nvidia.com>,
	Greg KH <gregkh@linuxfoundation.org>,
	Danilo Krummrich <dakr@kernel.org>,
	Joel Fernandes <joelagnelf@nvidia.com>,
	Alexandre Courbot <acourbot@nvidia.com>,
	Dave Airlie <airlied@gmail.com>, Gary Guo <gary@garyguo.net>,
	Joel Fernandes <joel@joelfernandes.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	Ben Skeggs <bskeggs@nvidia.com>,
	linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	paulmck@kernel.org
Subject: Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
Date: Tue, 4 Mar 2025 12:42:01 -0400	[thread overview]
Message-ID: <20250304164201.GN133783@nvidia.com> (raw)
In-Reply-To: <Z8cmBWB8rl97-zSG@phenom.ffwll.local>

On Tue, Mar 04, 2025 at 05:10:45PM +0100, Simona Vetter wrote:
> On Fri, Feb 28, 2025 at 02:40:13PM -0400, Jason Gunthorpe wrote:
> > On Fri, Feb 28, 2025 at 11:52:57AM +0100, Simona Vetter wrote:
> > 
> > > - Nuke the driver binding manually through sysfs with the unbind files.
> > > - Nuke all userspace that might beholding files and other resources open.
> > > - At this point the module refcount should be zero and you can unload it.
> > > 
> > > Except developers really don't like the manual unbind step, and so we're
> > > missing try_module_get() in a bunch of places where it really should be.
> > 
> > IMHO they are not missing, we just have a general rule that if a
> > cleanup function, required to be called prior to module exit, revokes
> > any .text pointers then you don't need to hold the module refcount.
> > 
> > file_operations doesn't have such a cleanup function which is why it
> > takes the refcount.
> > 
> > hrtimer does have such a function which is why it doesn't take the
> > refcount.
> 
> I was talking about a bunch of other places, where it works like
> file_operations, except we don't bother with the module reference count.
> I've seen patches fly by where people "fix" these things because module
> unload is "broken".

Sure, but there are only two correct API approaches, either you
require the user to make a cancel call that sanitizes the module
references, or you manage them internally.

Hope and pray isn't an option :)

> gpu drivers can hog console_lock (yes we're trying to get away from that
> as much as possible), at that point a cavalier attitude of "you can just
> wait" isn't very appreciated.

What are you trying to solve here? If the system is already stuck
infinitely on the console lock why is module remove even being
considered?

module remove shouldn't be a remedy for a crashed driver...

> > But so is half removing the driver while it is doing *anything* and
> > trying to mitigate that with a different kind of hard to do locking
> > fix. *shrug*
> 
> The thing is that rust helps you enormously with implementing revocable
> resources and making sure you're not cheating with all the bail-out paths.

Assuming a half alive driver with MMIO and interrupts ripped away
doesn't lock up.

Assuming all your interrupt triggered sleeps have gained a shootdown
mechanism.

Assuming all the new extra error paths this creates don't corrupt the
internal state of the driver and cause it to lockup.

Meh. It doesn't seem like such an obvious win to me. Personally I'm
terrified of the idea of a zombie driver half sitting around in a
totally untestable configuration working properly..

> It cannot help you with making sure you have interruptible/abortable
> sleeps in all the right places. 

:(

> > Like, I see a THIS_MODULE in driver->fops == amdgpu_driver_kms_fops ?
> 
> Yeah it's there, except only for the userspace references and not for the
> kernel internal ones. Because developers get a bit prickle about adding
> those unfortunately due to "it breaks module unload". Maybe we just should
> add them, at least for rust.

Yeah, I think such obviously wrong things should be pushed back
against. We don't want EAF bugs in the kernel, we want security...

> You've missed the "it will upset developers part". I've seen people remove
> module references that are needed, to "fix" driver unloading.

When done properly the module can be unloaded. Most rdma driver
modules are unloadable, live, while FDs are open.

> The third part is that I'm not aware of anything in rust that would
> guarantee that the function pointer and the module reference actually
> belong to each another. Which means another runtime check most likely, and
> hence another thing that shouldn't fail which kinda can now.

I suspect it has to come from the C code API contracts, which leak
into the binding design.

If the C API handles module refcounting internally then rust is fine
so long as it enforces THIS_MODULE.

If the C API requires cancel then rust is fine so long as the binding
guarantees cancel before module unload.

Jason

  reply	other threads:[~2025-03-04 16:42 UTC|newest]

Thread overview: 104+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-17 14:04 [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation Alexandre Courbot
2025-02-17 14:04 ` [PATCH RFC 1/3] rust: add useful ops for u64 Alexandre Courbot
2025-02-17 20:47   ` Sergio González Collado
2025-02-17 21:10   ` Daniel Almeida
2025-02-18 13:16     ` Alexandre Courbot
2025-02-18 20:51       ` Timur Tabi
2025-02-19  1:21         ` Alexandre Courbot
2025-02-19  3:24           ` John Hubbard
2025-02-19 12:51             ` Alexandre Courbot
2025-02-19 20:22               ` John Hubbard
2025-02-19 20:23                 ` Dave Airlie
2025-02-19 23:13                   ` Daniel Almeida
2025-02-20  0:14                     ` John Hubbard
2025-02-21 11:35                       ` Alexandre Courbot
2025-02-21 12:31                         ` Danilo Krummrich
2025-02-19 20:11           ` Sergio González Collado
2025-02-18 10:07   ` Dirk Behme
2025-02-18 13:07     ` Alexandre Courbot
2025-02-20  6:23       ` Dirk Behme
2025-02-17 14:04 ` [PATCH RFC 2/3] rust: make ETIMEDOUT error available Alexandre Courbot
2025-02-17 21:15   ` Daniel Almeida
2025-02-17 14:04 ` [PATCH RFC 3/3] gpu: nova-core: add basic timer device Alexandre Courbot
2025-02-17 15:48 ` [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation Simona Vetter
2025-02-18  8:07   ` Greg KH
2025-02-18 13:23     ` Alexandre Courbot
2025-02-17 21:33 ` Danilo Krummrich
2025-02-18  1:46   ` Dave Airlie
2025-02-18 10:26     ` Danilo Krummrich
2025-02-19 12:58       ` Simona Vetter
2025-02-24  1:40     ` Alexandre Courbot
2025-02-24 12:07       ` Danilo Krummrich
2025-02-24 12:11         ` Danilo Krummrich
2025-02-24 18:45           ` Joel Fernandes
2025-02-24 23:44             ` Danilo Krummrich
2025-02-25 15:52               ` Joel Fernandes
2025-02-25 16:09                 ` Danilo Krummrich
2025-02-25 21:02                   ` Joel Fernandes
2025-02-25 22:02                     ` Danilo Krummrich
2025-02-25 22:42                       ` Dave Airlie
2025-02-25 22:57                     ` Jason Gunthorpe
2025-02-25 23:26                       ` Danilo Krummrich
2025-02-25 23:45                       ` Danilo Krummrich
2025-02-26  0:49                         ` Jason Gunthorpe
2025-02-26  1:16                           ` Danilo Krummrich
2025-02-26 17:21                             ` Jason Gunthorpe
2025-02-26 21:31                               ` Danilo Krummrich
2025-02-26 23:47                                 ` Jason Gunthorpe
2025-02-27  0:41                                   ` Boqun Feng
2025-02-27 14:46                                     ` Jason Gunthorpe
2025-02-27 15:18                                       ` Boqun Feng
2025-02-27 16:17                                         ` Jason Gunthorpe
2025-02-27 16:55                                           ` Boqun Feng
2025-02-27 17:32                                             ` Danilo Krummrich
2025-02-27 19:23                                               ` Jason Gunthorpe
2025-02-27 21:25                                                 ` Boqun Feng
2025-02-27 22:00                                                   ` Jason Gunthorpe
2025-02-27 22:40                                                     ` Danilo Krummrich
2025-02-28 18:55                                                       ` Jason Gunthorpe
2025-03-03 19:36                                                         ` Danilo Krummrich
2025-03-03 21:50                                                           ` Jason Gunthorpe
2025-03-04  9:57                                                             ` Danilo Krummrich
2025-02-27  1:02                                   ` Greg KH
2025-02-27  1:34                                     ` John Hubbard
2025-02-27 21:42                                       ` Dave Airlie
2025-02-27 23:06                                         ` John Hubbard
2025-02-28  4:10                                           ` Dave Airlie
2025-02-28 18:50                                             ` Jason Gunthorpe
2025-02-28 10:52                                       ` Simona Vetter
2025-02-28 18:40                                         ` Jason Gunthorpe
2025-03-04 16:10                                           ` Simona Vetter
2025-03-04 16:42                                             ` Jason Gunthorpe [this message]
2025-03-05  7:30                                               ` Simona Vetter
2025-03-05 15:10                                                 ` Jason Gunthorpe
2025-03-06 10:42                                                   ` Simona Vetter
2025-03-06 15:32                                                     ` Jason Gunthorpe
2025-03-07 10:28                                                       ` Simona Vetter
2025-03-07 12:32                                                         ` Jason Gunthorpe
2025-03-07 13:09                                                           ` Simona Vetter
2025-03-07 14:55                                                             ` Jason Gunthorpe
2025-03-13 14:32                                                               ` Simona Vetter
2025-03-19 17:21                                                                 ` Jason Gunthorpe
2025-03-21 10:35                                                                   ` Simona Vetter
2025-03-21 12:04                                                                     ` Jason Gunthorpe
2025-03-21 12:12                                                                       ` Danilo Krummrich
2025-03-21 17:49                                                                         ` Jason Gunthorpe
2025-03-21 18:54                                                                           ` Danilo Krummrich
2025-03-07 14:00                                                           ` Greg KH
2025-03-07 14:46                                                             ` Jason Gunthorpe
2025-03-07 15:19                                                               ` Greg KH
2025-03-07 15:25                                                                 ` Jason Gunthorpe
2025-02-27 14:23                                     ` Jason Gunthorpe
2025-02-27 11:32                                   ` Danilo Krummrich
2025-02-27 15:07                                     ` Jason Gunthorpe
2025-02-27 16:51                                       ` Danilo Krummrich
2025-02-25 14:11         ` Alexandre Courbot
2025-02-25 15:06           ` Danilo Krummrich
2025-02-25 15:23             ` Alexandre Courbot
2025-02-25 15:53               ` Danilo Krummrich
2025-02-27 21:37           ` Dave Airlie
2025-02-28  1:49             ` Timur Tabi
2025-02-28  2:24               ` Dave Airlie
2025-02-18 13:35   ` Alexandre Courbot
2025-02-18  1:42 ` Dave Airlie
2025-02-18 13:47   ` Alexandre Courbot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250304164201.GN133783@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=acourbot@nvidia.com \
    --cc=airlied@gmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=bskeggs@nvidia.com \
    --cc=dakr@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=gary@garyguo.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=jhubbard@nvidia.com \
    --cc=joel@joelfernandes.org \
    --cc=joelagnelf@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nouveau@lists.freedesktop.org \
    --cc=paulmck@kernel.org \
    --cc=rust-for-linux@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox