From: Jason Gunthorpe <jgg@nvidia.com>
To: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Danilo Krummrich <dakr@kernel.org>,
Alexandre Courbot <acourbot@nvidia.com>,
Dave Airlie <airlied@gmail.com>, Gary Guo <gary@garyguo.net>,
Joel Fernandes <joel@joelfernandes.org>,
Boqun Feng <boqun.feng@gmail.com>,
John Hubbard <jhubbard@nvidia.com>,
Ben Skeggs <bskeggs@nvidia.com>,
linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
paulmck@kernel.org
Subject: Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
Date: Tue, 25 Feb 2025 18:57:56 -0400 [thread overview]
Message-ID: <20250225225756.GA4959@nvidia.com> (raw)
In-Reply-To: <20250225210228.GA1801922@joelnvbox>
On Tue, Feb 25, 2025 at 04:02:28PM -0500, Joel Fernandes wrote:
> > Besides that I don't really see why we can't just re-acquire it after we sleep?
> > Rust provides good options to implement it ergonomically I think.
> >
> > >
> > > Another colleague told me RDMA also uses SRCU for a similar purpose as well.
> >
> > See the reasoning against SRCU from Sima [1], what's the reasoning of RDMA?
> >
> > [1] https://lore.kernel.org/nouveau/Z7XVfnnrRKrtQbB6@phenom.ffwll.local/
> For RDMA, I will ask Jason Gunthorpe to chime in, I CC'd him. Jason, correct
> me if I'm wrong about the RDMA user but this is what I recollect discussing
> with you.
In RDMA, SRCU is not unbounded. It is limited to the duration of a
system call, and we don't have system calls that become time unbounded
inside drivers.
The motivation for RDMA was not really hotplug, but supporting kernel
module upgrade. Some data center HA users were very interested in
this. To achieve it, the driver module itself cannot have an elevated
module refcount, which requires swapping the module refcount for a
sleepable RW lock, like SRCU or rwsem, protecting all driver
callbacks. [1]
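For illustration, the core-side pattern looks roughly like this (the
names are made up for the sketch; this is not the literal uverbs code):

/*
 * The core takes an SRCU read lock across every driver callback and
 * clears the ops pointer at disassociation time, so rmmod only has to
 * wait one SRCU grace period rather than for the FD to be closed.
 */
struct core_file;

struct driver_ops {
	long (*do_cmd)(struct core_file *f, unsigned int cmd);
};

struct core_file {
	struct srcu_struct srcu;
	struct driver_ops __rcu *ops;	/* cleared when the driver unbinds */
};

static long core_ioctl(struct core_file *f, unsigned int cmd)
{
	struct driver_ops *ops;
	long ret;
	int idx;

	idx = srcu_read_lock(&f->srcu);
	ops = srcu_dereference(f->ops, &f->srcu);
	if (!ops) {
		srcu_read_unlock(&f->srcu, idx);
		return -EIO;		/* driver already removed */
	}
	ret = ops->do_cmd(f, cmd);	/* .text pinned by SRCU, not a refcount */
	srcu_read_unlock(&f->srcu, idx);
	return ret;
}

static void core_disassociate(struct core_file *f)
{
	rcu_assign_pointer(f->ops, NULL);
	synchronize_srcu(&f->srcu);	/* all in-flight callbacks have returned */
}

Once core_disassociate() returns, no driver code can be entered through
that FD again, which is what makes dropping the module refcount safe.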
To be very clear, in RDMA you can open /dev/infiniband/uverbsX, run an
ioctl on the FD and then successfully rmmod the driver module while the
FD is open and while the ioctl is running. Any in-flight driver op will
complete, future ioctls will fail, and the module unload will complete.
So, from my perspective, this revocable idea would completely destroy
the actual production purpose we built the fast hot-plug machinery
for. It does not guarantee that driver threads are fenced prior to
completing remove. Instead it must rely on the FD itself holding the
module refcount on the driver to keep the .text alive while driver
callbacks continue to be called. Tying the module refcount to userspace
closing an FD renders the module unloadable in practice.
The common driver shutdown process in the kernel, which is well tested
and widely copied, makes the driver single threaded during the remove()
callback. Effectively, instead of trying to micro-revoke individual
resources, we revoke all concurrent threads, and then it is completely
safe to destroy all the resources. This also guarantees that after
completing remove there is no Execute After Free risk to the driver
code.
SRCU/rwsem across all driver-ops function pointer calls is part of
this scheme, but so are things like canceling/flushing work queues,
blocking new work submission, preventing further interrupts, removing
sysfs files (they block concurrent threads internally), synchronizing
any possibly outstanding RCU callbacks, and more.
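Put together, a typical remove() has roughly this shape (the foo_*
names are hypothetical, just to show the ordering):

static void foo_remove(struct platform_device *pdev)
{
	struct foo_dev *foo = platform_get_drvdata(pdev);

	/* First cut off every source of concurrency. */
	sysfs_remove_group(&pdev->dev.kobj, &foo_attr_group);	/* blocks on running show/store */
	foo_quiesce_hw(foo);			/* hypothetical: stop the device generating interrupts */
	free_irq(foo->irq, foo);		/* waits for in-flight handlers */
	WRITE_ONCE(foo->removing, true);	/* submit paths check this and bail */
	cancel_work_sync(&foo->reset_work);	/* waits for a running instance */
	destroy_workqueue(foo->wq);		/* drains the driver-private queue */
	rcu_barrier();				/* outstanding call_rcu() callbacks */

	/* Now single threaded: freeing everything is trivially safe. */
	foo_free_resources(foo);
}

No revocable wrapper is needed around individual resources; by the time
anything is freed there is simply nobody left who could touch it.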
So, I'd suggest that if you have system calls that wait, the typical
expected solution would be to shoot down the waits during a remove
event so they can become time bounded.
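A minimal sketch of what I mean, assuming a hypothetical foo_dev with a
wait queue (not code from any particular driver):

static int foo_wait_for_event(struct foo_dev *foo)
{
	if (wait_event_interruptible(foo->waitq,
				     foo->event_ready || foo->removing))
		return -ERESTARTSYS;
	if (foo->removing)
		return -ENODEV;		/* device went away underneath us */
	return 0;
}

static void foo_shoot_down_waiters(struct foo_dev *foo)
{
	foo->removing = true;		/* checked by the wait condition */
	wake_up_all(&foo->waitq);	/* sleepers re-check and return */
}

remove() calls foo_shoot_down_waiters() before fencing the rest of the
concurrency, so even a syscall that was going to wait "forever" is
bounded by how long it takes to wake up and unwind.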
> > > I have heard some concern around whether Rust is changing the driver model when
> > > it comes to driver detach / driver remove. Can you elaborate may be a bit about
> > > how Rust changes that mechanism versus C, when it comes to that?
> >
> > I think that one is simple, Rust does *not* change the driver model.
I think this resource-revoke idea deviates from the normal expected
driver model by allowing driver code to continue to run in other
threads after remove() completes. That is definitely abnormal, at
least.
It is not necessarily *wrong*, but it sure is weird, and as I explained
above it has bad system-level properties.
Further, it seems to me there is a unique, DRM-specific issue at work
here: "time unbounded driver callbacks". A weird solution to that
should not be baked into the common core kernel Rust bindings and break
the working model of all the other subsystems that don't have the
problem.
> > Similarly you can have custom functions for short sequences of I/O ops, or use
> > closures. I don't understand the concern.
>
> Yeah, this is certainly possible. I think one concern is similar to what you
> raised on the other thread you shared [1]:
> "Maybe we even want to replace it with SRCU entirely to ensure that drivers
> can't stall the RCU grace period for too long by accident."
I'd be worried about introducing a whole bunch more untestable failure
paths in drivers, especially in areas like work queue submission that
are designed not to fail [2]. Non-failing work queues are a critical
property that I've relied on countless times. I'm not sure you even
*can* recover from this correctly in all cases.
Then in the other email did you say that even some memory allocations
go into this scheme? Yikes!
Further, hiding a synchronize_rcu() in a devm destructor [3], once per
revocable object, is awful. If you imagine having RCU around each of
your revocable objects, how many synchronize_rcu() calls is devm going
to make post-remove()?
On a busy server synchronize_rcu() is known to take a long time, so it
is easy to imagine driver remove times running into many tens of
seconds for no good reason, maybe even multiple minutes if the driver
ends up with many of these objects.
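Very roughly, the concern is this (a simplified model, not the real
devres code):

/*
 * devm release actions run one after another, after remove() returns.
 * If each Revocable's release hides a synchronize_rcu(), tearing down
 * N of them costs roughly N grace periods, back to back.
 */
struct release_action {
	struct list_head entry;
	void (*release)(void *data);
	void *data;
};

static void run_release_actions(struct list_head *actions)
{
	struct release_action *a;

	list_for_each_entry(a, actions, entry)
		a->release(a->data);	/* each may block in synchronize_rcu() */
}

Multiply a tens-of-milliseconds (or worse) grace period by the number of
revocable resources a large driver owns and remove gets very slow.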
[1] - Module .text is not unplugged from the kernel until all probed
drivers affiliated with that module have completed their remove
operations.
[2] - It is important that drivers shut down all their concurrency in
workqueues during remove because work queues do not hold the module
refcount. The only way the .text lifecycle works for drivers using work
queues is to rely on [1] to protect against Execute After Free.
[3] - Personally I agree with Laurent's points and I strongly dislike
devm. I'm really surprised to see Rust using it; I imagined Rust had
sufficiently strong object lifecycle management that it would not be
needed :(
Jason