From: Danilo Krummrich <dakr@kernel.org>
To: Alexandre Courbot <acourbot@nvidia.com>
Cc: Dave Airlie <airlied@gmail.com>, Gary Guo <gary@garyguo.net>,
Joel Fernandes <joel@joelfernandes.org>,
Boqun Feng <boqun.feng@gmail.com>,
John Hubbard <jhubbard@nvidia.com>,
Ben Skeggs <bskeggs@nvidia.com>,
linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
Nouveau <nouveau-bounces@lists.freedesktop.org>
Subject: Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
Date: Tue, 25 Feb 2025 16:53:04 +0100
Message-ID: <Z73nYKsq14Pf6ucp@cassiopeiae>
In-Reply-To: <D81MP8Y5ME66.3SLPVNXERH1HU@nvidia.com>
On Wed, Feb 26, 2025 at 12:23:40AM +0900, Alexandre Courbot wrote:
> On Wed Feb 26, 2025 at 12:06 AM JST, Danilo Krummrich wrote:
> > On Tue, Feb 25, 2025 at 11:11:07PM +0900, Alexandre Courbot wrote:
> >> On Mon Feb 24, 2025 at 9:07 PM JST, Danilo Krummrich wrote:
> >> > CC: Gary
> >> >
> >> > On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
> >> >> This inability to sleep while we are accessing registers seems very
> >> >> constraining to me, if not dangerous. It is pretty common to have
> >> >> functions intermingle hardware accesses with other operations that might
> >> >> sleep, and this constraint means that in such cases the caller would
> >> >> need to perform guard lifetime management manually:
> >> >>
> >> >> let bar_guard = bar.try_access()?;
> >> >> /* do something non-sleeping with bar_guard */
> >> >> drop(bar_guard);
> >> >>
> >> >> /* do something that might sleep */
> >> >>
> >> >> let bar_guard = bar.try_access()?;
> >> >> /* do something non-sleeping with bar_guard */
> >> >> drop(bar_guard);
> >> >>
> >> >> ...
> >> >>
> >> >> Failure to drop the guard potentially introduces a race condition, which
> >> >> will receive no compile-time warning and potentially not even a runtime
> >> >> one unless lockdep is enabled. This problem does not exist in the
> >> >> equivalent C code AFAICT, which makes the Rust version actually more
> >> >> error-prone and dangerous, the opposite of what we are trying to achieve
> >> >> with Rust. Or am I missing something?
> >> >
> >> > Generally you are right, but you have to see it from a different perspective.
> >> >
> >> > What you describe is not an issue that comes from the design of the API, but
> >> > a limitation of Rust in the kernel. People are aware of the issue, and with
> >> > klint [1] there are solutions for that in the pipeline; see also [2] and [3].
> >> >
> >> > [1] https://rust-for-linux.com/klint
> >> > [2] https://github.com/Rust-for-Linux/klint/blob/trunk/doc/atomic_context.md
> >> > [3] https://www.memorysafety.org/blog/gary-guo-klint-rust-tools/
> >>
> >> Thanks, I wasn't aware of klint; it does look cool indeed, even if not perfect
> >> by its own admission. But even if we ignore the safety issue, the other one
> >> (ergonomics) is still there.
> >>
> >> Basically this way of accessing registers imposes quite a mental burden on its
> >> users. It requires a very different (and harsher) discipline than when writing
> >> the same code in C
> >
> > We need similar solutions in C too, see drm_dev_enter() / drm_dev_exit() and
> > drm_dev_unplug().
>
> Granted, but the use of these is much more coarse-grained than what is
> expected of IO resources, right?
Potentially, yes. But it is exactly this characteristic that has been criticised [1].
[1] https://lore.kernel.org/nouveau/Z7XVfnnrRKrtQbB6@phenom.ffwll.local/
>
> >
> >> and the correct granularity to use is unclear to me.
> >>
> >> For instance, if I want to do the equivalent of Nouveau's nvkm_usec() to poll a
> >> particular register in a busy loop, should I call try_access() once before the
> >> loop? Or every time before accessing the register?
> >
> > I think we should re-acquire the guard in each iteration and drop it before the
> > delay. I think a simple closure would work very well for this pattern?
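
To illustrate the closure pattern I have in mind - a rough sketch only;
with_bar(), STATUS_REG, STATUS_DONE and POLL_RETRIES are made-up names,
and the Devres / pci::Bar signatures are simplified:

// Hypothetical helper: run a closure under a short-lived access guard.
fn with_bar<R>(
    bar: &Devres<pci::Bar>,
    f: impl FnOnce(&pci::Bar) -> R,
) -> Result<R> {
    // Fails once the device is unbound and the Bar has been revoked.
    let io = bar.try_access().ok_or(ENXIO)?;
    Ok(f(&*io))
}

fn wait_status_done(bar: &Devres<pci::Bar>) -> Result {
    for _ in 0..POLL_RETRIES {
        // The guard only lives for the duration of the closure, ...
        if with_bar(bar, |io| io.read32(STATUS_REG) & STATUS_DONE != 0)? {
            return Ok(());
        }
        // ... so sleeping between iterations here is fine (delay
        // helper elided).
    }
    Err(ETIMEDOUT)
}

This way the guard structurally can't be held across the delay.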
> >
> >> I'm afraid having to check
> >> that the resource is still alive before accessing any register is going to
> >> become tedious very quickly.
> >>
> >> I understand that we want to protect against accessing the IO region of an
> >> unplugged device; but still there is no guarantee that the device won't be
> >> unplugged in the middle of a critical section, however short. Thus the driver
> >> code should be able to recognize that the device has fallen off the bus when it
> >> e.g. gets a bunch of 0xff instead of a valid value. So do we really need this
> >> extra protection, which AFAICT isn't used in C?
> >
> > As mentioned above, we already do similar things in C.
> >
> > Also, think about what the alternative would be. If we remove the Devres
> > wrapper of pci::Bar, we lose control over the lifetime of the bar mapping and
> > it can easily out-live the device / driver binding. This makes the API unsound.
>
> Oh, my issue is not with the Devres wrapper, which I think makes sense -
> it's more the use of RCU to control access to the resource that I find
> too constraining. And I'm pretty sure more users will be of the same
> opinion as more drivers using it get written.
What do you suggest?
>
> >
> > With this, drivers would be able to keep resources acquired. What if, after a
> > hotplug, the physical address region is re-used and mapped by another
> > driver?
>
> Actually - wouldn't that issue also be addressed by a PCI equivalent to
> drm_dev_enter() and friends that ensures the device (and thus its
> devres resources) stay in place?
I'm not sure I get the idea, but we can *not* have the device resources stay in
place once the device is unbound (e.g. keep the resource region acquired by the
driver).
Consequently, we have to have a way to revoke access to the corresponding
pci::Bar.
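
Rough sketch of the semantics this gives us (signatures simplified,
offset made up for illustration):

// While the driver is bound, try_access() yields a guard that keeps
// the mapping alive for the duration of the critical section.
let Some(io) = bar.try_access() else {
    return Err(ENXIO); // device unbound, Bar already revoked
};
let id = io.read32(0x0);

// On unbind, Devres revokes the inner pci::Bar: the region is released
// and unmapped, and every subsequent try_access() returns None - rather
// than the driver touching a physical address range that may meanwhile
// belong to another device.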
>
> Using Rust, I can imagine (but not picture precisely yet) some method of
> the device that returns a reference to an inner structure containing its
> resources, available with immediate access. Since it would be
> coarser-grained, it could rely on something less constraining than RCU
> without a noticeable performance penalty.
We had similar attempts when we designed this API, i.e. a common Revocable in
the driver private data of a device. But that approach had some
chicken-and-egg issues during initialization in probe(). Besides that, it
wouldn't get rid of the Revocable either, since the corresponding resources
are only valid while the driver is bound to the device, not for the entire
lifetime of the device.