From: Danilo Krummrich <dakr@kernel.org>
To: Alexandre Courbot <acourbot@nvidia.com>
Cc: Dave Airlie <airlied@gmail.com>, Gary Guo <gary@garyguo.net>,
Joel Fernandes <joel@joelfernandes.org>,
Boqun Feng <boqun.feng@gmail.com>,
John Hubbard <jhubbard@nvidia.com>,
Ben Skeggs <bskeggs@nvidia.com>,
linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
Nouveau <nouveau-bounces@lists.freedesktop.org>
Subject: Re: [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation
Date: Tue, 25 Feb 2025 16:53:04 +0100 [thread overview]
Message-ID: <Z73nYKsq14Pf6ucp@cassiopeiae> (raw)
In-Reply-To: <D81MP8Y5ME66.3SLPVNXERH1HU@nvidia.com>
On Wed, Feb 26, 2025 at 12:23:40AM +0900, Alexandre Courbot wrote:
> On Wed Feb 26, 2025 at 12:06 AM JST, Danilo Krummrich wrote:
> > On Tue, Feb 25, 2025 at 11:11:07PM +0900, Alexandre Courbot wrote:
> >> On Mon Feb 24, 2025 at 9:07 PM JST, Danilo Krummrich wrote:
> >> > CC: Gary
> >> >
> >> > On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
> >> >> This inability to sleep while we are accessing registers seems very
> >> >> constraining to me, if not dangerous. It is pretty common to have
> >> >> functions intermingle hardware accesses with other operations that might
> >> >> sleep, and this constraint means that in such cases the caller would
> >> >> need to perform guard lifetime management manually:
> >> >>
> >> >> let bar_guard = bar.try_access()?;
> >> >> /* do something non-sleeping with bar_guard */
> >> >> drop(bar_guard);
> >> >>
> >> >> /* do something that might sleep */
> >> >>
> >> >> let bar_guard = bar.try_access()?;
> >> >> /* do something non-sleeping with bar_guard */
> >> >> drop(bar_guard);
> >> >>
> >> >> ...
> >> >>
> >> >> Failure to drop the guard potentially introduces a race condition, which
> >> >> will receive no compile-time warning and potentialy not even a runtime
> >> >> one unless lockdep is enabled. This problem does not exist with the
> >> >> equivalent C code AFAICT, which makes the Rust version actually more
> >> >> error-prone and dangerous, the opposite of what we are trying to achieve
> >> >> with Rust. Or am I missing something?
> >> >
> >> > Generally you are right, but you have to see it from a different perspective.
> >> >
> >> > What you describe is not an issue that comes from the design of the API, but is
> >> > a limitation of Rust in the kernel. People are aware of the issue and with klint
> >> > [1] there are solutions for that in the pipeline, see also [2] and [3].
> >> >
> >> > [1] https://rust-for-linux.com/klint
> >> > [2] https://github.com/Rust-for-Linux/klint/blob/trunk/doc/atomic_context.md
> >> > [3] https://www.memorysafety.org/blog/gary-guo-klint-rust-tools/
> >>
> >> Thanks, I wasn't aware of klint and it looks indeed cool, even if not perfect by
> >> its own admission. But even if the ignore the safety issue, the other one
> >> (ergonomics) is still there.
> >>
> >> Basically this way of accessing registers imposes quite a mental burden on its
> >> users. It requires a very different (and harsher) discipline than when writing
> >> the same code in C
> >
> > We need similar solutions in C too, see drm_dev_enter() / drm_dev_exit() and
> > drm_dev_unplug().
>
> Granted, but the use of these is much more coarsed-grained than what is
> expected of IO resources, right?
Potentially, yes. But exactly this characteristic has been criticised [1].
[1] https://lore.kernel.org/nouveau/Z7XVfnnrRKrtQbB6@phenom.ffwll.local/
>
> >
> >> and the correct granularity to use is unclear to me.
> >>
> >> For instance, if I want to do the equivalent of Nouveau's nvkm_usec() to poll a
> >> particular register in a busy loop, should I call try_access() once before the
> >> loop? Or every time before accessing the register?
> >
> > I think we should re-acquire the guard in each iteration and drop it before the
> > delay. I think a simple closure would work very well for this pattern?
> >
> >> I'm afraid having to check
> >> that the resource is still alive before accessing any register is going to
> >> become tedious very quickly.
> >>
> >> I understand that we want to protect against accessing the IO region of an
> >> unplugged device ; but still there is no guarantee that the device won't be
> >> unplugged in the middle of a critical section, however short. Thus the driver
> >> code should be able to recognize that the device has fallen off the bus when it
> >> e.g. gets a bunch of 0xff instead of a valid value. So do we really need to
> >> extra protection that AFAICT isn't used in C?
> >
> > As mentioned above, we already do similar things in C.
> >
> > Also, think about what's the alternative. If we remove the Devres wrapper of
> > pci::Bar, we lose the control over the lifetime of the bar mapping and it can
> > easily out-live the device / driver binding. This makes the API unsound.
>
> Oh my issue is not with the Devres wrapper, I think it makes sense -
> it's more the use of RCU to control access to the resource that I find
> too constraining. And I'm pretty sure there will be more users of the
> same opinion as more drivers using it get written.
What do you suggest?
>
> >
> > With this drivers would be able to keep resources acquired. What if after a
> > hotplug the physical address region is re-used and to be mapped by another
> > driver?
>
> Actually - wouldn't that issue also be addressed by a PCI equivalent to
> drm_dev_enter() and friends that ensures the device (and thus its
> devres resources) stay in place?
I'm not sure I get the idea, but we can *not* have the device resources stay in
place once the device is unbound (e.g. keep the resource region acquired by the
driver).
Consequently, we have to have a way to revoke access to the corresponding
pci::Bar.
>
> Using Rust, I can imagine (but not picture precisely yet) some method of
> the device that returns a reference to an inner structure containing its
> resources, available with immediate access. Since it would be
> coarser-grained, it could rely on something less constraining than RCU
> without a noticeable performance penalty.
We had similar attempts when we designed this API, i.e. a common Revocable in
the driver private data of a device. But it had some chicken-egg issues on
initialization in probe(). Besides that, it wouldn't get you rid of the
Revocable, since the corresponding resource are only valid while the driver is
bound to a device, not for the entire lifetime of the device.
next prev parent reply other threads:[~2025-02-25 15:53 UTC|newest]
Thread overview: 104+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-17 14:04 [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation Alexandre Courbot
2025-02-17 14:04 ` [PATCH RFC 1/3] rust: add useful ops for u64 Alexandre Courbot
2025-02-17 20:47 ` Sergio González Collado
2025-02-17 21:10 ` Daniel Almeida
2025-02-18 13:16 ` Alexandre Courbot
2025-02-18 20:51 ` Timur Tabi
2025-02-19 1:21 ` Alexandre Courbot
2025-02-19 3:24 ` John Hubbard
2025-02-19 12:51 ` Alexandre Courbot
2025-02-19 20:22 ` John Hubbard
2025-02-19 20:23 ` Dave Airlie
2025-02-19 23:13 ` Daniel Almeida
2025-02-20 0:14 ` John Hubbard
2025-02-21 11:35 ` Alexandre Courbot
2025-02-21 12:31 ` Danilo Krummrich
2025-02-19 20:11 ` Sergio González Collado
2025-02-18 10:07 ` Dirk Behme
2025-02-18 13:07 ` Alexandre Courbot
2025-02-20 6:23 ` Dirk Behme
2025-02-17 14:04 ` [PATCH RFC 2/3] rust: make ETIMEDOUT error available Alexandre Courbot
2025-02-17 21:15 ` Daniel Almeida
2025-02-17 14:04 ` [PATCH RFC 3/3] gpu: nova-core: add basic timer device Alexandre Courbot
2025-02-17 15:48 ` [RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation Simona Vetter
2025-02-18 8:07 ` Greg KH
2025-02-18 13:23 ` Alexandre Courbot
2025-02-17 21:33 ` Danilo Krummrich
2025-02-18 1:46 ` Dave Airlie
2025-02-18 10:26 ` Danilo Krummrich
2025-02-19 12:58 ` Simona Vetter
2025-02-24 1:40 ` Alexandre Courbot
2025-02-24 12:07 ` Danilo Krummrich
2025-02-24 12:11 ` Danilo Krummrich
2025-02-24 18:45 ` Joel Fernandes
2025-02-24 23:44 ` Danilo Krummrich
2025-02-25 15:52 ` Joel Fernandes
2025-02-25 16:09 ` Danilo Krummrich
2025-02-25 21:02 ` Joel Fernandes
2025-02-25 22:02 ` Danilo Krummrich
2025-02-25 22:42 ` Dave Airlie
2025-02-25 22:57 ` Jason Gunthorpe
2025-02-25 23:26 ` Danilo Krummrich
2025-02-25 23:45 ` Danilo Krummrich
2025-02-26 0:49 ` Jason Gunthorpe
2025-02-26 1:16 ` Danilo Krummrich
2025-02-26 17:21 ` Jason Gunthorpe
2025-02-26 21:31 ` Danilo Krummrich
2025-02-26 23:47 ` Jason Gunthorpe
2025-02-27 0:41 ` Boqun Feng
2025-02-27 14:46 ` Jason Gunthorpe
2025-02-27 15:18 ` Boqun Feng
2025-02-27 16:17 ` Jason Gunthorpe
2025-02-27 16:55 ` Boqun Feng
2025-02-27 17:32 ` Danilo Krummrich
2025-02-27 19:23 ` Jason Gunthorpe
2025-02-27 21:25 ` Boqun Feng
2025-02-27 22:00 ` Jason Gunthorpe
2025-02-27 22:40 ` Danilo Krummrich
2025-02-28 18:55 ` Jason Gunthorpe
2025-03-03 19:36 ` Danilo Krummrich
2025-03-03 21:50 ` Jason Gunthorpe
2025-03-04 9:57 ` Danilo Krummrich
2025-02-27 1:02 ` Greg KH
2025-02-27 1:34 ` John Hubbard
2025-02-27 21:42 ` Dave Airlie
2025-02-27 23:06 ` John Hubbard
2025-02-28 4:10 ` Dave Airlie
2025-02-28 18:50 ` Jason Gunthorpe
2025-02-28 10:52 ` Simona Vetter
2025-02-28 18:40 ` Jason Gunthorpe
2025-03-04 16:10 ` Simona Vetter
2025-03-04 16:42 ` Jason Gunthorpe
2025-03-05 7:30 ` Simona Vetter
2025-03-05 15:10 ` Jason Gunthorpe
2025-03-06 10:42 ` Simona Vetter
2025-03-06 15:32 ` Jason Gunthorpe
2025-03-07 10:28 ` Simona Vetter
2025-03-07 12:32 ` Jason Gunthorpe
2025-03-07 13:09 ` Simona Vetter
2025-03-07 14:55 ` Jason Gunthorpe
2025-03-13 14:32 ` Simona Vetter
2025-03-19 17:21 ` Jason Gunthorpe
2025-03-21 10:35 ` Simona Vetter
2025-03-21 12:04 ` Jason Gunthorpe
2025-03-21 12:12 ` Danilo Krummrich
2025-03-21 17:49 ` Jason Gunthorpe
2025-03-21 18:54 ` Danilo Krummrich
2025-03-07 14:00 ` Greg KH
2025-03-07 14:46 ` Jason Gunthorpe
2025-03-07 15:19 ` Greg KH
2025-03-07 15:25 ` Jason Gunthorpe
2025-02-27 14:23 ` Jason Gunthorpe
2025-02-27 11:32 ` Danilo Krummrich
2025-02-27 15:07 ` Jason Gunthorpe
2025-02-27 16:51 ` Danilo Krummrich
2025-02-25 14:11 ` Alexandre Courbot
2025-02-25 15:06 ` Danilo Krummrich
2025-02-25 15:23 ` Alexandre Courbot
2025-02-25 15:53 ` Danilo Krummrich [this message]
2025-02-27 21:37 ` Dave Airlie
2025-02-28 1:49 ` Timur Tabi
2025-02-28 2:24 ` Dave Airlie
2025-02-18 13:35 ` Alexandre Courbot
2025-02-18 1:42 ` Dave Airlie
2025-02-18 13:47 ` Alexandre Courbot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z73nYKsq14Pf6ucp@cassiopeiae \
--to=dakr@kernel.org \
--cc=acourbot@nvidia.com \
--cc=airlied@gmail.com \
--cc=boqun.feng@gmail.com \
--cc=bskeggs@nvidia.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=gary@garyguo.net \
--cc=jhubbard@nvidia.com \
--cc=joel@joelfernandes.org \
--cc=linux-kernel@vger.kernel.org \
--cc=nouveau-bounces@lists.freedesktop.org \
--cc=nouveau@lists.freedesktop.org \
--cc=rust-for-linux@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.