From: Alistair Popple <apopple@nvidia.com>
To: Gary Guo <gary@garyguo.net>
Cc: Eliot Courtney <ecourtney@nvidia.com>,
Danilo Krummrich <dakr@kernel.org>,
Alexandre Courbot <acourbot@nvidia.com>,
Alice Ryhl <aliceryhl@google.com>,
David Airlie <airlied@gmail.com>,
Simona Vetter <simona@ffwll.ch>,
Benno Lossin <lossin@kernel.org>,
John Hubbard <jhubbard@nvidia.com>,
Timur Tabi <ttabi@nvidia.com>,
nova-gpu@lists.linux.dev, dri-devel@lists.freedesktop.org,
linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org
Subject: Re: [PATCH 02/13] gpu: nova-core: fsp: catch bogus queue pointer issues
Date: Tue, 16 Jun 2026 17:57:37 +1000 [thread overview]
Message-ID: <ajD-8-bp6Usjl56R@nvdebian.thelocal> (raw)
In-Reply-To: <DJ9SJCIX1XUO.RTFCL56MKU0D@garyguo.net>
On 2026-06-16 at 03:15 +1000, Gary Guo <gary@garyguo.net> wrote...
> On Mon Jun 15, 2026 at 3:40 PM BST, Eliot Courtney wrote:
> > Currently, `poll_msgq` will report a message of size 4 if the queue
> > pointers are broken. It's easy to catch this if it occurs, so have
> > `poll_msgq` return an error in this case.
> >
> > Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
> > ---
> > drivers/gpu/nova-core/falcon/fsp.rs | 15 +++++++++------
> > 1 file changed, 9 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
> > index e7419a6e71e2..21eaa8e261ce 100644
> > --- a/drivers/gpu/nova-core/falcon/fsp.rs
> > +++ b/drivers/gpu/nova-core/falcon/fsp.rs
> > @@ -107,19 +107,22 @@ fn read_emem(&mut self, bar: Bar0<'_>, data: &mut [u8]) -> Result {
> > /// Poll FSP for incoming data.
> > ///
> > /// Returns the size of available data in bytes, or 0 if no data is available.
> > + /// Returns an error if the queue pointers are bogus (`tail < head`).
> > ///
> > /// The FSP message queue is not circular. Pointers are reset to 0 after each
> > /// message exchange, so `tail >= head` is always true when data is present.
> > - fn poll_msgq(&self, bar: Bar0<'_>) -> u32 {
> > + fn poll_msgq(&self, bar: Bar0<'_>) -> Result<u32> {
> > let head = bar.read(regs::NV_PFSP_MSGQ_HEAD::at(0)).val();
> > let tail = bar.read(regs::NV_PFSP_MSGQ_TAIL::at(0)).val();
> >
> > if head == tail {
> > - return 0;
> > + Ok(0)
> > + } else {
> > + // TAIL points at the last DWORD written, so the size is `tail - head + 4`.
> > + tail.checked_sub(head)
> > + .and_then(|delta| delta.checked_add(4))
> > + .ok_or(EIO)
>
> Whenever we fail with this, we should print a message (actually, the same thing
> probably should be done for patch 1 as well).
>
> A plain EIO is going be very difficult to troubleshoot if this is ever hit.
I don't disagree with the sentiment - this is a problem through out the kernel
and I have spent way too long tracing where exactly error codes have come from
both in C and Rust.
But it seems odd to worry about these particular instances - they _should_
never happen or at least be extremely rare and very unlikely by an end-user.
More to the point though there are many other places in nova-core (and I'm
sure other drivers) where this pattern of just returning a fairly generic
error code exists. So it feels like it would be nicer to deal with this at some
other layer, eg. some kind of debug option to tag error codes with location
or something.
So I'm not opposed to the comment, but maybe it would be better addressed as a
separate question/patch series to figure out how to do this error reporting in a
more generic or consistent way across all of Nova at least?
- Alistair
> Best,
> Gary
>
> > }
> > -
> > - // TAIL points at last DWORD written, so add 4 to get total size.
> > - tail.saturating_sub(head).saturating_add(4)
> > }
> >
> > /// Writes `packet` to FSP EMEM and updates the queue pointers to notify FSP.
> > @@ -154,7 +157,7 @@ pub(crate) fn send_msg(&mut self, bar: Bar0<'_>, packet: &[u8]) -> Result {
> > pub(crate) fn recv_msg(&mut self, bar: Bar0<'_>) -> Result<KVec<u8>> {
> > let result = (|| {
> > let msg_size = read_poll_timeout(
> > - || Ok(self.poll_msgq(bar)),
> > + || self.poll_msgq(bar),
> > |&size| size > 0,
> > Delta::from_millis(10),
> > Delta::from_millis(FSP_MSG_TIMEOUT_MS),
>
>
next prev parent reply other threads:[~2026-06-16 7:57 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-15 14:40 [PATCH 00/13] gpu: nova-core: blackwell follow-ups and fixes Eliot Courtney
2026-06-15 14:40 ` [PATCH 01/13] gpu: nova-core: fsp: limit FSP receive message allocation size Eliot Courtney
2026-06-15 17:11 ` Gary Guo
2026-06-16 7:33 ` Alistair Popple
2026-06-15 14:40 ` [PATCH 02/13] gpu: nova-core: fsp: catch bogus queue pointer issues Eliot Courtney
2026-06-15 17:15 ` Gary Guo
2026-06-16 7:57 ` Alistair Popple [this message]
2026-06-15 14:40 ` [PATCH 03/13] gpu: nova-core: fsp: try to enforce exclusive access to FSP channel Eliot Courtney
2026-06-15 17:16 ` Gary Guo
2026-06-15 14:40 ` [PATCH 04/13] gpu: nova-core: falcon: gsp: move PRIV target mask constants Eliot Courtney
2026-06-15 17:17 ` Gary Guo
2026-06-16 8:02 ` Alistair Popple
2026-06-15 14:40 ` [PATCH 05/13] gpu: nova-core: gsp: keep FMC boot params DMA region alive during error Eliot Courtney
2026-06-15 17:23 ` Gary Guo
2026-06-15 14:40 ` [PATCH 06/13] gpu: nova-core: fsp: move FMC firmware loading into wait_secure_boot Eliot Courtney
2026-06-15 17:24 ` Gary Guo
2026-06-15 14:40 ` [PATCH 07/13] gpu: nova-core: gsp: ensure lifetime for FMC boot DMA allocations Eliot Courtney
2026-06-15 14:40 ` [PATCH 08/13] gpu: nova-core: gsp: ensure LibOS DMA allocation lives long enough Eliot Courtney
2026-06-15 14:40 ` [PATCH 09/13] gpu: nova-core: wait for FSP boot earlier Eliot Courtney
2026-06-15 14:40 ` [PATCH 10/13] gpu: nova-core: split FbLayout into FSP and non-FSP versions Eliot Courtney
2026-06-15 14:40 ` [PATCH 11/13] gpu: nova-core: correct FRTS vidmem offset calculation Eliot Courtney
2026-06-15 14:40 ` [PATCH 12/13] gpu: nova-core: rename heap size field Eliot Courtney
2026-06-15 14:40 ` [PATCH 13/13] gpu: nova-core: return non-WPR heap size as u64 from HALs Eliot Courtney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ajD-8-bp6Usjl56R@nvdebian.thelocal \
--to=apopple@nvidia.com \
--cc=acourbot@nvidia.com \
--cc=airlied@gmail.com \
--cc=aliceryhl@google.com \
--cc=dakr@kernel.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=ecourtney@nvidia.com \
--cc=gary@garyguo.net \
--cc=jhubbard@nvidia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lossin@kernel.org \
--cc=nova-gpu@lists.linux.dev \
--cc=rust-for-linux@vger.kernel.org \
--cc=simona@ffwll.ch \
--cc=ttabi@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox