All of lore.kernel.org
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Eliot Courtney" <ecourtney@nvidia.com>
Cc: dri-devel@lists.freedesktop.org, ojeda@kernel.org,
	"Alexandre Courbot" <acourbot@nvidia.com>,
	"Danilo Krummrich" <dakr@kernel.org>
Subject: Re: [PATCH 03/13] gpu: nova-core: fsp: try to enforce exclusive access to FSP channel
Date: Tue, 23 Jun 2026 20:06:47 +0000	[thread overview]
Message-ID: <20260623200647.A78AB1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260615-blackwell-fixes-v1-3-f2853e49ff7d@nvidia.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] Polling queue pointers to attempt mutual exclusion between the kernel and GSP creates a Time-of-Check to Time-of-Use (TOCTOU) race condition and incorrectly identifies active processing windows as idle.

Pre-existing issues:
- [High] `send_msg` does not validate that the packet length is within the FSP EMEM channel boundaries before writing.
- [High] `QUEUE_TAIL` is written before `QUEUE_HEAD` is reset, potentially exposing an invalid queue state to the FSP.
--

commit cbbcd901d8b26aba0c2ec257efff6284b3bf260b
Author: Eliot Courtney <ecourtney@nvidia.com>

gpu: nova-core: fsp: try to enforce exclusive access to FSP channel
    
Currently, send_msg assumes that the channel to FSP is free to write
into. But, it might not be. Both the kernel driver and GSP communicate
with FSP. The way they should attempt to keep exclusive access to this
channel to FSP is by making sure they don't try to start writing if
there's pending data until the full round trip has finished.

> diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
> index 21eaa8e261ce6..cdb476894e1ad 100644
> --- a/drivers/gpu/nova-core/falcon/fsp.rs
> +++ b/drivers/gpu/nova-core/falcon/fsp.rs

[ ... ]

> @@ -133,6 +153,9 @@ pub(crate) fn send_msg(&mut self, bar: Bar0<'_>, packet: &[u8]) -> Result {
>              return Err(EINVAL);
>          }
>  
> +        // Try to make sure we have exclusive access to the FSP at this point.
> +        self.wait_until_ready(bar)?;
> +
>          self.write_emem(bar, packet)?;

[Severity: High]
Does this introduce a time-of-check to time-of-use (TOCTOU) race condition?

The kernel and the GSP run asynchronously.

This lockless software polling approach does not appear to provide strict
mutual exclusion between the two processors.

Could both evaluate the idle condition (qhead == qtail && mhead == mtail)
simultaneously in wait_until_ready() and subsequently write to the shared
FSP EMEM channel at the same time, leading to hardware message corruption?

[Severity: High]
This isn't a bug introduced by this patch, but does this write operation lack
a bounds check against the FSP EMEM channel capacity?

There does not appear to be a check ensuring that packet.len() is within
FSP_EMEM_CHANNEL_0_SIZE (1024 bytes).

If a caller submits a larger packet, could write_emem() overwrite adjacent
hardware memory and potentially corrupt secure boot data or other channel
states?

>  
>          // Update queue pointers. TAIL points at the last DWORD written.
>          let tail_offset = u32::try_from(packet.len() - 4).map_err(|_| EINVAL)?;
>          bar.write(
>              Array::at(0),
>              regs::NV_PFSP_QUEUE_TAIL::zeroed().with_address(tail_offset),
>          );
>          bar.write(
>              Array::at(0),
>              regs::NV_PFSP_QUEUE_HEAD::zeroed().with_address(0),
>          );

[Severity: High]
This is a pre-existing issue, but does writing QUEUE_TAIL before resetting
QUEUE_HEAD expose an invalid queue state to the hardware?

If QUEUE_HEAD contains a non-zero value from a previous transaction, writing
QUEUE_TAIL first could cause the hardware to briefly observe a state where
TAIL < HEAD.

Because the queue is not circular, could this trigger an immediate hardware
processing error before the kernel updates QUEUE_HEAD to 0?

>  
>          Ok(())
>      }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260615-blackwell-fixes-v1-0-f2853e49ff7d@nvidia.com?part=3

  parent reply	other threads:[~2026-06-23 20:06 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-15 14:40 [PATCH 00/13] gpu: nova-core: blackwell follow-ups and fixes Eliot Courtney
2026-06-15 14:40 ` [PATCH 01/13] gpu: nova-core: fsp: limit FSP receive message allocation size Eliot Courtney
2026-06-15 17:11   ` Gary Guo
2026-06-16  7:33   ` Alistair Popple
2026-06-20 14:08   ` Alexandre Courbot
2026-06-22  9:11     ` Eliot Courtney
2026-06-15 14:40 ` [PATCH 02/13] gpu: nova-core: fsp: catch bogus queue pointer issues Eliot Courtney
2026-06-15 17:15   ` Gary Guo
2026-06-16  7:57     ` Alistair Popple
2026-06-16 10:57       ` Gary Guo
2026-06-17  3:51         ` Eliot Courtney
2026-06-17  4:31           ` Alistair Popple
2026-06-15 14:40 ` [PATCH 03/13] gpu: nova-core: fsp: try to enforce exclusive access to FSP channel Eliot Courtney
2026-06-15 17:16   ` Gary Guo
2026-06-17  2:55     ` Eliot Courtney
2026-06-17  3:12       ` John Hubbard
2026-06-23 20:06   ` sashiko-bot [this message]
2026-06-15 14:40 ` [PATCH 04/13] gpu: nova-core: falcon: gsp: move PRIV target mask constants Eliot Courtney
2026-06-15 17:17   ` Gary Guo
2026-06-16  8:02   ` Alistair Popple
2026-06-20 10:48   ` Alexandre Courbot
2026-06-15 14:40 ` [PATCH 05/13] gpu: nova-core: gsp: keep FMC boot params DMA region alive during error Eliot Courtney
2026-06-15 17:23   ` Gary Guo
2026-06-17  5:27     ` Eliot Courtney
2026-06-17 13:52       ` Gary Guo
2026-06-22  9:26         ` Eliot Courtney
2026-06-15 14:40 ` [PATCH 06/13] gpu: nova-core: fsp: move FMC firmware loading into wait_secure_boot Eliot Courtney
2026-06-15 17:24   ` Gary Guo
2026-06-20 10:48   ` Alexandre Courbot
2026-06-15 14:40 ` [PATCH 07/13] gpu: nova-core: gsp: ensure lifetime for FMC boot DMA allocations Eliot Courtney
2026-06-15 14:40 ` [PATCH 08/13] gpu: nova-core: gsp: ensure LibOS DMA allocation lives long enough Eliot Courtney
2026-06-17  6:11   ` Alistair Popple
2026-06-15 14:40 ` [PATCH 09/13] gpu: nova-core: wait for FSP boot earlier Eliot Courtney
2026-06-17 14:27   ` Alexandre Courbot
2026-06-24  1:21     ` Eliot Courtney
2026-06-15 14:40 ` [PATCH 10/13] gpu: nova-core: split FbLayout into FSP and non-FSP versions Eliot Courtney
2026-06-23 20:45   ` sashiko-bot
2026-06-15 14:40 ` [PATCH 11/13] gpu: nova-core: correct FRTS vidmem offset calculation Eliot Courtney
2026-06-15 14:40 ` [PATCH 12/13] gpu: nova-core: rename heap size field Eliot Courtney
2026-06-15 14:40 ` [PATCH 13/13] gpu: nova-core: return non-WPR heap size as u64 from HALs Eliot Courtney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260623200647.A78AB1F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=acourbot@nvidia.com \
    --cc=dakr@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=ecourtney@nvidia.com \
    --cc=ojeda@kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.