From: "Eliot Courtney" <ecourtney@nvidia.com>
To: "Gary Guo" <gary@garyguo.net>,
"Eliot Courtney" <ecourtney@nvidia.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"Alexandre Courbot" <acourbot@nvidia.com>,
"Alice Ryhl" <aliceryhl@google.com>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Benno Lossin" <lossin@kernel.org>
Cc: "John Hubbard" <jhubbard@nvidia.com>,
"Alistair Popple" <apopple@nvidia.com>,
"Timur Tabi" <ttabi@nvidia.com>, <nova-gpu@lists.linux.dev>,
<dri-devel@lists.freedesktop.org>, <linux-kernel@vger.kernel.org>,
<rust-for-linux@vger.kernel.org>,
"dri-devel" <dri-devel-bounces@lists.freedesktop.org>
Subject: Re: [PATCH 03/13] gpu: nova-core: fsp: try to enforce exclusive access to FSP channel
Date: Wed, 17 Jun 2026 11:55:45 +0900 [thread overview]
Message-ID: <DJAZIGJQ72A9.1688NC74RA4SY@nvidia.com> (raw)
In-Reply-To: <DJ9SKDFB5Z63.JPEV0D1Q60L@garyguo.net>
On Tue Jun 16, 2026 at 2:16 AM JST, Gary Guo wrote:
> On Mon Jun 15, 2026 at 3:40 PM BST, Eliot Courtney wrote:
>> Currently, `send_msg` assumes that the channel to FSP is free to write
>> into. But, it might not be. Both the kernel driver and GSP communicate
>> with FSP. The way they should attempt to keep exclusive access to this
>> channel to FSP is by making sure they don't try to start writing if
>> there's pending data until the full round trip has finished.
>>
>> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
>> ---
>> drivers/gpu/nova-core/falcon/fsp.rs | 23 +++++++++++++++++++++++
>> 1 file changed, 23 insertions(+)
>>
>> diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
>> index 21eaa8e261ce..cdb476894e1a 100644
>> --- a/drivers/gpu/nova-core/falcon/fsp.rs
>> +++ b/drivers/gpu/nova-core/falcon/fsp.rs
>> @@ -125,6 +125,26 @@ fn poll_msgq(&self, bar: Bar0<'_>) -> Result<u32> {
>> }
>> }
>>
>> + /// Both the kernel driver and GSP talk to FSP. Try to ensure exclusive access to the FSP is
>> + /// enforced by making sure there is not a pending message already sent to FSP, and that there
>> + /// is no pending message from FSP to be read.
>> + fn wait_until_ready(&mut self, bar: Bar0<'_>) -> Result {
>> + read_poll_timeout(
>> + || {
>> + let qhead = bar.read(regs::NV_PFSP_QUEUE_HEAD::at(0)).address();
>> + let qtail = bar.read(regs::NV_PFSP_QUEUE_TAIL::at(0)).address();
>> + let mhead = bar.read(regs::NV_PFSP_MSGQ_HEAD::at(0)).val();
>> + let mtail = bar.read(regs::NV_PFSP_MSGQ_TAIL::at(0)).val();
>
> How does this prevent race between kernel and GSP when initiating FSP
> communcation?
You are right that it does not prevent it in the general case.
This is the logic that openrm uses and locally, much earlier, I observed
sometimes I did actually need this for reprobe to work. I think the
reason (but it's been a while) is that before we did not wait for GSP to
halt on unload and probe failure, which could mean there's a leftover
message from FSP that is not consumed if you reprobe quickly after a
failure. Now that we wait for GSP to halt it's much harder for this kind
of issue to occur. Actually, it might be impossible currently for this
kind of race to happen without a failure on unload (e.g. timeout of
waiting for GSP to reset).
The reason I thought to add this is more to match what appears to be
the protocol that this transport uses (even if it might not be sound
generally). I am curious what others think, if it's worth keeping this -
IMO it is since it does appear to be part of the way the communication
on the transport is meant to be done.
At least the comments could be better, since it looks like my hedging
"try to ensure" is confusing because it really doesn't in the general
case.
>
> Best,
> Gary
>
>> +
>> + Ok(qhead == qtail && mhead == mtail)
>> + },
>> + |&ready| ready,
>> + Delta::from_millis(10),
>> + Delta::from_millis(FSP_MSG_TIMEOUT_MS),
>> + )?;
>> + Ok(())
>> + }
>> +
>> /// Writes `packet` to FSP EMEM and updates the queue pointers to notify FSP.
>> ///
>> /// Returns `EINVAL` if `packet` is empty or its length is not 4-byte aligned.
>> @@ -133,6 +153,9 @@ pub(crate) fn send_msg(&mut self, bar: Bar0<'_>, packet: &[u8]) -> Result {
>> return Err(EINVAL);
>> }
>>
>> + // Try to make sure we have exclusive access to the FSP at this point.
>> + self.wait_until_ready(bar)?;
>> +
>> self.write_emem(bar, packet)?;
>>
>> // Update queue pointers. TAIL points at the last DWORD written.
next prev parent reply other threads:[~2026-06-17 2:56 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-15 14:40 [PATCH 00/13] gpu: nova-core: blackwell follow-ups and fixes Eliot Courtney
2026-06-15 14:40 ` [PATCH 01/13] gpu: nova-core: fsp: limit FSP receive message allocation size Eliot Courtney
2026-06-15 17:11 ` Gary Guo
2026-06-16 7:33 ` Alistair Popple
2026-06-15 14:40 ` [PATCH 02/13] gpu: nova-core: fsp: catch bogus queue pointer issues Eliot Courtney
2026-06-15 17:15 ` Gary Guo
2026-06-16 7:57 ` Alistair Popple
2026-06-16 10:57 ` Gary Guo
2026-06-17 3:51 ` Eliot Courtney
2026-06-17 4:31 ` Alistair Popple
2026-06-15 14:40 ` [PATCH 03/13] gpu: nova-core: fsp: try to enforce exclusive access to FSP channel Eliot Courtney
2026-06-15 17:16 ` Gary Guo
2026-06-17 2:55 ` Eliot Courtney [this message]
2026-06-17 3:12 ` John Hubbard
2026-06-15 14:40 ` [PATCH 04/13] gpu: nova-core: falcon: gsp: move PRIV target mask constants Eliot Courtney
2026-06-15 17:17 ` Gary Guo
2026-06-16 8:02 ` Alistair Popple
2026-06-15 14:40 ` [PATCH 05/13] gpu: nova-core: gsp: keep FMC boot params DMA region alive during error Eliot Courtney
2026-06-15 17:23 ` Gary Guo
2026-06-17 5:27 ` Eliot Courtney
2026-06-15 14:40 ` [PATCH 06/13] gpu: nova-core: fsp: move FMC firmware loading into wait_secure_boot Eliot Courtney
2026-06-15 17:24 ` Gary Guo
2026-06-15 14:40 ` [PATCH 07/13] gpu: nova-core: gsp: ensure lifetime for FMC boot DMA allocations Eliot Courtney
2026-06-15 14:40 ` [PATCH 08/13] gpu: nova-core: gsp: ensure LibOS DMA allocation lives long enough Eliot Courtney
2026-06-17 6:11 ` Alistair Popple
2026-06-15 14:40 ` [PATCH 09/13] gpu: nova-core: wait for FSP boot earlier Eliot Courtney
2026-06-15 14:40 ` [PATCH 10/13] gpu: nova-core: split FbLayout into FSP and non-FSP versions Eliot Courtney
2026-06-15 14:40 ` [PATCH 11/13] gpu: nova-core: correct FRTS vidmem offset calculation Eliot Courtney
2026-06-15 14:40 ` [PATCH 12/13] gpu: nova-core: rename heap size field Eliot Courtney
2026-06-15 14:40 ` [PATCH 13/13] gpu: nova-core: return non-WPR heap size as u64 from HALs Eliot Courtney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DJAZIGJQ72A9.1688NC74RA4SY@nvidia.com \
--to=ecourtney@nvidia.com \
--cc=acourbot@nvidia.com \
--cc=airlied@gmail.com \
--cc=aliceryhl@google.com \
--cc=apopple@nvidia.com \
--cc=dakr@kernel.org \
--cc=dri-devel-bounces@lists.freedesktop.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=gary@garyguo.net \
--cc=jhubbard@nvidia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lossin@kernel.org \
--cc=nova-gpu@lists.linux.dev \
--cc=rust-for-linux@vger.kernel.org \
--cc=simona@ffwll.ch \
--cc=ttabi@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox