NVIDIA GPU driver infrastructure
 help / color / mirror / Atom feed
From: "Alexandre Courbot" <acourbot@nvidia.com>
To: "Timur Tabi" <ttabi@nvidia.com>
Cc: "gary@garyguo.net" <gary@garyguo.net>,
	"nova-gpu@lists.linux.dev" <nova-gpu@lists.linux.dev>,
	"Zhi Wang" <zhiw@nvidia.com>, "dakr@kernel.org" <dakr@kernel.org>,
	"Eliot Courtney" <ecourtney@nvidia.com>,
	"John Hubbard" <jhubbard@nvidia.com>
Subject: Re: [PATCH 3/8] gpu: nova-core: add TLV parser for firmware files
Date: Sat, 13 Jun 2026 08:27:28 +0900	[thread overview]
Message-ID: <DJ7GKT7LBOS3.13GU1WMZKQHDC@nvidia.com> (raw)
In-Reply-To: <23480c16b36fdb659021153c356bdb61424308a1.camel@nvidia.com>

On Sat Jun 13, 2026 at 3:04 AM JST, Timur Tabi wrote:
<...>
>> > +
>> > +``BLOB`` (bytes)
>> > +    If the firmware microcode binary is stored in the TLV, this tag contains
>> > +    the actual firmware image bytes.
>> > +
>> > +``FILE`` (string)
>> > +    If the firmware binary is stored as a separate file, this tag contains the
>> > +    name of that file, assumed to be in the same directory as the TLV.
>> > +    This tag is always paired with ``SIZE``, so as to allow the driver to
>> > +    pre-allocate the buffer before loading the file.
>> 
>> Note that `request_firmware` outrights reject paths that contain `..`.
>> These should probably be excluded from the allowed paths here as well.
>
> Ok.
>
> I guess that kills the idea of avoiding symlinks.

Pretty much. We could technically mangle the paths ourselves in the
kernel, but that's likely to be frowned upon.

>
>> We also need to specify what happens if we have a `FILE` tag but no
>> `SIZE` or vice-versa (presumably, this should be an error?).
>
> Technically, it's the caller that decides what to do in each case.  There is no central "tag
> checker" that verifies any of this.  I felt that this document should be limited to describing
> how the tags work if you see one, rather than describing how to actually build a "proper" TLV
> file.  No one is expected to build the TLVs outside of the extract_firmware_nova.py script.

Since this is a file format specification, I'd argue it should on the
contrary try to pin down what is a valid TLV file and what is not.
Essentially so we can rely on it to make changes that have
backward-compatibility implications in the future.

>
> So to answer your question, if there's no SIZE tag, then it will need to use request_firmware()
> instead of request_firmware_into_buf().  I can add language that says that the SIZE tag should
> be created if it's at all possible to know the size when the TLV is built.

Yes, I think that would remove all ambiguity.

>
> I don't really have a strong opinion either way.
>
>> Another thing I think we should also formally specify: how are duplicate
>> tags handled?
>
> Well, currently the code will ignore the second copy of the tag, since the iterator always
> starts from the beginning.  I'll add some text that says duplicates are not allowed, but if they
> exist, only the first tag (in offset order) is used, and the others are ignored.

Sounds good to me.

>
>
>> 
>> > +
>> > +GSP Firmware Tags
>> > +=================
>> > +Used when loading the GSP (GPU System Processor) firmware
>> > +(``gsp.bin``).
>> 
>> Do you mean `gsp.tlv`? (same for other files).
>
> Hmmm.... I don't think so.  But it's obviously so unclear I should remove it.  I think the point
> I was trying to make is that this is the TLV that goes with gsp.bin.

I guess I was confused because `gsp.tlv` could technically point to a
different file through its `FILE` tag. The real anchor for the driver is
`gsp.tlv`.

<...>
>> > diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
>> > index 2749c196416d..888a426b7b41 100644
>> > --- a/drivers/gpu/nova-core/firmware.rs
>> > +++ b/drivers/gpu/nova-core/firmware.rs
>> 
>> Since the code for `Tlv` seems to be nicely self-contained, can you put
>> it into a sub-module (`firmware/tlv.rs`)? This will reduce the amount of
>> stuff in `firmware.rs` too.
>
> Ok.
>
> I was also thinking of merging request_tlv() into the Tlv::new() constructor.  Do you think
> that's a good idea?  We always precede Tlv::new() with request_tlv() anyway.  

I like the `request` wording because it implies more strongly that we
are loading something. Maybe a `Tlv::request` constructor?

Having `Tlv::new` take an array of bytes looks natural and separates
parsing from loading. It also enables things like nested TLV files
should we ever need that.

>
>> > @@ -11,7 +11,7 @@
>> >      device,
>> >      firmware,
>> >      prelude::*,
>> > -    str::CString,
>> > +    str::{CStr, CString},
>> >      transmute::FromBytes, //
>> >  };
>> >  
>> > @@ -684,3 +684,240 @@ pub(super) fn elf_section<'a>(elf: &'a [u8], name: &str) -> Option<&'a
>> > [u8]> {
>> >          }
>> >      }
>> >  }
>> > +
>> > +pub(crate) struct TlvBlock<'a> {
>> 
>> `TlvBlock` and its members can be private IIUC.
>
> Ok
>
>> 
>> > +    pub(crate) tag: &'a str,
>> 
>> This should probably be a `&'a [u8; 4]`? `str` is variable length and
>> technically UTF-8, using a 4-bytes array is the more accurate type.
>> Doing this also simplifies `parse` a bit.
>> 
>> > +    pub(crate) value: &'a [u8],
>> > +}
>> > +
>> > +/// On-wire TLV block header: 4-byte ASCII tag + little-endian payload length (bytes,
>> > excluding
>> > +/// padding to a 4-byte boundary).
>> > +struct TlvBlockHeader<'a> {
>> > +    tag: &'a str,
>> 
>> Same here.
>
> I made these as strings to simplify the iterator so that it can just compare strings, but I will
> take another look at the code.

Comparison of `[u8; 4]`s should be equally trivial to perform.

<...>
>> > +///
>> > +/// # Invariants
>> > +///
>> > +/// `pos` is a byte offset into `tlv.data` that always lies on a block boundary (in the
>> > sense
>> > +/// of the [`Tlv`] invariant): it is either the start of a well-formed block, or equal to
>> > +/// `tlv.data.len()` (end of iteration).
>> > +struct TlvIter<'tlv, 'a> {
>> > +    tlv: &'tlv Tlv<'a>,
>> > +    pos: usize,
>> > +}
>> > +
>> > +impl<'tlv, 'a> Iterator for TlvIter<'tlv, 'a> {
>> > +    type Item = TlvBlock<'a>;
>> > +
>> > +    /// Returns the block starting at `self.pos` and advances the cursor past it, or
>> > [`None`]
>> > +    /// once the cursor reaches the end of the data.
>> > +    ///
>> > +    /// The body relies on a number of `unsafe` operations, because [`Iterator::next`] can
>> > +    /// only return [`Option`], not [`Result`], so there is no way to report a parse error
>> > +    /// from here.  That is why the entire TLV stream is validated up front in the
>> > +    /// constructor.  By this type's invariants, `self.pos` is always on a block boundary
>> > +    /// into already-validated data, so each block is known to be well-formed, and
>> > +    /// therefore the unchecked operations cannot fail.
>> 
>> Second paragraph also leaks implementation details and is not relevant
>> to users.
>
> Ok.  I like to add these kinds of comments, because I figure that if it was hard for me to
> figure out, then others would probably appreciate it.  I wish the Nouveau code had been
> documented like that.

It's fine to turn these into regular code comments if you think they are
useful (although in this case it looks like we are heading towards
removing the unsafe statements anyway). It's just that they don't bring
information that users will find directly useful and would justify them
being doccomments.

>
>> > +    fn next(&mut self) -> Option<Self::Item> {
>> > +        if self.pos >= self.tlv.data.len() {
>> > +            return None;
>> > +        }
>> > +
>> > +        let tail = &self.tlv.data[self.pos..];
>> > +
>> > +        // SAFETY: `self.pos` is on a block boundary (`TlvIter` invariant) and the check
>> > +        // above gives `self.pos < data.len()`, so by the `Tlv` invariant a complete
>> > +        // well-formed block starts at `tail`.
>> > +        let hdr = unsafe { tail.get_unchecked(..TlvBlockHeader::SIZE) };
>> > +
>> > +        // SAFETY: same header bytes as validated in `Tlv::new` for this offset.
>> > +        let header = unsafe { TlvBlockHeader::parse(hdr).unwrap_unchecked() };
>> > +
>> > +        let stored_size = header.length.next_multiple_of(4);
>> > +        let advance = TlvBlockHeader::SIZE + stored_size;
>> > +        let payload_end = TlvBlockHeader::SIZE + header.length;
>> > +
>> > +        // SAFETY: `advance` and `payload_end` are exactly the stored and logical payload
>> > extents
>> > +        // `Tlv::new` accepted for this block.
>> > +        let value = unsafe {
>> > +            let block = tail.get_unchecked(..advance);
>> > +            block.get_unchecked(TlvBlockHeader::SIZE..payload_end)
>> > +        };
>> 
>> As Danilo mentioned we should get rid of these `unsafe` blocks. I think
>> returning `None` if an error is met is fine: an error is, technically,
>> the end of the iteration; and we have validated the whole file in the
>> constructor, so they won't happen anyway.
>
> Ok.  I really don't like conflating Option with Result, though.  I figured in this case, it
> would be better to eliminate any validation code that will never execute, but no one seems to
> mind so I'll change it.  I'll just add a comment that technically this iterator will return None
> on errors that won't never actually occur because the TLV is validated.

Yeah, it's a tradeoff really. Ideally we would be able to represent the
validation done by the constructor using the type system and rely on
that to make the iterator safe and infallible. Here it looks like we
cannot, so the options are doing redundant error handling, panicking, or
unsafe code.

Unsafe code looks out of place in a parser that is not in any critical
path. I am not a fan of panicking either as it adds a landmine just
because nobody is ever supposed to step there and makes it harder to
prove that the code is indeed safe. This leaves error handling as the
least bad of the 3 options.

But since meeting these errors would technically be a bug in the code,
we can at least log them using `pr_err!` so they don't go unnoticed.

<...>
>> > +#[allow(dead_code)]
>> > +impl<'a> Tlv<'a> {
>> > +    const MAGIC: &'static [u8; 4] = b"NVFW";
>> > +
>> > +    /// Parses `data` as a TLV firmware image, returning [`EINVAL`] if the image is
>> > malformed.
>> > +    pub(crate) fn new(data: &'a [u8]) -> Result<Self> {
>> > +        // Verify that the magic bytes exist and are the correct value
>> > +        let magic_len = Self::MAGIC.len();
>> > +        if data
>> > +            .get(..magic_len)
>> > +            .is_none_or(|magic| magic != Self::MAGIC)
>> > +        {
>> > +            return Err(EINVAL);
>> > +        }
>> > +
>> > +        // The payload is the contiguous sequence of TLV blocks after the magic.
>> > +        let payload = data.get(magic_len..).ok_or(EINVAL)?;
>> > +
>> > +        let mut pos = 0usize;
>> 
>> You can work directly with `payload` and avoid storing the position in
>> its own variable. Just make `payload`'s declaration above `mut`, and
>> then:
>> 
>> > +        while pos < payload.len() {
>> 
>>     while !payload.is_empty() {
>> 
>
> Ok.  I thought it would be safer to make an integer mut instead of a slice.

The slice should be the safer option as it has less state to keep track
of.

Note that the `mut` keyword applies to the slice itself, not what it
points to - you still cannot modify the content of `payload`, you can
just assign a different slice to the `payload` variable.

>
>> > +            // Get the next TLV block.
>> > +            let Some(rest) = payload.get(pos..) else {
>> > +                return Err(EINVAL);
>> > +            };
>> 
>> This can disappear.
>
> Ok
>
>> 
>> > +            // Validate and extract the header (type, length).
>> > +            let Some(header) = rest
>> 
>> s/rest/payload
>
> Only if I make payload mut as you suggest, right?

Oh yes these comments are still about dropping `pos`.

<...>
>> > +        let tlv = self.iter().find(|b| b.tag == tag).ok_or(EINVAL)?;
>> > +
>> > +        Ok(tlv.value)
>> 
>> nit: can be collapsed into 
>> 
>>    self.iter().find(|b| b.tag == tag).map(|b| b.value).ok_or(EINVAL)
>
> I mean, yes it can be, but I don't see how that's an improvement.  That first line
>
> 	let tlv = self.iter().find(|b| b.tag == tag).ok_or(EINVAL)?;
>
> exists in each getter, for consistency.  So I can only "collapse it" in get_bytes, not in any of
> the others.

You can also chain more `map` and other operators. This gives a
functional vibe to the method which feels slightly more idiomatic in
Rust, but it is not critical if you prefer to keep it procedural. Both
read fine for something that short.

>
>
>
>> 
>> > +    }
>> > +
>> > +    pub(crate) fn get_u32(&self, tag: &str) -> Result<u32> {
>> > +        let tlv = self.iter().find(|b| b.tag == tag).ok_or(EINVAL)?;
>> 
>> You can reuse `get_bytes` to avoid repeating this snippet (also applies
>> to the other `get_` methods).
>
> I would rather have a private "find_tlv" method, so that all the getters are independent.

That would work as well.

>
>> > +
>> > +        tlv.value
>> > +            .try_into()
>> > +            .ok()
>> > +            .map(u32::from_le_bytes)
>> > +            .ok_or(EINVAL)
>> > +    }
>> > +
>> > +    pub(crate) fn get_string(&self, tag: &str) -> Result<&'a str> {
>> > +        let tlv = self.iter().find(|b| b.tag == tag).ok_or(EINVAL)?;
>> > +
>> > +        // Handle the possibility that the value is null-terminated.
>> > +        let bytes = match CStr::from_bytes_until_nul(tlv.value) {
>> > +            Ok(cstr) => cstr.to_bytes(),
>> > +            Err(_) => tlv.value,
>> > +        };
>> 
>> The spec says that strings are expected to be ASCII without a NULL
>> terminator - and the NULL terminator does not carry any meaning in Rust.
>> So I don't think we should be doing that check.
>
> Ok.  I thought I was being crafty here.
>
>
>> > +
>> > +        // But do require it to be all ASCII.
>> > +        if !bytes.is_ascii() {
>> > +            return Err(EINVAL);
>> > +        }
>> 
>> Btw, is it bad if we allow strings to be UTF-8?
>
> Depends on your definition of "bad" I guess.  printk doesn't have any real UTF-8 support, it
> just kinda works if you're careful.  I added this check because string tags are meant to be
> things like file names and version strings, and those things really should not be UTF-8.  Multi-
> byte characters cause all sorts of problems.  We're not using TLV to store people's names.

Mmm indeed there is probably no real benefit here.

  reply	other threads:[~2026-06-12 23:27 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-10 17:49 [PATCH 0/8] Transition Nova Core to TLV firmware images Timur Tabi
2026-06-10 17:49 ` [PATCH 1/8] rust: firmware: add request_into_buf() Timur Tabi
2026-06-10 18:03   ` Gary Guo
2026-06-10 18:24     ` Timur Tabi
2026-06-10 20:26       ` Gary Guo
2026-06-10 20:28         ` Timur Tabi
2026-06-10 21:46         ` Danilo Krummrich
2026-06-11  4:58       ` Alexandre Courbot
2026-06-11 15:48         ` Timur Tabi
2026-06-11  6:49   ` Alexandre Courbot
2026-06-11 16:32     ` Timur Tabi
2026-06-11 19:18       ` Danilo Krummrich
2026-06-11 19:28         ` Timur Tabi
2026-06-11 19:31           ` John Hubbard
2026-06-11 20:01             ` Timur Tabi
2026-06-11 21:55               ` John Hubbard
2026-06-11 21:56               ` Danilo Krummrich
2026-06-11 21:49           ` Danilo Krummrich
2026-06-12  2:59           ` Alexandre Courbot
2026-06-12 14:51   ` Alexandre Courbot
2026-06-10 17:49 ` [PATCH 2/8] gpu: nova-core: add request_tlv to load TLV images Timur Tabi
2026-06-10 22:00   ` Danilo Krummrich
2026-06-11 16:45     ` Timur Tabi
2026-06-11 18:17       ` Gary Guo
2026-06-11 18:26         ` Timur Tabi
2026-06-11 18:42           ` Gary Guo
2026-06-11 18:53             ` Timur Tabi
2026-06-11 19:16               ` John Hubbard
2026-06-11 18:53       ` Danilo Krummrich
2026-06-11 18:57         ` Timur Tabi
2026-06-10 17:49 ` [PATCH 3/8] gpu: nova-core: add TLV parser for firmware files Timur Tabi
2026-06-10 22:37   ` Danilo Krummrich
2026-06-11  6:51     ` Alexandre Courbot
2026-06-11 18:40       ` John Hubbard
2026-06-11 23:29     ` Timur Tabi
2026-06-12 14:46   ` Alexandre Courbot
2026-06-12 18:04     ` Timur Tabi
2026-06-12 23:27       ` Alexandre Courbot [this message]
2026-06-12 23:39     ` John Hubbard
2026-06-10 17:49 ` [PATCH 4/8] gpu: nova-core: transition booter_load to TLV images Timur Tabi
2026-06-10 17:49 ` [PATCH 5/8] gpu: nova-core: transition gsp " Timur Tabi
2026-06-10 17:49 ` [PATCH 6/8] gpu: nova-core: transition gen_bootloader " Timur Tabi
2026-06-10 17:49 ` [PATCH 7/8] gpu: nova-core: transition fsp " Timur Tabi
2026-06-10 17:49 ` [PATCH 8/8] gpu: nova-core: update firmware module info for " Timur Tabi
2026-06-12 15:02   ` Alexandre Courbot
2026-06-12 15:44     ` Timur Tabi
2026-06-10 18:16 ` [PATCH 0/8] Transition Nova Core to TLV firmware images John Hubbard
2026-06-10 18:19   ` Timur Tabi
2026-06-10 18:59     ` Timur Tabi
2026-06-10 21:21       ` John Hubbard
2026-06-10 21:43         ` Timur Tabi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DJ7GKT7LBOS3.13GU1WMZKQHDC@nvidia.com \
    --to=acourbot@nvidia.com \
    --cc=dakr@kernel.org \
    --cc=ecourtney@nvidia.com \
    --cc=gary@garyguo.net \
    --cc=jhubbard@nvidia.com \
    --cc=nova-gpu@lists.linux.dev \
    --cc=ttabi@nvidia.com \
    --cc=zhiw@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox