From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 165D8215F42; Tue, 2 Jun 2026 03:35:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780371348; cv=none; b=ftae6BmNZfdsnbbPVBOg6flZfOoyIqyuLqbQRGZi2i7L5j6y2ITheRlbt535FUL4YKAPM3RS4QuPe9YkZiOQNgN/voCMiNhdV6BC4jLRG8xuPU+BVKqS29PeRy9hp6FZ0qMwCwa67G7MSH3Lq28+PMWlgM6qA37m1Zz6TTlkqFY= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780371348; c=relaxed/simple; bh=xM2NJ8L8qxwWOw24vmapc1HI8lekTdMkTrTgsdocu7M=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=uXZMdeunVfnDhaH4++nK0/0gggTvT0+kXf1eobdEMIzB4mASBmdauu5tpK6pmRqSvRkHwVfPP4xoJP7WLJWEip5ok6iQ+amXrQam6VNg6KMQifFifm3JVvbSvjhHh7ZyMJXeirAilCa6XGWMOoFNhQrona2ZH34MdSPR9aJYinQ= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=a1Pp9QQx; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="a1Pp9QQx" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 989261F00893; Tue, 2 Jun 2026 03:35:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780371346; bh=JnPWzDXJ6J5JXVUFZ8dg6eypsytyQJ6Cf7Ig0NSSUYI=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date; b=a1Pp9QQx26cXseWMpTOBo8PUiYqPFv65JEH7MVjlwTx7yVAwRCuOeF5+Voq5tGAsr sujr0oGJHmOodWC/jdUgTHqfoE7QRNyRC8zs6rhOaHniG3moBO/+uSt5y1NsWoMU1n W+z4m9rR/eVO1qwTrFCdT6jCkYhWmUCpHMXv994jAYoy9mGmiE+QnbQlott4j1ThOg N+d8JBex2LCFOmYHmYf7+UrYK8VkVc6ISkSvefwYVlqqX+IFJjXrd8bzVCABq8ldAW 24waH/cohr0mVSpxERTXxJ85uisejDdUWHtelgLFS9b5ZNh4aknTFTfgKPvo39zY5T Ftuu/tM7cgHpw== From: sashiko-bot@kernel.org Subject: Re: [PATCH v12 17/22] gpu: nova-core: Hopper/Blackwell: add FSP send/receive messaging Reply-To: sashiko-reviews@lists.linux.dev To: "Alexandre Courbot" , "Danilo Krummrich" , "John Hubbard" Cc: "Alistair Popple" , "Alexandre Courbot" , "Simona Vetter" , "Bjorn Helgaas" , "Miguel Ojeda" , "Danilo Krummrich" , "Alice Ryhl" , =?utf-8?b?QmrDtnJuIFJveSBCYXJvbg==?= , "Gary Guo" , "Trevor Gross" , nova-gpu@lists.linux.dev, "John Hubbard" , "Zhi Wang" , "Eliot Courtney" , "David Airlie" , "Shashank Sharma" , "Timur Tabi" , "Benno Lossin" , LKML , "Boqun Feng" , "Alex Gaynor" , "Andreas Hindborg" In-Reply-To: <20260602032111.224790-18-jhubbard@nvidia.com> References: <20260602032111.224790-18-jhubbard@nvidia.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Tue, 02 Jun 2026 03:35:45 +0000 Message-Id: <20260602033545.989261F00893@smtp.kernel.org> Precedence: bulk X-Mailing-List: nova-gpu@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 3 potential issue(= s) to consider: - [High] Unbounded kernel memory allocation based on untrusted hardware inp= ut. - [High] FSP message synchronization ignores `task_id` and mishandles timeo= uts, leading to permanent message desynchronization. - [Medium] FSP protocol structures use native integer types and are directl= y transmuted from byte streams, breaking big-endian compatibility. -- commit 4be2156ade2cecdc1c8a1f092241ae18c134074e Author: John Hubbard gpu: nova-core: Hopper/Blackwell: add FSP send/receive messaging This commit introduces the synchronous send-and-wait path for FSP messaging in the nova-core driver. It defines FSP protocol structures and a send_sync_fsp method to exchange messages with the FSP hardware. > diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs > index f3524137d9f78..67cf83aba83c6 100644 > --- a/drivers/gpu/nova-core/fsp.rs > +++ b/drivers/gpu/nova-core/fsp.rs [ ... ] > @@ -22,18 +26,56 @@ [ ... ] > +/// FSP command response payload (NVDM_PAYLOAD_COMMAND_RESPONSE). > +#[repr(C, packed)] > +#[derive(Clone, Copy)] > +struct NvdmPayloadCommandResponse { > + task_id: u32, > + command_nvdm_type: u32, > + error_code: u32, > +} [Severity: Medium] Will this structure parse values incorrectly on big-endian architectures? The FSP hardware writes responses in a fixed endianness (little-endian), but this uses native u32 types. When FspResponse::from_bytes_prefix transmutes the byte stream, it could result in corrupted values and spurious validation failures like command_nvdm_type !=3D M::NVDM_TYPE on big-endian systems. Should these use explicit little-endian types instead? [ ... ] > @@ -70,4 +112,79 @@ pub(crate) fn wait_secure_boot( [ ... ] > + fn send_sync_fsp(&mut self, dev: &device::Device, bar: &Bar0, msg= : &M) -> Result > + where > + M: MessageToFsp, > + { [ ... ] > + let packet_size =3D num::u32_as_usize(packet_size); > + let mut response_buf =3D KVec::::new(); > + response_buf.resize(packet_size, 0, GFP_KERNEL)?; [Severity: High] Could this trigger an unbounded kernel memory allocation? The packet_size is derived from poll_msgq(), which calculates the size based on MMIO reads of the FSP queue head and tail pointers. If an errant or compromised FSP firmware returns a large value, passing this unvalidated size directly to resize() with GFP_KERNEL could trigger the OOM killer or cause severe system stutter. Should this size be clamped to a hardware-defined maximum, such as the FSP EMEM capacity, before allocation? [ ... ] > + if command_nvdm_type !=3D M::NVDM_TYPE { > + dev_err!( > + dev, > + "Expected NVDM type {:#x} in reply, got {:#x}\n", > + M::NVDM_TYPE, > + command_nvdm_type > + ); > + return Err(EIO); > + } [Severity: High] Does this validation logic leave the driver vulnerable to message desynchronization after a timeout? If send_sync_fsp times out waiting for a response, it returns ETIMEDOUT without draining the message queue. If the FSP firmware completes the request later, its response remains at the queue's tail. A subsequent call to send_sync_fsp will immediately read this delayed response. Because this validation only checks command_nvdm_type and ignores the task_id (sequence number), the driver might incorrectly accept the stale response as the reply to the new request if they share the same NVDM type. Should the validation also check task_id against the sent request to reject stale responses? --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260602032111.2247= 90-1-jhubbard@nvidia.com?part=3D17