[PATCH 0/9] gpu: nova-core: gsp: add RM control command infrastructure

public inbox for rust-for-linux@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/9] gpu: nova-core: gsp: add RM control command infrastructure
@ 2026-02-27 12:32 Eliot Courtney
  2026-02-27 12:32 ` [PATCH 1/9] gpu: nova-core: gsp: add NV_STATUS error code bindings Eliot Courtney
                   ` (8 more replies)
  0 siblings, 9 replies; 37+ messages in thread
From: Eliot Courtney @ 2026-02-27 12:32 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter
  Cc: rust-for-linux, nouveau, dri-devel, linux-kernel, Eliot Courtney

Add the infrastructure for sending RM control RPCs. This is needed e.g.
for channel allocation.

This series adds:
- `NV_STATUS` bindings and wrapping `NvStatus` enum (used by RM control
  RPCs)
- The necessary bindings for the RM control RPCs.
- `RmControlMsgFunction` to identify individual control commands, like
  `MsgFunction` for GSP commands.
- `SBufferIter::flush_into_kvvec` for reading large RPC payloads
- A `send_rm_control` helper that sends a control and checks its
  NvStatus
- One usage of `send_rm_control`: the `CeGetFaultMethodBufferSize`
  RPC. This is useful for channel allocation later.

Each new RM control command can be added by extending
`RmControlMsgFunction`, adding the bindings and wrappers for their
parameters, and writing a type-safe wrapper to send and receive the
reply for the RM control rpc, using `send_rm_control`.

This series applies on latest drm-rust-next with the listed
pre-requisites. Alternatively, there is a branch with all dependency
commits included [1].

[1] https://github.com/Edgeworth/linux/tree/b4/rmcontrol

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
Eliot Courtney (9):
      gpu: nova-core: gsp: add NV_STATUS error code bindings
      gpu: nova-core: gsp: add NvStatus enum for RM control errors
      gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles
      gpu: nova-core: gsp: add RM control RPC structure binding
      gpu: nova-core: gsp: add types for RM control RPCs
      gpu: nova-core: generalize `flush_into_kvec` to `flush_into_vec`
      gpu: nova-core: gsp: add RM control command infrastructure
      gpu: nova-core: gsp: add CE fault method buffer size bindings
      gpu: nova-core: gsp: add CeGetFaultMethodBufferSize RM control command

 drivers/gpu/nova-core/gsp.rs                      |   1 +
 drivers/gpu/nova-core/gsp/commands.rs             |  16 +
 drivers/gpu/nova-core/gsp/fw.rs                   | 402 ++++++++++++++++++++++
 drivers/gpu/nova-core/gsp/fw/commands.rs          |  10 +
 drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs | 161 +++++++++
 drivers/gpu/nova-core/gsp/fw/rm.rs                |  99 ++++++
 drivers/gpu/nova-core/gsp/rm.rs                   |   3 +
 drivers/gpu/nova-core/gsp/rm/commands.rs          | 141 ++++++++
 drivers/gpu/nova-core/sbuffer.rs                  |  31 +-
 9 files changed, 860 insertions(+), 4 deletions(-)
---
base-commit: 4a49fe23e357b48845e31fe9c28a802c05458198
change-id: 20260225-rmcontrol-bd8a06fc3a0d
prerequisite-message-id: <20260226-cmdq-continuation-v3-0-572ab9916766@nvidia.com>
prerequisite-patch-id: fd45bc5b8eda5e2b54a052dddb1a1c363107f0a7
prerequisite-patch-id: d0f59ef489346e978a222755f9fb42dfe7af19e5
prerequisite-patch-id: 8844970d0e387488c70979a73732579ba174b46c
prerequisite-patch-id: e138a94ed48fa8d50d5ed1eb36524f98923ed478
prerequisite-patch-id: 4599a5e90d6665fa3acb7d4045de5d378ac28b4d
prerequisite-patch-id: 30ed64c398e541d6efbcb2e46ed9a9e6cf953f4f
prerequisite-patch-id: 9a965e9f29c8680c0b554e656ff8e9a1bfc67280
prerequisite-patch-id: d8cccc3dfb070f304805fc7e0f24121809b4b300
prerequisite-patch-id: c0a73dfd1fb630ab02486f0180b90f8fe850b4dc
prerequisite-message-id: <20260226-cmdq-locking-v2-0-c7e16a6d5885@nvidia.com>
prerequisite-patch-id: fefd403caf8af386276351dd12397dda8ae8553f
prerequisite-patch-id: 1fb4b67abba75a81bd5ee5e545a7caae8046a3ea
prerequisite-patch-id: 10c23618b43dc8fea11f7b23d1d9389d04ede08d
prerequisite-patch-id: cf0393b708109d4264745131a511eef7218aa173

Best regards,
-- 
Eliot Courtney <ecourtney@nvidia.com>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 1/9] gpu: nova-core: gsp: add NV_STATUS error code bindings
  2026-02-27 12:32 [PATCH 0/9] gpu: nova-core: gsp: add RM control command infrastructure Eliot Courtney
@ 2026-02-27 12:32 ` Eliot Courtney
  2026-02-27 12:32 ` [PATCH 2/9] gpu: nova-core: gsp: add NvStatus enum for RM control errors Eliot Courtney
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Eliot Courtney @ 2026-02-27 12:32 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter
  Cc: rust-for-linux, nouveau, dri-devel, linux-kernel, Eliot Courtney

Add bindgen generated constants for NV_STATUS. This is used for RM
control messages.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs | 144 ++++++++++++++++++++++
 1 file changed, 144 insertions(+)

diff --git a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
index 334e8be5fde8..dd37a7fd58c6 100644
--- a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
+++ b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
@@ -379,6 +379,150 @@ pub struct NV2080_CTRL_CMD_FB_GET_FB_REGION_INFO_PARAMS {
     pub __bindgen_padding_0: [u8; 4usize],
     pub fbRegion: [NV2080_CTRL_CMD_FB_GET_FB_REGION_FB_REGION_INFO; 16usize],
 }
+pub const NV_OK: _bindgen_ty_4 = 0;
+pub const NV_ERR_GENERIC: _bindgen_ty_4 = 65535;
+pub const NV_ERR_BROKEN_FB: _bindgen_ty_4 = 1;
+pub const NV_ERR_BUFFER_TOO_SMALL: _bindgen_ty_4 = 2;
+pub const NV_ERR_BUSY_RETRY: _bindgen_ty_4 = 3;
+pub const NV_ERR_CALLBACK_NOT_SCHEDULED: _bindgen_ty_4 = 4;
+pub const NV_ERR_CARD_NOT_PRESENT: _bindgen_ty_4 = 5;
+pub const NV_ERR_CYCLE_DETECTED: _bindgen_ty_4 = 6;
+pub const NV_ERR_DMA_IN_USE: _bindgen_ty_4 = 7;
+pub const NV_ERR_DMA_MEM_NOT_LOCKED: _bindgen_ty_4 = 8;
+pub const NV_ERR_DMA_MEM_NOT_UNLOCKED: _bindgen_ty_4 = 9;
+pub const NV_ERR_DUAL_LINK_INUSE: _bindgen_ty_4 = 10;
+pub const NV_ERR_ECC_ERROR: _bindgen_ty_4 = 11;
+pub const NV_ERR_FIFO_BAD_ACCESS: _bindgen_ty_4 = 12;
+pub const NV_ERR_FREQ_NOT_SUPPORTED: _bindgen_ty_4 = 13;
+pub const NV_ERR_GPU_DMA_NOT_INITIALIZED: _bindgen_ty_4 = 14;
+pub const NV_ERR_GPU_IS_LOST: _bindgen_ty_4 = 15;
+pub const NV_ERR_GPU_IN_FULLCHIP_RESET: _bindgen_ty_4 = 16;
+pub const NV_ERR_GPU_NOT_FULL_POWER: _bindgen_ty_4 = 17;
+pub const NV_ERR_GPU_UUID_NOT_FOUND: _bindgen_ty_4 = 18;
+pub const NV_ERR_HOT_SWITCH: _bindgen_ty_4 = 19;
+pub const NV_ERR_I2C_ERROR: _bindgen_ty_4 = 20;
+pub const NV_ERR_I2C_SPEED_TOO_HIGH: _bindgen_ty_4 = 21;
+pub const NV_ERR_ILLEGAL_ACTION: _bindgen_ty_4 = 22;
+pub const NV_ERR_IN_USE: _bindgen_ty_4 = 23;
+pub const NV_ERR_INFLATE_COMPRESSED_DATA_FAILED: _bindgen_ty_4 = 24;
+pub const NV_ERR_INSERT_DUPLICATE_NAME: _bindgen_ty_4 = 25;
+pub const NV_ERR_INSUFFICIENT_RESOURCES: _bindgen_ty_4 = 26;
+pub const NV_ERR_INSUFFICIENT_PERMISSIONS: _bindgen_ty_4 = 27;
+pub const NV_ERR_INSUFFICIENT_POWER: _bindgen_ty_4 = 28;
+pub const NV_ERR_INVALID_ACCESS_TYPE: _bindgen_ty_4 = 29;
+pub const NV_ERR_INVALID_ADDRESS: _bindgen_ty_4 = 30;
+pub const NV_ERR_INVALID_ARGUMENT: _bindgen_ty_4 = 31;
+pub const NV_ERR_INVALID_BASE: _bindgen_ty_4 = 32;
+pub const NV_ERR_INVALID_CHANNEL: _bindgen_ty_4 = 33;
+pub const NV_ERR_INVALID_CLASS: _bindgen_ty_4 = 34;
+pub const NV_ERR_INVALID_CLIENT: _bindgen_ty_4 = 35;
+pub const NV_ERR_INVALID_COMMAND: _bindgen_ty_4 = 36;
+pub const NV_ERR_INVALID_DATA: _bindgen_ty_4 = 37;
+pub const NV_ERR_INVALID_DEVICE: _bindgen_ty_4 = 38;
+pub const NV_ERR_INVALID_DMA_SPECIFIER: _bindgen_ty_4 = 39;
+pub const NV_ERR_INVALID_EVENT: _bindgen_ty_4 = 40;
+pub const NV_ERR_INVALID_FLAGS: _bindgen_ty_4 = 41;
+pub const NV_ERR_INVALID_FUNCTION: _bindgen_ty_4 = 42;
+pub const NV_ERR_INVALID_HEAP: _bindgen_ty_4 = 43;
+pub const NV_ERR_INVALID_INDEX: _bindgen_ty_4 = 44;
+pub const NV_ERR_INVALID_IRQ_LEVEL: _bindgen_ty_4 = 45;
+pub const NV_ERR_INVALID_LIMIT: _bindgen_ty_4 = 46;
+pub const NV_ERR_INVALID_LOCK_STATE: _bindgen_ty_4 = 47;
+pub const NV_ERR_INVALID_METHOD: _bindgen_ty_4 = 48;
+pub const NV_ERR_INVALID_OBJECT: _bindgen_ty_4 = 49;
+pub const NV_ERR_INVALID_OBJECT_BUFFER: _bindgen_ty_4 = 50;
+pub const NV_ERR_INVALID_OBJECT_HANDLE: _bindgen_ty_4 = 51;
+pub const NV_ERR_INVALID_OBJECT_NEW: _bindgen_ty_4 = 52;
+pub const NV_ERR_INVALID_OBJECT_OLD: _bindgen_ty_4 = 53;
+pub const NV_ERR_INVALID_OBJECT_PARENT: _bindgen_ty_4 = 54;
+pub const NV_ERR_INVALID_OFFSET: _bindgen_ty_4 = 55;
+pub const NV_ERR_INVALID_OPERATION: _bindgen_ty_4 = 56;
+pub const NV_ERR_INVALID_OWNER: _bindgen_ty_4 = 57;
+pub const NV_ERR_INVALID_PARAM_STRUCT: _bindgen_ty_4 = 58;
+pub const NV_ERR_INVALID_PARAMETER: _bindgen_ty_4 = 59;
+pub const NV_ERR_INVALID_PATH: _bindgen_ty_4 = 60;
+pub const NV_ERR_INVALID_POINTER: _bindgen_ty_4 = 61;
+pub const NV_ERR_INVALID_REGISTRY_KEY: _bindgen_ty_4 = 62;
+pub const NV_ERR_INVALID_REQUEST: _bindgen_ty_4 = 63;
+pub const NV_ERR_INVALID_STATE: _bindgen_ty_4 = 64;
+pub const NV_ERR_INVALID_STRING_LENGTH: _bindgen_ty_4 = 65;
+pub const NV_ERR_INVALID_READ: _bindgen_ty_4 = 66;
+pub const NV_ERR_INVALID_WRITE: _bindgen_ty_4 = 67;
+pub const NV_ERR_INVALID_XLATE: _bindgen_ty_4 = 68;
+pub const NV_ERR_IRQ_NOT_FIRING: _bindgen_ty_4 = 69;
+pub const NV_ERR_IRQ_EDGE_TRIGGERED: _bindgen_ty_4 = 70;
+pub const NV_ERR_MEMORY_TRAINING_FAILED: _bindgen_ty_4 = 71;
+pub const NV_ERR_MISMATCHED_SLAVE: _bindgen_ty_4 = 72;
+pub const NV_ERR_MISMATCHED_TARGET: _bindgen_ty_4 = 73;
+pub const NV_ERR_MISSING_TABLE_ENTRY: _bindgen_ty_4 = 74;
+pub const NV_ERR_MODULE_LOAD_FAILED: _bindgen_ty_4 = 75;
+pub const NV_ERR_MORE_DATA_AVAILABLE: _bindgen_ty_4 = 76;
+pub const NV_ERR_MORE_PROCESSING_REQUIRED: _bindgen_ty_4 = 77;
+pub const NV_ERR_MULTIPLE_MEMORY_TYPES: _bindgen_ty_4 = 78;
+pub const NV_ERR_NO_FREE_FIFOS: _bindgen_ty_4 = 79;
+pub const NV_ERR_NO_INTR_PENDING: _bindgen_ty_4 = 80;
+pub const NV_ERR_NO_MEMORY: _bindgen_ty_4 = 81;
+pub const NV_ERR_NO_SUCH_DOMAIN: _bindgen_ty_4 = 82;
+pub const NV_ERR_NO_VALID_PATH: _bindgen_ty_4 = 83;
+pub const NV_ERR_NOT_COMPATIBLE: _bindgen_ty_4 = 84;
+pub const NV_ERR_NOT_READY: _bindgen_ty_4 = 85;
+pub const NV_ERR_NOT_SUPPORTED: _bindgen_ty_4 = 86;
+pub const NV_ERR_OBJECT_NOT_FOUND: _bindgen_ty_4 = 87;
+pub const NV_ERR_OBJECT_TYPE_MISMATCH: _bindgen_ty_4 = 88;
+pub const NV_ERR_OPERATING_SYSTEM: _bindgen_ty_4 = 89;
+pub const NV_ERR_OTHER_DEVICE_FOUND: _bindgen_ty_4 = 90;
+pub const NV_ERR_OUT_OF_RANGE: _bindgen_ty_4 = 91;
+pub const NV_ERR_OVERLAPPING_UVM_COMMIT: _bindgen_ty_4 = 92;
+pub const NV_ERR_PAGE_TABLE_NOT_AVAIL: _bindgen_ty_4 = 93;
+pub const NV_ERR_PID_NOT_FOUND: _bindgen_ty_4 = 94;
+pub const NV_ERR_PROTECTION_FAULT: _bindgen_ty_4 = 95;
+pub const NV_ERR_RC_ERROR: _bindgen_ty_4 = 96;
+pub const NV_ERR_REJECTED_VBIOS: _bindgen_ty_4 = 97;
+pub const NV_ERR_RESET_REQUIRED: _bindgen_ty_4 = 98;
+pub const NV_ERR_STATE_IN_USE: _bindgen_ty_4 = 99;
+pub const NV_ERR_SIGNAL_PENDING: _bindgen_ty_4 = 100;
+pub const NV_ERR_TIMEOUT: _bindgen_ty_4 = 101;
+pub const NV_ERR_TIMEOUT_RETRY: _bindgen_ty_4 = 102;
+pub const NV_ERR_TOO_MANY_PRIMARIES: _bindgen_ty_4 = 103;
+pub const NV_ERR_UVM_ADDRESS_IN_USE: _bindgen_ty_4 = 104;
+pub const NV_ERR_MAX_SESSION_LIMIT_REACHED: _bindgen_ty_4 = 105;
+pub const NV_ERR_LIB_RM_VERSION_MISMATCH: _bindgen_ty_4 = 106;
+pub const NV_ERR_PRIV_SEC_VIOLATION: _bindgen_ty_4 = 107;
+pub const NV_ERR_GPU_IN_DEBUG_MODE: _bindgen_ty_4 = 108;
+pub const NV_ERR_FEATURE_NOT_ENABLED: _bindgen_ty_4 = 109;
+pub const NV_ERR_RESOURCE_LOST: _bindgen_ty_4 = 110;
+pub const NV_ERR_PMU_NOT_READY: _bindgen_ty_4 = 111;
+pub const NV_ERR_FLCN_ERROR: _bindgen_ty_4 = 112;
+pub const NV_ERR_FATAL_ERROR: _bindgen_ty_4 = 113;
+pub const NV_ERR_MEMORY_ERROR: _bindgen_ty_4 = 114;
+pub const NV_ERR_INVALID_LICENSE: _bindgen_ty_4 = 115;
+pub const NV_ERR_NVLINK_INIT_ERROR: _bindgen_ty_4 = 116;
+pub const NV_ERR_NVLINK_MINION_ERROR: _bindgen_ty_4 = 117;
+pub const NV_ERR_NVLINK_CLOCK_ERROR: _bindgen_ty_4 = 118;
+pub const NV_ERR_NVLINK_TRAINING_ERROR: _bindgen_ty_4 = 119;
+pub const NV_ERR_NVLINK_CONFIGURATION_ERROR: _bindgen_ty_4 = 120;
+pub const NV_ERR_RISCV_ERROR: _bindgen_ty_4 = 121;
+pub const NV_ERR_FABRIC_MANAGER_NOT_PRESENT: _bindgen_ty_4 = 122;
+pub const NV_ERR_ALREADY_SIGNALLED: _bindgen_ty_4 = 123;
+pub const NV_ERR_QUEUE_TASK_SLOT_NOT_AVAILABLE: _bindgen_ty_4 = 124;
+pub const NV_ERR_KEY_ROTATION_IN_PROGRESS: _bindgen_ty_4 = 125;
+pub const NV_ERR_TEST_ONLY_CODE_NOT_ENABLED: _bindgen_ty_4 = 126;
+pub const NV_ERR_SECURE_BOOT_FAILED: _bindgen_ty_4 = 127;
+pub const NV_ERR_INSUFFICIENT_ZBC_ENTRY: _bindgen_ty_4 = 128;
+pub const NV_ERR_NVLINK_FABRIC_NOT_READY: _bindgen_ty_4 = 129;
+pub const NV_ERR_NVLINK_FABRIC_FAILURE: _bindgen_ty_4 = 130;
+pub const NV_ERR_GPU_MEMORY_ONLINING_FAILURE: _bindgen_ty_4 = 131;
+pub const NV_ERR_REDUCTION_MANAGER_NOT_AVAILABLE: _bindgen_ty_4 = 132;
+pub const NV_ERR_RESOURCE_RETIREMENT_ERROR: _bindgen_ty_4 = 134;
+pub const NV_WARN_HOT_SWITCH: _bindgen_ty_4 = 65537;
+pub const NV_WARN_INCORRECT_PERFMON_DATA: _bindgen_ty_4 = 65538;
+pub const NV_WARN_MISMATCHED_SLAVE: _bindgen_ty_4 = 65539;
+pub const NV_WARN_MISMATCHED_TARGET: _bindgen_ty_4 = 65540;
+pub const NV_WARN_MORE_PROCESSING_REQUIRED: _bindgen_ty_4 = 65541;
+pub const NV_WARN_NOTHING_TO_DO: _bindgen_ty_4 = 65542;
+pub const NV_WARN_NULL_OBJECT: _bindgen_ty_4 = 65543;
+pub const NV_WARN_OUT_OF_RANGE: _bindgen_ty_4 = 65544;
+pub type _bindgen_ty_4 = ffi::c_uint;
 #[repr(C)]
 #[derive(Debug, Copy, Clone, MaybeZeroable)]
 pub struct NV2080_CTRL_GPU_GET_GID_INFO_PARAMS {

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 2/9] gpu: nova-core: gsp: add NvStatus enum for RM control errors
  2026-02-27 12:32 [PATCH 0/9] gpu: nova-core: gsp: add RM control command infrastructure Eliot Courtney
  2026-02-27 12:32 ` [PATCH 1/9] gpu: nova-core: gsp: add NV_STATUS error code bindings Eliot Courtney
@ 2026-02-27 12:32 ` Eliot Courtney
  2026-02-27 12:32 ` [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles Eliot Courtney
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Eliot Courtney @ 2026-02-27 12:32 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter
  Cc: rust-for-linux, nouveau, dri-devel, linux-kernel, Eliot Courtney

Add NvStatus enum that wraps the raw NV_STATUS u32 codes returned by RM
control RPCs.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/gsp/fw.rs | 401 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 401 insertions(+)

diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index 25fca1f6db2c..c71c45462efd 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -43,6 +43,407 @@
 pub(crate) const GSP_MSG_QUEUE_ELEMENT_SIZE_MAX: usize =
     num::u32_as_usize(bindings::GSP_MSG_QUEUE_ELEMENT_SIZE_MAX);
 
+/// Status code returned by GSP-RM RPCs.
+#[derive(Copy, Clone, Debug, PartialEq, Eq)]
+pub(crate) enum NvStatus {
+    Ok,
+    AlreadySignalled,
+    BrokenFb,
+    BufferTooSmall,
+    BusyRetry,
+    CallbackNotScheduled,
+    CardNotPresent,
+    CycleDetected,
+    DmaInUse,
+    DmaMemNotLocked,
+    DmaMemNotUnlocked,
+    DualLinkInuse,
+    EccError,
+    FabricManagerNotPresent,
+    FatalError,
+    FeatureNotEnabled,
+    FifoBadAccess,
+    FlcnError,
+    FreqNotSupported,
+    Generic,
+    GpuDmaNotInitialized,
+    GpuInDebugMode,
+    GpuInFullchipReset,
+    GpuIsLost,
+    GpuMemoryOnliningFailure,
+    GpuNotFullPower,
+    GpuUuidNotFound,
+    HotSwitch,
+    I2cError,
+    I2cSpeedTooHigh,
+    IllegalAction,
+    InUse,
+    InflateCompressedDataFailed,
+    InsertDuplicateName,
+    InsufficientPermissions,
+    InsufficientPower,
+    InsufficientResources,
+    InsufficientZbcEntry,
+    InvalidAccessType,
+    InvalidAddress,
+    InvalidArgument,
+    InvalidBase,
+    InvalidChannel,
+    InvalidClass,
+    InvalidClient,
+    InvalidCommand,
+    InvalidData,
+    InvalidDevice,
+    InvalidDmaSpecifier,
+    InvalidEvent,
+    InvalidFlags,
+    InvalidFunction,
+    InvalidHeap,
+    InvalidIndex,
+    InvalidIrqLevel,
+    InvalidLicense,
+    InvalidLimit,
+    InvalidLockState,
+    InvalidMethod,
+    InvalidObject,
+    InvalidObjectBuffer,
+    InvalidObjectHandle,
+    InvalidObjectNew,
+    InvalidObjectOld,
+    InvalidObjectParent,
+    InvalidOffset,
+    InvalidOperation,
+    InvalidOwner,
+    InvalidParamStruct,
+    InvalidParameter,
+    InvalidPath,
+    InvalidPointer,
+    InvalidRead,
+    InvalidRegistryKey,
+    InvalidRequest,
+    InvalidState,
+    InvalidStringLength,
+    InvalidWrite,
+    InvalidXlate,
+    IrqEdgeTriggered,
+    IrqNotFiring,
+    KeyRotationInProgress,
+    LibRmVersionMismatch,
+    MaxSessionLimitReached,
+    MemoryError,
+    MemoryTrainingFailed,
+    MismatchedSlave,
+    MismatchedTarget,
+    MissingTableEntry,
+    ModuleLoadFailed,
+    MoreDataAvailable,
+    MoreProcessingRequired,
+    MultipleMemoryTypes,
+    NoFreeFifos,
+    NoIntrPending,
+    NoMemory,
+    NoSuchDomain,
+    NoValidPath,
+    NotCompatible,
+    NotReady,
+    NotSupported,
+    NvlinkClockError,
+    NvlinkConfigurationError,
+    NvlinkFabricFailure,
+    NvlinkFabricNotReady,
+    NvlinkInitError,
+    NvlinkMinionError,
+    NvlinkTrainingError,
+    ObjectNotFound,
+    ObjectTypeMismatch,
+    OperatingSystem,
+    OtherDeviceFound,
+    OutOfRange,
+    OverlappingUvmCommit,
+    PageTableNotAvail,
+    PidNotFound,
+    PmuNotReady,
+    PrivSecViolation,
+    ProtectionFault,
+    QueueTaskSlotNotAvailable,
+    RcError,
+    ReductionManagerNotAvailable,
+    RejectedVbios,
+    ResetRequired,
+    ResourceLost,
+    ResourceRetirementError,
+    RiscvError,
+    SecureBootFailed,
+    SignalPending,
+    StateInUse,
+    TestOnlyCodeNotEnabled,
+    Timeout,
+    TimeoutRetry,
+    TooManyPrimaries,
+    UvmAddressInUse,
+    Unknown(u32),
+}
+
+impl From<NvStatus> for Result {
+    fn from(status: NvStatus) -> Self {
+        match status {
+            NvStatus::Ok => Ok(()),
+
+            NvStatus::BufferTooSmall | NvStatus::MoreDataAvailable => Err(EMSGSIZE),
+
+            NvStatus::BusyRetry
+            | NvStatus::DmaInUse
+            | NvStatus::DualLinkInuse
+            | NvStatus::GpuInFullchipReset
+            | NvStatus::InUse
+            | NvStatus::KeyRotationInProgress
+            | NvStatus::NotReady
+            | NvStatus::NvlinkFabricNotReady
+            | NvStatus::PmuNotReady
+            | NvStatus::StateInUse
+            | NvStatus::UvmAddressInUse => Err(EBUSY),
+
+            NvStatus::CardNotPresent
+            | NvStatus::FabricManagerNotPresent
+            | NvStatus::InvalidDevice
+            | NvStatus::ReductionManagerNotAvailable => Err(ENODEV),
+
+            NvStatus::FeatureNotEnabled
+            | NvStatus::FreqNotSupported
+            | NvStatus::NotSupported
+            | NvStatus::TestOnlyCodeNotEnabled => Err(ENOTSUPP),
+
+            NvStatus::GpuUuidNotFound
+            | NvStatus::MissingTableEntry
+            | NvStatus::NoSuchDomain
+            | NvStatus::NoValidPath
+            | NvStatus::ObjectNotFound => Err(ENOENT),
+
+            NvStatus::I2cSpeedTooHigh
+            | NvStatus::InvalidAccessType
+            | NvStatus::InvalidArgument
+            | NvStatus::InvalidBase
+            | NvStatus::InvalidChannel
+            | NvStatus::InvalidClass
+            | NvStatus::InvalidClient
+            | NvStatus::InvalidCommand
+            | NvStatus::InvalidData
+            | NvStatus::InvalidDmaSpecifier
+            | NvStatus::InvalidEvent
+            | NvStatus::InvalidFlags
+            | NvStatus::InvalidFunction
+            | NvStatus::InvalidHeap
+            | NvStatus::InvalidIndex
+            | NvStatus::InvalidIrqLevel
+            | NvStatus::InvalidLimit
+            | NvStatus::InvalidLockState
+            | NvStatus::InvalidMethod
+            | NvStatus::InvalidObject
+            | NvStatus::InvalidObjectBuffer
+            | NvStatus::InvalidObjectHandle
+            | NvStatus::InvalidObjectNew
+            | NvStatus::InvalidObjectOld
+            | NvStatus::InvalidObjectParent
+            | NvStatus::InvalidOffset
+            | NvStatus::InvalidOperation
+            | NvStatus::InvalidOwner
+            | NvStatus::InvalidParamStruct
+            | NvStatus::InvalidParameter
+            | NvStatus::InvalidPath
+            | NvStatus::InvalidRegistryKey
+            | NvStatus::InvalidRequest
+            | NvStatus::InvalidState
+            | NvStatus::InvalidStringLength
+            | NvStatus::InvalidXlate
+            | NvStatus::LibRmVersionMismatch
+            | NvStatus::MismatchedSlave
+            | NvStatus::MismatchedTarget
+            | NvStatus::MultipleMemoryTypes
+            | NvStatus::NotCompatible
+            | NvStatus::ObjectTypeMismatch
+            | NvStatus::OverlappingUvmCommit
+            | NvStatus::RejectedVbios => Err(EINVAL),
+
+            NvStatus::IllegalAction => Err(EPERM),
+
+            NvStatus::InsertDuplicateName => Err(EEXIST),
+
+            NvStatus::InsufficientPermissions
+            | NvStatus::InvalidLicense
+            | NvStatus::PrivSecViolation => Err(EACCES),
+
+            NvStatus::InsufficientResources | NvStatus::NoMemory | NvStatus::PageTableNotAvail => {
+                Err(ENOMEM)
+            }
+
+            NvStatus::InsufficientZbcEntry
+            | NvStatus::MaxSessionLimitReached
+            | NvStatus::NoFreeFifos
+            | NvStatus::QueueTaskSlotNotAvailable
+            | NvStatus::TooManyPrimaries => Err(ENOSPC),
+
+            NvStatus::InvalidAddress | NvStatus::InvalidPointer | NvStatus::ProtectionFault => {
+                Err(EFAULT)
+            }
+
+            NvStatus::MoreProcessingRequired | NvStatus::TimeoutRetry => Err(EAGAIN),
+
+            NvStatus::OutOfRange => Err(ERANGE),
+
+            NvStatus::PidNotFound => Err(ESRCH),
+
+            NvStatus::SignalPending => Err(EINTR),
+
+            NvStatus::Timeout => Err(ETIMEDOUT),
+
+            _ => Err(EIO),
+        }
+    }
+}
+
+impl From<u32> for NvStatus {
+    fn from(value: u32) -> Self {
+        match value {
+            bindings::NV_OK => Self::Ok,
+            bindings::NV_ERR_ALREADY_SIGNALLED => Self::AlreadySignalled,
+            bindings::NV_ERR_BROKEN_FB => Self::BrokenFb,
+            bindings::NV_ERR_BUFFER_TOO_SMALL => Self::BufferTooSmall,
+            bindings::NV_ERR_BUSY_RETRY => Self::BusyRetry,
+            bindings::NV_ERR_CALLBACK_NOT_SCHEDULED => Self::CallbackNotScheduled,
+            bindings::NV_ERR_CARD_NOT_PRESENT => Self::CardNotPresent,
+            bindings::NV_ERR_CYCLE_DETECTED => Self::CycleDetected,
+            bindings::NV_ERR_DMA_IN_USE => Self::DmaInUse,
+            bindings::NV_ERR_DMA_MEM_NOT_LOCKED => Self::DmaMemNotLocked,
+            bindings::NV_ERR_DMA_MEM_NOT_UNLOCKED => Self::DmaMemNotUnlocked,
+            bindings::NV_ERR_DUAL_LINK_INUSE => Self::DualLinkInuse,
+            bindings::NV_ERR_ECC_ERROR => Self::EccError,
+            bindings::NV_ERR_FABRIC_MANAGER_NOT_PRESENT => Self::FabricManagerNotPresent,
+            bindings::NV_ERR_FATAL_ERROR => Self::FatalError,
+            bindings::NV_ERR_FEATURE_NOT_ENABLED => Self::FeatureNotEnabled,
+            bindings::NV_ERR_FIFO_BAD_ACCESS => Self::FifoBadAccess,
+            bindings::NV_ERR_FLCN_ERROR => Self::FlcnError,
+            bindings::NV_ERR_FREQ_NOT_SUPPORTED => Self::FreqNotSupported,
+            bindings::NV_ERR_GENERIC => Self::Generic,
+            bindings::NV_ERR_GPU_DMA_NOT_INITIALIZED => Self::GpuDmaNotInitialized,
+            bindings::NV_ERR_GPU_IN_DEBUG_MODE => Self::GpuInDebugMode,
+            bindings::NV_ERR_GPU_IN_FULLCHIP_RESET => Self::GpuInFullchipReset,
+            bindings::NV_ERR_GPU_IS_LOST => Self::GpuIsLost,
+            bindings::NV_ERR_GPU_MEMORY_ONLINING_FAILURE => Self::GpuMemoryOnliningFailure,
+            bindings::NV_ERR_GPU_NOT_FULL_POWER => Self::GpuNotFullPower,
+            bindings::NV_ERR_GPU_UUID_NOT_FOUND => Self::GpuUuidNotFound,
+            bindings::NV_ERR_HOT_SWITCH => Self::HotSwitch,
+            bindings::NV_ERR_I2C_ERROR => Self::I2cError,
+            bindings::NV_ERR_I2C_SPEED_TOO_HIGH => Self::I2cSpeedTooHigh,
+            bindings::NV_ERR_ILLEGAL_ACTION => Self::IllegalAction,
+            bindings::NV_ERR_IN_USE => Self::InUse,
+            bindings::NV_ERR_INFLATE_COMPRESSED_DATA_FAILED => Self::InflateCompressedDataFailed,
+            bindings::NV_ERR_INSERT_DUPLICATE_NAME => Self::InsertDuplicateName,
+            bindings::NV_ERR_INSUFFICIENT_PERMISSIONS => Self::InsufficientPermissions,
+            bindings::NV_ERR_INSUFFICIENT_POWER => Self::InsufficientPower,
+            bindings::NV_ERR_INSUFFICIENT_RESOURCES => Self::InsufficientResources,
+            bindings::NV_ERR_INSUFFICIENT_ZBC_ENTRY => Self::InsufficientZbcEntry,
+            bindings::NV_ERR_INVALID_ACCESS_TYPE => Self::InvalidAccessType,
+            bindings::NV_ERR_INVALID_ADDRESS => Self::InvalidAddress,
+            bindings::NV_ERR_INVALID_ARGUMENT => Self::InvalidArgument,
+            bindings::NV_ERR_INVALID_BASE => Self::InvalidBase,
+            bindings::NV_ERR_INVALID_CHANNEL => Self::InvalidChannel,
+            bindings::NV_ERR_INVALID_CLASS => Self::InvalidClass,
+            bindings::NV_ERR_INVALID_CLIENT => Self::InvalidClient,
+            bindings::NV_ERR_INVALID_COMMAND => Self::InvalidCommand,
+            bindings::NV_ERR_INVALID_DATA => Self::InvalidData,
+            bindings::NV_ERR_INVALID_DEVICE => Self::InvalidDevice,
+            bindings::NV_ERR_INVALID_DMA_SPECIFIER => Self::InvalidDmaSpecifier,
+            bindings::NV_ERR_INVALID_EVENT => Self::InvalidEvent,
+            bindings::NV_ERR_INVALID_FLAGS => Self::InvalidFlags,
+            bindings::NV_ERR_INVALID_FUNCTION => Self::InvalidFunction,
+            bindings::NV_ERR_INVALID_HEAP => Self::InvalidHeap,
+            bindings::NV_ERR_INVALID_INDEX => Self::InvalidIndex,
+            bindings::NV_ERR_INVALID_IRQ_LEVEL => Self::InvalidIrqLevel,
+            bindings::NV_ERR_INVALID_LICENSE => Self::InvalidLicense,
+            bindings::NV_ERR_INVALID_LIMIT => Self::InvalidLimit,
+            bindings::NV_ERR_INVALID_LOCK_STATE => Self::InvalidLockState,
+            bindings::NV_ERR_INVALID_METHOD => Self::InvalidMethod,
+            bindings::NV_ERR_INVALID_OBJECT => Self::InvalidObject,
+            bindings::NV_ERR_INVALID_OBJECT_BUFFER => Self::InvalidObjectBuffer,
+            bindings::NV_ERR_INVALID_OBJECT_HANDLE => Self::InvalidObjectHandle,
+            bindings::NV_ERR_INVALID_OBJECT_NEW => Self::InvalidObjectNew,
+            bindings::NV_ERR_INVALID_OBJECT_OLD => Self::InvalidObjectOld,
+            bindings::NV_ERR_INVALID_OBJECT_PARENT => Self::InvalidObjectParent,
+            bindings::NV_ERR_INVALID_OFFSET => Self::InvalidOffset,
+            bindings::NV_ERR_INVALID_OPERATION => Self::InvalidOperation,
+            bindings::NV_ERR_INVALID_OWNER => Self::InvalidOwner,
+            bindings::NV_ERR_INVALID_PARAM_STRUCT => Self::InvalidParamStruct,
+            bindings::NV_ERR_INVALID_PARAMETER => Self::InvalidParameter,
+            bindings::NV_ERR_INVALID_PATH => Self::InvalidPath,
+            bindings::NV_ERR_INVALID_POINTER => Self::InvalidPointer,
+            bindings::NV_ERR_INVALID_READ => Self::InvalidRead,
+            bindings::NV_ERR_INVALID_REGISTRY_KEY => Self::InvalidRegistryKey,
+            bindings::NV_ERR_INVALID_REQUEST => Self::InvalidRequest,
+            bindings::NV_ERR_INVALID_STATE => Self::InvalidState,
+            bindings::NV_ERR_INVALID_STRING_LENGTH => Self::InvalidStringLength,
+            bindings::NV_ERR_INVALID_WRITE => Self::InvalidWrite,
+            bindings::NV_ERR_INVALID_XLATE => Self::InvalidXlate,
+            bindings::NV_ERR_IRQ_EDGE_TRIGGERED => Self::IrqEdgeTriggered,
+            bindings::NV_ERR_IRQ_NOT_FIRING => Self::IrqNotFiring,
+            bindings::NV_ERR_KEY_ROTATION_IN_PROGRESS => Self::KeyRotationInProgress,
+            bindings::NV_ERR_LIB_RM_VERSION_MISMATCH => Self::LibRmVersionMismatch,
+            bindings::NV_ERR_MAX_SESSION_LIMIT_REACHED => Self::MaxSessionLimitReached,
+            bindings::NV_ERR_MEMORY_ERROR => Self::MemoryError,
+            bindings::NV_ERR_MEMORY_TRAINING_FAILED => Self::MemoryTrainingFailed,
+            bindings::NV_ERR_MISMATCHED_SLAVE => Self::MismatchedSlave,
+            bindings::NV_ERR_MISMATCHED_TARGET => Self::MismatchedTarget,
+            bindings::NV_ERR_MISSING_TABLE_ENTRY => Self::MissingTableEntry,
+            bindings::NV_ERR_MODULE_LOAD_FAILED => Self::ModuleLoadFailed,
+            bindings::NV_ERR_MORE_DATA_AVAILABLE => Self::MoreDataAvailable,
+            bindings::NV_ERR_MORE_PROCESSING_REQUIRED => Self::MoreProcessingRequired,
+            bindings::NV_ERR_MULTIPLE_MEMORY_TYPES => Self::MultipleMemoryTypes,
+            bindings::NV_ERR_NO_FREE_FIFOS => Self::NoFreeFifos,
+            bindings::NV_ERR_NO_INTR_PENDING => Self::NoIntrPending,
+            bindings::NV_ERR_NO_MEMORY => Self::NoMemory,
+            bindings::NV_ERR_NO_SUCH_DOMAIN => Self::NoSuchDomain,
+            bindings::NV_ERR_NO_VALID_PATH => Self::NoValidPath,
+            bindings::NV_ERR_NOT_COMPATIBLE => Self::NotCompatible,
+            bindings::NV_ERR_NOT_READY => Self::NotReady,
+            bindings::NV_ERR_NOT_SUPPORTED => Self::NotSupported,
+            bindings::NV_ERR_NVLINK_CLOCK_ERROR => Self::NvlinkClockError,
+            bindings::NV_ERR_NVLINK_CONFIGURATION_ERROR => Self::NvlinkConfigurationError,
+            bindings::NV_ERR_NVLINK_FABRIC_FAILURE => Self::NvlinkFabricFailure,
+            bindings::NV_ERR_NVLINK_FABRIC_NOT_READY => Self::NvlinkFabricNotReady,
+            bindings::NV_ERR_NVLINK_INIT_ERROR => Self::NvlinkInitError,
+            bindings::NV_ERR_NVLINK_MINION_ERROR => Self::NvlinkMinionError,
+            bindings::NV_ERR_NVLINK_TRAINING_ERROR => Self::NvlinkTrainingError,
+            bindings::NV_ERR_OBJECT_NOT_FOUND => Self::ObjectNotFound,
+            bindings::NV_ERR_OBJECT_TYPE_MISMATCH => Self::ObjectTypeMismatch,
+            bindings::NV_ERR_OPERATING_SYSTEM => Self::OperatingSystem,
+            bindings::NV_ERR_OTHER_DEVICE_FOUND => Self::OtherDeviceFound,
+            bindings::NV_ERR_OUT_OF_RANGE => Self::OutOfRange,
+            bindings::NV_ERR_OVERLAPPING_UVM_COMMIT => Self::OverlappingUvmCommit,
+            bindings::NV_ERR_PAGE_TABLE_NOT_AVAIL => Self::PageTableNotAvail,
+            bindings::NV_ERR_PID_NOT_FOUND => Self::PidNotFound,
+            bindings::NV_ERR_PMU_NOT_READY => Self::PmuNotReady,
+            bindings::NV_ERR_PRIV_SEC_VIOLATION => Self::PrivSecViolation,
+            bindings::NV_ERR_PROTECTION_FAULT => Self::ProtectionFault,
+            bindings::NV_ERR_QUEUE_TASK_SLOT_NOT_AVAILABLE => Self::QueueTaskSlotNotAvailable,
+            bindings::NV_ERR_RC_ERROR => Self::RcError,
+            bindings::NV_ERR_REDUCTION_MANAGER_NOT_AVAILABLE => Self::ReductionManagerNotAvailable,
+            bindings::NV_ERR_REJECTED_VBIOS => Self::RejectedVbios,
+            bindings::NV_ERR_RESET_REQUIRED => Self::ResetRequired,
+            bindings::NV_ERR_RESOURCE_LOST => Self::ResourceLost,
+            bindings::NV_ERR_RESOURCE_RETIREMENT_ERROR => Self::ResourceRetirementError,
+            bindings::NV_ERR_RISCV_ERROR => Self::RiscvError,
+            bindings::NV_ERR_SECURE_BOOT_FAILED => Self::SecureBootFailed,
+            bindings::NV_ERR_SIGNAL_PENDING => Self::SignalPending,
+            bindings::NV_ERR_STATE_IN_USE => Self::StateInUse,
+            bindings::NV_ERR_TEST_ONLY_CODE_NOT_ENABLED => Self::TestOnlyCodeNotEnabled,
+            bindings::NV_ERR_TIMEOUT => Self::Timeout,
+            bindings::NV_ERR_TIMEOUT_RETRY => Self::TimeoutRetry,
+            bindings::NV_ERR_TOO_MANY_PRIMARIES => Self::TooManyPrimaries,
+            bindings::NV_ERR_UVM_ADDRESS_IN_USE => Self::UvmAddressInUse,
+            other => Self::Unknown(other),
+        }
+    }
+}
+
 /// Empty type to group methods related to heap parameters for running the GSP firmware.
 enum GspFwHeapParams {}
 

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles
  2026-02-27 12:32 [PATCH 0/9] gpu: nova-core: gsp: add RM control command infrastructure Eliot Courtney
  2026-02-27 12:32 ` [PATCH 1/9] gpu: nova-core: gsp: add NV_STATUS error code bindings Eliot Courtney
  2026-02-27 12:32 ` [PATCH 2/9] gpu: nova-core: gsp: add NvStatus enum for RM control errors Eliot Courtney
@ 2026-02-27 12:32 ` Eliot Courtney
  2026-03-09 21:22   ` Joel Fernandes
  2026-02-27 12:32 ` [PATCH 4/9] gpu: nova-core: gsp: add RM control RPC structure binding Eliot Courtney
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: Eliot Courtney @ 2026-02-27 12:32 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter
  Cc: rust-for-linux, nouveau, dri-devel, linux-kernel, Eliot Courtney

Expose the `hInternalClient` and `hInternalSubdevice` handles. These are
needed for RM control calls.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/gsp/commands.rs    | 16 ++++++++++++++++
 drivers/gpu/nova-core/gsp/fw/commands.rs | 10 ++++++++++
 2 files changed, 26 insertions(+)

diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index 4740cda0b51c..2cadfcaf9a8a 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -197,6 +197,8 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
 /// The reply from the GSP to the [`GetGspInfo`] command.
 pub(crate) struct GetGspStaticInfoReply {
     gpu_name: [u8; 64],
+    h_client: u32,
+    h_subdevice: u32,
 }
 
 impl MessageFromGsp for GetGspStaticInfoReply {
@@ -210,6 +212,8 @@ fn read(
     ) -> Result<Self, Self::InitError> {
         Ok(GetGspStaticInfoReply {
             gpu_name: msg.gpu_name_str(),
+            h_client: msg.h_internal_client(),
+            h_subdevice: msg.h_internal_subdevice(),
         })
     }
 }
@@ -236,6 +240,18 @@ pub(crate) fn gpu_name(&self) -> core::result::Result<&str, GpuNameError> {
             .to_str()
             .map_err(GpuNameError::InvalidUtf8)
     }
+
+    /// Returns the internal client handle allocated by GSP-RM.
+    #[expect(dead_code)]
+    pub(crate) fn h_client(&self) -> u32 {
+        self.h_client
+    }
+
+    /// Returns the internal subdevice handle allocated by GSP-RM.
+    #[expect(dead_code)]
+    pub(crate) fn h_subdevice(&self) -> u32 {
+        self.h_subdevice
+    }
 }
 
 /// Send the [`GetGspInfo`] command and awaits for its reply.
diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
index 67f44421fcc3..aaf3509a0207 100644
--- a/drivers/gpu/nova-core/gsp/fw/commands.rs
+++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
@@ -115,6 +115,16 @@ impl GspStaticConfigInfo {
     pub(crate) fn gpu_name_str(&self) -> [u8; 64] {
         self.0.gpuNameString
     }
+
+    /// Returns the internal client handle allocated by GSP-RM.
+    pub(crate) fn h_internal_client(&self) -> u32 {
+        self.0.hInternalClient
+    }
+
+    /// Returns the internal subdevice handle allocated by GSP-RM.
+    pub(crate) fn h_internal_subdevice(&self) -> u32 {
+        self.0.hInternalSubdevice
+    }
 }
 
 // SAFETY: Padding is explicit and will not contain uninitialized data.

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 4/9] gpu: nova-core: gsp: add RM control RPC structure binding
  2026-02-27 12:32 [PATCH 0/9] gpu: nova-core: gsp: add RM control command infrastructure Eliot Courtney
                   ` (2 preceding siblings ...)
  2026-02-27 12:32 ` [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles Eliot Courtney
@ 2026-02-27 12:32 ` Eliot Courtney
  2026-02-27 12:32 ` [PATCH 5/9] gpu: nova-core: gsp: add types for RM control RPCs Eliot Courtney
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Eliot Courtney @ 2026-02-27 12:32 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter
  Cc: rust-for-linux, nouveau, dri-devel, linux-kernel, Eliot Courtney

Add the bindgen rpc_gsp_rm_control_v03_00 structure. This is the
structure for sending RM control commands.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
index dd37a7fd58c6..05e205e6dc58 100644
--- a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
+++ b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
@@ -1025,6 +1025,17 @@ fn default() -> Self {
 }
 #[repr(C)]
 #[derive(Debug, Default, MaybeZeroable)]
+pub struct rpc_gsp_rm_control_v03_00 {
+    pub hClient: u32_,
+    pub hObject: u32_,
+    pub cmd: u32_,
+    pub status: u32_,
+    pub paramsSize: u32_,
+    pub flags: u32_,
+    pub params: __IncompleteArrayField<u8_>,
+}
+#[repr(C)]
+#[derive(Debug, Default, MaybeZeroable)]
 pub struct rpc_run_cpu_sequencer_v17_00 {
     pub bufferSizeDWord: u32_,
     pub cmdIndex: u32_,

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 5/9] gpu: nova-core: gsp: add types for RM control RPCs
  2026-02-27 12:32 [PATCH 0/9] gpu: nova-core: gsp: add RM control command infrastructure Eliot Courtney
                   ` (3 preceding siblings ...)
  2026-02-27 12:32 ` [PATCH 4/9] gpu: nova-core: gsp: add RM control RPC structure binding Eliot Courtney
@ 2026-02-27 12:32 ` Eliot Courtney
  2026-03-09 21:45   ` Joel Fernandes
  2026-02-27 12:32 ` [PATCH 6/9] gpu: nova-core: generalize `flush_into_kvec` to `flush_into_vec` Eliot Courtney
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: Eliot Courtney @ 2026-02-27 12:32 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter
  Cc: rust-for-linux, nouveau, dri-devel, linux-kernel, Eliot Courtney

Add `RmControlMsgFunction` which mirrors `MsgFunction` in fw.rs. This
denotes the type of RM control RPC. For now it contains a single
discriminant only (which will be used later), which is needed to prevent
compile errors when using an otherwise empty enum.

Add `GspRmControl` which wraps the RM control RPC structure from the
bindings.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/gsp/fw.rs                   |  1 +
 drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs |  1 +
 drivers/gpu/nova-core/gsp/fw/rm.rs                | 82 +++++++++++++++++++++++
 3 files changed, 84 insertions(+)

diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index c71c45462efd..0796e6c0baf2 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -2,6 +2,7 @@
 
 pub(crate) mod commands;
 mod r570_144;
+pub(crate) mod rm;
 
 // Alias to avoid repeating the version number with every use.
 use r570_144 as bindings;
diff --git a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
index 05e205e6dc58..ece31cc32f5b 100644
--- a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
+++ b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
@@ -44,6 +44,7 @@ fn fmt(&self, fmt: &mut ::core::fmt::Formatter<'_>) -> ::core::fmt::Result {
 pub const GSP_FW_WPR_META_MAGIC: i64 = -2577556379034558285;
 pub const REGISTRY_TABLE_ENTRY_TYPE_DWORD: u32 = 1;
 pub const GSP_MSG_QUEUE_ELEMENT_SIZE_MAX: u32 = 65536;
+pub const NV2080_CTRL_CMD_CE_GET_FAULT_METHOD_BUFFER_SIZE: u32 = 545270280;
 pub type __u8 = ffi::c_uchar;
 pub type __u16 = ffi::c_ushort;
 pub type __u32 = ffi::c_uint;
diff --git a/drivers/gpu/nova-core/gsp/fw/rm.rs b/drivers/gpu/nova-core/gsp/fw/rm.rs
new file mode 100644
index 000000000000..8bb7b11736b9
--- /dev/null
+++ b/drivers/gpu/nova-core/gsp/fw/rm.rs
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use kernel::{
+    prelude::*,
+    transmute::{
+        AsBytes,
+        FromBytes, //
+    }, //
+};
+
+use super::{
+    bindings,
+    NvStatus, //
+};
+
+/// Command code for RM control RPCs sent using [`MsgFunction::GspRmControl`].
+#[derive(Copy, Clone, Debug, PartialEq)]
+#[repr(u32)]
+pub(crate) enum RmControlMsgFunction {
+    /// Get the CE fault method buffer size.
+    CeGetFaultMethodBufferSize = bindings::NV2080_CTRL_CMD_CE_GET_FAULT_METHOD_BUFFER_SIZE,
+}
+
+impl TryFrom<u32> for RmControlMsgFunction {
+    type Error = kernel::error::Error;
+
+    fn try_from(value: u32) -> Result<Self> {
+        match value {
+            bindings::NV2080_CTRL_CMD_CE_GET_FAULT_METHOD_BUFFER_SIZE => {
+                Ok(Self::CeGetFaultMethodBufferSize)
+            }
+            _ => Err(EINVAL),
+        }
+    }
+}
+
+impl From<RmControlMsgFunction> for u32 {
+    fn from(value: RmControlMsgFunction) -> Self {
+        // CAST: `RmControlMsgFunction` is `repr(u32)` and can thus be cast losslessly.
+        value as u32
+    }
+}
+
+/// RM control message element structure.
+#[derive(Zeroable)]
+#[repr(transparent)]
+pub(crate) struct GspRmControl {
+    inner: bindings::rpc_gsp_rm_control_v03_00,
+}
+
+impl GspRmControl {
+    /// Creates a new RM control command.
+    pub(crate) fn new(
+        h_client: u32,
+        h_object: u32,
+        cmd: RmControlMsgFunction,
+        params_size: u32,
+    ) -> Self {
+        Self {
+            inner: bindings::rpc_gsp_rm_control_v03_00 {
+                hClient: h_client,
+                hObject: h_object,
+                cmd: u32::from(cmd),
+                status: 0,
+                paramsSize: params_size,
+                flags: 0,
+                params: Default::default(),
+            },
+        }
+    }
+
+    /// Returns the status from the RM control response.
+    pub(crate) fn status(&self) -> NvStatus {
+        NvStatus::from(self.inner.status)
+    }
+}
+
+// SAFETY: This struct only contains integer types for which all bit patterns are valid.
+unsafe impl FromBytes for GspRmControl {}
+
+// SAFETY: This struct contains no padding.
+unsafe impl AsBytes for GspRmControl {}

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 6/9] gpu: nova-core: generalize `flush_into_kvec` to `flush_into_vec`
  2026-02-27 12:32 [PATCH 0/9] gpu: nova-core: gsp: add RM control command infrastructure Eliot Courtney
                   ` (4 preceding siblings ...)
  2026-02-27 12:32 ` [PATCH 5/9] gpu: nova-core: gsp: add types for RM control RPCs Eliot Courtney
@ 2026-02-27 12:32 ` Eliot Courtney
  2026-03-09 21:53   ` Joel Fernandes
  2026-03-09 21:57   ` Danilo Krummrich
  2026-02-27 12:32 ` [PATCH 7/9] gpu: nova-core: gsp: add RM control command infrastructure Eliot Courtney
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 37+ messages in thread
From: Eliot Courtney @ 2026-02-27 12:32 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter
  Cc: rust-for-linux, nouveau, dri-devel, linux-kernel, Eliot Courtney

Add general `flush_into_vec` function. Add `flush_into_kvvec`
convenience wrapper alongside the existing `flush_into_kvec` function.
This is generally useful but immediately used for e.g. holding RM
control payloads, which can be large (~>=20 KiB).

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/sbuffer.rs | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/nova-core/sbuffer.rs b/drivers/gpu/nova-core/sbuffer.rs
index 3a41d224c77a..38f8a8426521 100644
--- a/drivers/gpu/nova-core/sbuffer.rs
+++ b/drivers/gpu/nova-core/sbuffer.rs
@@ -2,7 +2,13 @@
 
 use core::ops::Deref;
 
-use kernel::prelude::*;
+use kernel::{
+    alloc::{
+        Allocator,
+        KVec, //
+    },
+    prelude::*, //
+};
 
 /// A buffer abstraction for discontiguous byte slices.
 ///
@@ -162,11 +168,14 @@ pub(crate) fn read_exact(&mut self, mut dst: &mut [u8]) -> Result {
         Ok(())
     }
 
-    /// Read all the remaining data into a [`KVec`].
+    /// Read all the remaining data into a [`Vec`] with the given allocator.
     ///
     /// `self` will be empty after this operation.
-    pub(crate) fn flush_into_kvec(&mut self, flags: kernel::alloc::Flags) -> Result<KVec<u8>> {
-        let mut buf = KVec::<u8>::new();
+    pub(crate) fn flush_into_vec<A: Allocator>(
+        &mut self,
+        flags: kernel::alloc::Flags,
+    ) -> Result<Vec<u8, A>> {
+        let mut buf = Vec::<u8, A>::new();
 
         if let Some(slice) = core::mem::take(&mut self.cur_slice) {
             buf.extend_from_slice(slice, flags)?;
@@ -177,6 +186,20 @@ pub(crate) fn flush_into_kvec(&mut self, flags: kernel::alloc::Flags) -> Result<
 
         Ok(buf)
     }
+
+    /// Read all the remaining data into a [`KVec`].
+    ///
+    /// `self` will be empty after this operation.
+    pub(crate) fn flush_into_kvec(&mut self, flags: kernel::alloc::Flags) -> Result<KVec<u8>> {
+        self.flush_into_vec(flags)
+    }
+
+    /// Read all the remaining data into a [`KVVec`].
+    ///
+    /// `self` will be empty after this operation.
+    pub(crate) fn flush_into_kvvec(&mut self, flags: kernel::alloc::Flags) -> Result<KVVec<u8>> {
+        self.flush_into_vec(flags)
+    }
 }
 
 /// Provides a way to get mutable slices of data to write into.

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 7/9] gpu: nova-core: gsp: add RM control command infrastructure
  2026-02-27 12:32 [PATCH 0/9] gpu: nova-core: gsp: add RM control command infrastructure Eliot Courtney
                   ` (5 preceding siblings ...)
  2026-02-27 12:32 ` [PATCH 6/9] gpu: nova-core: generalize `flush_into_kvec` to `flush_into_vec` Eliot Courtney
@ 2026-02-27 12:32 ` Eliot Courtney
  2026-03-02  8:00   ` Zhi Wang
                     ` (2 more replies)
  2026-02-27 12:32 ` [PATCH 8/9] gpu: nova-core: gsp: add CE fault method buffer size bindings Eliot Courtney
  2026-02-27 12:32 ` [PATCH 9/9] gpu: nova-core: gsp: add CeGetFaultMethodBufferSize RM control command Eliot Courtney
  8 siblings, 3 replies; 37+ messages in thread
From: Eliot Courtney @ 2026-02-27 12:32 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter
  Cc: rust-for-linux, nouveau, dri-devel, linux-kernel, Eliot Courtney

Add `RmControl` which implements CommandToGsp for sending RM control
RPCs.

Add `RmControlReply` which implements MessageFromGsp for getting the
reply back.

Add `send_rm_control` which sends an RM control RPC via the command
queue using the above structures.

This gives a generic way to send each RM control RPC. Each new RM
control RPC can be added by extending RmControlMsgFunction and adding
its bindings wrappers and writing a helper function to send it via
`send_rm_control`.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/gsp.rs             |   1 +
 drivers/gpu/nova-core/gsp/rm.rs          |   3 +
 drivers/gpu/nova-core/gsp/rm/commands.rs | 111 +++++++++++++++++++++++++++++++
 3 files changed, 115 insertions(+)

diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index a6f3918c20b1..1a1c4e9808ac 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -17,6 +17,7 @@
 pub(crate) mod cmdq;
 pub(crate) mod commands;
 mod fw;
+pub(crate) mod rm;
 mod sequencer;
 
 pub(crate) use fw::{
diff --git a/drivers/gpu/nova-core/gsp/rm.rs b/drivers/gpu/nova-core/gsp/rm.rs
new file mode 100644
index 000000000000..10e879a3e842
--- /dev/null
+++ b/drivers/gpu/nova-core/gsp/rm.rs
@@ -0,0 +1,3 @@
+// SPDX-License-Identifier: GPL-2.0
+
+pub(crate) mod commands;
diff --git a/drivers/gpu/nova-core/gsp/rm/commands.rs b/drivers/gpu/nova-core/gsp/rm/commands.rs
new file mode 100644
index 000000000000..16bcf88644db
--- /dev/null
+++ b/drivers/gpu/nova-core/gsp/rm/commands.rs
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use core::{
+    array,
+    convert::Infallible, //
+};
+
+use kernel::prelude::*;
+
+use crate::{
+    driver::Bar0,
+    gsp::{
+        cmdq::{
+            Cmdq,
+            CommandToGsp,
+            MessageFromGsp, //
+        },
+        fw::{
+            rm::*,
+            MsgFunction,
+            NvStatus, //
+        },
+    },
+    sbuffer::SBufferIter,
+};
+
+/// Command for sending an RM control message to the GSP.
+struct RmControl<'a> {
+    h_client: u32,
+    h_object: u32,
+    cmd: RmControlMsgFunction,
+    params: &'a [u8],
+}
+
+impl<'a> RmControl<'a> {
+    /// Creates a new RM control command.
+    fn new(h_client: u32, h_object: u32, cmd: RmControlMsgFunction, params: &'a [u8]) -> Self {
+        Self {
+            h_client,
+            h_object,
+            cmd,
+            params,
+        }
+    }
+}
+
+impl CommandToGsp for RmControl<'_> {
+    const FUNCTION: MsgFunction = MsgFunction::GspRmControl;
+    type Command = GspRmControl;
+    type Reply = RmControlReply;
+    type InitError = Infallible;
+
+    fn init(&self) -> impl Init<Self::Command, Self::InitError> {
+        GspRmControl::new(
+            self.h_client,
+            self.h_object,
+            self.cmd,
+            self.params.len() as u32,
+        )
+    }
+
+    fn variable_payload_len(&self) -> usize {
+        self.params.len()
+    }
+
+    fn init_variable_payload(
+        &self,
+        dst: &mut SBufferIter<array::IntoIter<&mut [u8], 2>>,
+    ) -> Result {
+        dst.write_all(self.params)
+    }
+}
+
+/// Response from an RM control message.
+pub(crate) struct RmControlReply {
+    status: NvStatus,
+    params: KVVec<u8>,
+}
+
+impl MessageFromGsp for RmControlReply {
+    const FUNCTION: MsgFunction = MsgFunction::GspRmControl;
+    type Message = GspRmControl;
+    type InitError = Error;
+
+    fn read(
+        msg: &Self::Message,
+        sbuffer: &mut SBufferIter<array::IntoIter<&[u8], 2>>,
+    ) -> Result<Self, Self::InitError> {
+        Ok(RmControlReply {
+            status: msg.status(),
+            params: sbuffer.flush_into_kvvec(GFP_KERNEL)?,
+        })
+    }
+}
+
+/// Sends an RM control command, checks the reply status, and returns the raw parameter bytes.
+#[expect(dead_code)]
+fn send_rm_control(
+    cmdq: &Cmdq,
+    bar: &Bar0,
+    h_client: u32,
+    h_object: u32,
+    cmd: RmControlMsgFunction,
+    params: &[u8],
+) -> Result<KVVec<u8>> {
+    let reply = cmdq.send_sync_command(bar, RmControl::new(h_client, h_object, cmd, params))?;
+
+    Result::from(reply.status)?;
+
+    Ok(reply.params)
+}

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 8/9] gpu: nova-core: gsp: add CE fault method buffer size bindings
  2026-02-27 12:32 [PATCH 0/9] gpu: nova-core: gsp: add RM control command infrastructure Eliot Courtney
                   ` (6 preceding siblings ...)
  2026-02-27 12:32 ` [PATCH 7/9] gpu: nova-core: gsp: add RM control command infrastructure Eliot Courtney
@ 2026-02-27 12:32 ` Eliot Courtney
  2026-03-09 22:08   ` Joel Fernandes
  2026-02-27 12:32 ` [PATCH 9/9] gpu: nova-core: gsp: add CeGetFaultMethodBufferSize RM control command Eliot Courtney
  8 siblings, 1 reply; 37+ messages in thread
From: Eliot Courtney @ 2026-02-27 12:32 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter
  Cc: rust-for-linux, nouveau, dri-devel, linux-kernel, Eliot Courtney

Add the bindings for CE fault method buffer size RM control RPC. This
will be needed for channel allocation and is also a simple way to
demonstrate the usage of the `send_rm_control` infrastructure for now.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
index ece31cc32f5b..354ee2cfa295 100644
--- a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
+++ b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
@@ -1025,6 +1025,11 @@ fn default() -> Self {
     }
 }
 #[repr(C)]
+#[derive(Debug, Default, Copy, Clone, MaybeZeroable)]
+pub struct NV2080_CTRL_CE_GET_FAULT_METHOD_BUFFER_SIZE_PARAMS {
+    pub size: u32_,
+}
+#[repr(C)]
 #[derive(Debug, Default, MaybeZeroable)]
 pub struct rpc_gsp_rm_control_v03_00 {
     pub hClient: u32_,

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 9/9] gpu: nova-core: gsp: add CeGetFaultMethodBufferSize RM control command
  2026-02-27 12:32 [PATCH 0/9] gpu: nova-core: gsp: add RM control command infrastructure Eliot Courtney
                   ` (7 preceding siblings ...)
  2026-02-27 12:32 ` [PATCH 8/9] gpu: nova-core: gsp: add CE fault method buffer size bindings Eliot Courtney
@ 2026-02-27 12:32 ` Eliot Courtney
  2026-03-09 22:23   ` Joel Fernandes
  8 siblings, 1 reply; 37+ messages in thread
From: Eliot Courtney @ 2026-02-27 12:32 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter
  Cc: rust-for-linux, nouveau, dri-devel, linux-kernel, Eliot Courtney

Add `CeGetFaultMethodBufferSizeParams` which wraps the bindings.

Add `get_ce_fault_method_buffer_size` which sends the RM control RPC
and returns the buffer size. This is needed for channel allocation.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/gsp/fw/rm.rs       | 17 +++++++++++++++
 drivers/gpu/nova-core/gsp/rm/commands.rs | 36 +++++++++++++++++++++++++++++---
 2 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/fw/rm.rs b/drivers/gpu/nova-core/gsp/fw/rm.rs
index 8bb7b11736b9..9f1e3546d39d 100644
--- a/drivers/gpu/nova-core/gsp/fw/rm.rs
+++ b/drivers/gpu/nova-core/gsp/fw/rm.rs
@@ -80,3 +80,20 @@ unsafe impl FromBytes for GspRmControl {}
 
 // SAFETY: This struct contains no padding.
 unsafe impl AsBytes for GspRmControl {}
+
+/// Wrapper for [`bindings::NV2080_CTRL_CE_GET_FAULT_METHOD_BUFFER_SIZE_PARAMS`].
+#[derive(Zeroable)]
+#[repr(transparent)]
+pub(crate) struct CeGetFaultMethodBufferSizeParams(
+    bindings::NV2080_CTRL_CE_GET_FAULT_METHOD_BUFFER_SIZE_PARAMS,
+);
+
+impl CeGetFaultMethodBufferSizeParams {
+    /// Returns the CE fault method buffer size in bytes.
+    pub(crate) fn size(&self) -> u32 {
+        self.0.size
+    }
+}
+
+// SAFETY: This struct only contains integer types for which all bit patterns are valid.
+unsafe impl FromBytes for CeGetFaultMethodBufferSizeParams {}
diff --git a/drivers/gpu/nova-core/gsp/rm/commands.rs b/drivers/gpu/nova-core/gsp/rm/commands.rs
index 16bcf88644db..1d045e6f1afb 100644
--- a/drivers/gpu/nova-core/gsp/rm/commands.rs
+++ b/drivers/gpu/nova-core/gsp/rm/commands.rs
@@ -2,10 +2,14 @@
 
 use core::{
     array,
-    convert::Infallible, //
+    convert::Infallible,
+    mem::size_of, //
 };
 
-use kernel::prelude::*;
+use kernel::{
+    prelude::*,
+    transmute::FromBytes, //
+};
 
 use crate::{
     driver::Bar0,
@@ -94,7 +98,6 @@ fn read(
 }
 
 /// Sends an RM control command, checks the reply status, and returns the raw parameter bytes.
-#[expect(dead_code)]
 fn send_rm_control(
     cmdq: &Cmdq,
     bar: &Bar0,
@@ -109,3 +112,30 @@ fn send_rm_control(
 
     Ok(reply.params)
 }
+
+/// Sends the `CeGetFaultMethodBufferSize` RM control command and waits for its reply.
+///
+/// Returns the CE fault method buffer size in bytes.
+#[expect(dead_code)]
+pub(crate) fn get_ce_fault_method_buffer_size(
+    cmdq: &Cmdq,
+    bar: &Bar0,
+    h_client: u32,
+    h_subdevice: u32,
+) -> Result<u32> {
+    // Stack-allocate the request; CeGetFaultMethodBufferSizeParams is small (4 bytes).
+    let req = [0u8; size_of::<CeGetFaultMethodBufferSizeParams>()];
+
+    let reply = send_rm_control(
+        cmdq,
+        bar,
+        h_client,
+        h_subdevice,
+        RmControlMsgFunction::CeGetFaultMethodBufferSize,
+        &req,
+    )?;
+
+    let params = CeGetFaultMethodBufferSizeParams::from_bytes(&reply).ok_or(EINVAL)?;
+
+    Ok(params.size())
+}

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 7/9] gpu: nova-core: gsp: add RM control command infrastructure
  2026-02-27 12:32 ` [PATCH 7/9] gpu: nova-core: gsp: add RM control command infrastructure Eliot Courtney
@ 2026-03-02  8:00   ` Zhi Wang
  2026-03-09 22:08   ` Joel Fernandes
  2026-03-13 15:40   ` Danilo Krummrich
  2 siblings, 0 replies; 37+ messages in thread
From: Zhi Wang @ 2026-03-02  8:00 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter, rust-for-linux, nouveau, dri-devel, linux-kernel

On Fri, 27 Feb 2026 21:32:12 +0900
Eliot Courtney <ecourtney@nvidia.com> wrote:

Tested-by: Zhi Wang <zhiw@nvidia.com>

> Add `RmControl` which implements CommandToGsp for sending RM control
> RPCs.
> 
> Add `RmControlReply` which implements MessageFromGsp for getting the
> reply back.
> 
> Add `send_rm_control` which sends an RM control RPC via the command
> queue using the above structures.
> 
> This gives a generic way to send each RM control RPC. Each new RM
> control RPC can be added by extending RmControlMsgFunction and adding
> its bindings wrappers and writing a helper function to send it via
> `send_rm_control`.
> 
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
> ---
>  drivers/gpu/nova-core/gsp.rs             |   1 +
>  drivers/gpu/nova-core/gsp/rm.rs          |   3 +
>  drivers/gpu/nova-core/gsp/rm/commands.rs | 111
> +++++++++++++++++++++++++++++++ 3 files changed, 115 insertions(+)
> 
> diff --git a/drivers/gpu/nova-core/gsp.rs
> b/drivers/gpu/nova-core/gsp.rs index a6f3918c20b1..1a1c4e9808ac 100644
> --- a/drivers/gpu/nova-core/gsp.rs
> +++ b/drivers/gpu/nova-core/gsp.rs
> @@ -17,6 +17,7 @@
>  pub(crate) mod cmdq;
>  pub(crate) mod commands;
>  mod fw;
> +pub(crate) mod rm;
>  mod sequencer;
>  
>  pub(crate) use fw::{
> diff --git a/drivers/gpu/nova-core/gsp/rm.rs
> b/drivers/gpu/nova-core/gsp/rm.rs new file mode 100644
> index 000000000000..10e879a3e842
> --- /dev/null
> +++ b/drivers/gpu/nova-core/gsp/rm.rs
> @@ -0,0 +1,3 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +pub(crate) mod commands;
> diff --git a/drivers/gpu/nova-core/gsp/rm/commands.rs
> b/drivers/gpu/nova-core/gsp/rm/commands.rs new file mode 100644
> index 000000000000..16bcf88644db
> --- /dev/null
> +++ b/drivers/gpu/nova-core/gsp/rm/commands.rs
> @@ -0,0 +1,111 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +use core::{
> +    array,
> +    convert::Infallible, //
> +};
> +
> +use kernel::prelude::*;
> +
> +use crate::{
> +    driver::Bar0,
> +    gsp::{
> +        cmdq::{
> +            Cmdq,
> +            CommandToGsp,
> +            MessageFromGsp, //
> +        },
> +        fw::{
> +            rm::*,
> +            MsgFunction,
> +            NvStatus, //
> +        },
> +    },
> +    sbuffer::SBufferIter,
> +};
> +
> +/// Command for sending an RM control message to the GSP.
> +struct RmControl<'a> {
> +    h_client: u32,
> +    h_object: u32,
> +    cmd: RmControlMsgFunction,
> +    params: &'a [u8],
> +}
> +
> +impl<'a> RmControl<'a> {
> +    /// Creates a new RM control command.
> +    fn new(h_client: u32, h_object: u32, cmd: RmControlMsgFunction,
> params: &'a [u8]) -> Self {
> +        Self {
> +            h_client,
> +            h_object,
> +            cmd,
> +            params,
> +        }
> +    }
> +}
> +
> +impl CommandToGsp for RmControl<'_> {
> +    const FUNCTION: MsgFunction = MsgFunction::GspRmControl;
> +    type Command = GspRmControl;
> +    type Reply = RmControlReply;
> +    type InitError = Infallible;
> +
> +    fn init(&self) -> impl Init<Self::Command, Self::InitError> {
> +        GspRmControl::new(
> +            self.h_client,
> +            self.h_object,
> +            self.cmd,
> +            self.params.len() as u32,
> +        )
> +    }
> +
> +    fn variable_payload_len(&self) -> usize {
> +        self.params.len()
> +    }
> +
> +    fn init_variable_payload(
> +        &self,
> +        dst: &mut SBufferIter<array::IntoIter<&mut [u8], 2>>,
> +    ) -> Result {
> +        dst.write_all(self.params)
> +    }
> +}
> +
> +/// Response from an RM control message.
> +pub(crate) struct RmControlReply {
> +    status: NvStatus,
> +    params: KVVec<u8>,
> +}
> +
> +impl MessageFromGsp for RmControlReply {
> +    const FUNCTION: MsgFunction = MsgFunction::GspRmControl;
> +    type Message = GspRmControl;
> +    type InitError = Error;
> +
> +    fn read(
> +        msg: &Self::Message,
> +        sbuffer: &mut SBufferIter<array::IntoIter<&[u8], 2>>,
> +    ) -> Result<Self, Self::InitError> {
> +        Ok(RmControlReply {
> +            status: msg.status(),
> +            params: sbuffer.flush_into_kvvec(GFP_KERNEL)?,
> +        })
> +    }
> +}
> +
> +/// Sends an RM control command, checks the reply status, and
> returns the raw parameter bytes. +#[expect(dead_code)]
> +fn send_rm_control(
> +    cmdq: &Cmdq,
> +    bar: &Bar0,
> +    h_client: u32,
> +    h_object: u32,
> +    cmd: RmControlMsgFunction,
> +    params: &[u8],
> +) -> Result<KVVec<u8>> {
> +    let reply = cmdq.send_sync_command(bar, RmControl::new(h_client,
> h_object, cmd, params))?; +
> +    Result::from(reply.status)?;
> +
> +    Ok(reply.params)
> +}
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles
  2026-02-27 12:32 ` [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles Eliot Courtney
@ 2026-03-09 21:22   ` Joel Fernandes
  2026-03-09 23:41     ` Joel Fernandes
  0 siblings, 1 reply; 37+ messages in thread
From: Joel Fernandes @ 2026-03-09 21:22 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter, rust-for-linux, nouveau, dri-devel, linux-kernel

On Fri, Feb 27, 2026 at 09:32:08PM +0900, Eliot Courtney wrote:
> Expose the `hInternalClient` and `hInternalSubdevice` handles. These are
> needed for RM control calls.
> 
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
> ---
>  drivers/gpu/nova-core/gsp/commands.rs    | 16 ++++++++++++++++
>  drivers/gpu/nova-core/gsp/fw/commands.rs | 10 ++++++++++
>  2 files changed, 26 insertions(+)
> 
> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
> index 4740cda0b51c..2cadfcaf9a8a 100644
> --- a/drivers/gpu/nova-core/gsp/commands.rs
> +++ b/drivers/gpu/nova-core/gsp/commands.rs
> @@ -197,6 +197,8 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
>  /// The reply from the GSP to the [`GetGspInfo`] command.
>  pub(crate) struct GetGspStaticInfoReply {
>      gpu_name: [u8; 64],
> +    h_client: u32,
> +    h_subdevice: u32,

I would rather have more descriptive names please. 'client_handle',
'subdevice_handle'. Also some explanation of what a client and a sub-device
mean somewhere in the comments or documentation would be nice.

>  }
>  
>  impl MessageFromGsp for GetGspStaticInfoReply {
> @@ -210,6 +212,8 @@ fn read(
>      ) -> Result<Self, Self::InitError> {
>          Ok(GetGspStaticInfoReply {
>              gpu_name: msg.gpu_name_str(),
> +            h_client: msg.h_internal_client(),
> +            h_subdevice: msg.h_internal_subdevice(),
>          })
>      }
>  }
> @@ -236,6 +240,18 @@ pub(crate) fn gpu_name(&self) -> core::result::Result<&str, GpuNameError> {
>              .to_str()
>              .map_err(GpuNameError::InvalidUtf8)
>      }
> +
> +    /// Returns the internal client handle allocated by GSP-RM.
> +    #[expect(dead_code)]
> +    pub(crate) fn h_client(&self) -> u32 {
> +        self.h_client
> +    }
> +
> +    /// Returns the internal subdevice handle allocated by GSP-RM.
> +    #[expect(dead_code)]
> +    pub(crate) fn h_subdevice(&self) -> u32 {
> +        self.h_subdevice
> +    }

Same here.

>  }
>  
>  /// Send the [`GetGspInfo`] command and awaits for its reply.
> diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
> index 67f44421fcc3..aaf3509a0207 100644
> --- a/drivers/gpu/nova-core/gsp/fw/commands.rs
> +++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
> @@ -115,6 +115,16 @@ impl GspStaticConfigInfo {
>      pub(crate) fn gpu_name_str(&self) -> [u8; 64] {
>          self.0.gpuNameString
>      }
> +
> +    /// Returns the internal client handle allocated by GSP-RM.
> +    pub(crate) fn h_internal_client(&self) -> u32 {
> +        self.0.hInternalClient

What is the difference between and internal handle and a non-internal one?

And again, descriptive function names.

> +    }
> +
> +    /// Returns the internal subdevice handle allocated by GSP-RM.
> +    pub(crate) fn h_internal_subdevice(&self) -> u32 {

Same here.

thanks,

--
Joel Fernandes


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 5/9] gpu: nova-core: gsp: add types for RM control RPCs
  2026-02-27 12:32 ` [PATCH 5/9] gpu: nova-core: gsp: add types for RM control RPCs Eliot Courtney
@ 2026-03-09 21:45   ` Joel Fernandes
  2026-03-16 11:42     ` Eliot Courtney
  0 siblings, 1 reply; 37+ messages in thread
From: Joel Fernandes @ 2026-03-09 21:45 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter, rust-for-linux, nouveau, dri-devel, linux-kernel

On Fri, 27 Feb 2026, Eliot Courtney wrote:
[..]
> +/// Command code for RM control RPCs sent using [`MsgFunction::GspRmControl`].
> +#[derive(Copy, Clone, Debug, PartialEq)]
> +#[repr(u32)]
> +pub(crate) enum RmControlMsgFunction {
> +    /// Get the CE fault method buffer size.
> +    CeGetFaultMethodBufferSize = bindings::NV2080_CTRL_CMD_CE_GET_FAULT_METHOD_BUFFER_SIZE,
> +}
> +
> +impl TryFrom<u32> for RmControlMsgFunction {
> +    type Error = kernel::error::Error;
> +
> +    fn try_from(value: u32) -> Result<Self> {

Other similar impls in the driver carry a TODO[FPRI] comment. Please add one
above this impl:

// TODO[FPRI]: replace with `FromPrimitive`.

> +        match value {
> +            bindings::NV2080_CTRL_CMD_CE_GET_FAULT_METHOD_BUFFER_SIZE => {
> +                Ok(Self::CeGetFaultMethodBufferSize)
> +            }
> +            _ => Err(EINVAL),
> +        }
> +    }
> +}

Nit: the braces around the single-expression match arm are unnecessary:

    match value {
        bindings::NV2080_CTRL_CMD_CE_GET_FAULT_METHOD_BUFFER_SIZE =>
            Ok(Self::CeGetFaultMethodBufferSize),
        _ => Err(EINVAL),
    }

thanks,

-- 
Joel Fernandes

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/9] gpu: nova-core: generalize `flush_into_kvec` to `flush_into_vec`
  2026-02-27 12:32 ` [PATCH 6/9] gpu: nova-core: generalize `flush_into_kvec` to `flush_into_vec` Eliot Courtney
@ 2026-03-09 21:53   ` Joel Fernandes
  2026-03-09 21:57   ` Danilo Krummrich
  1 sibling, 0 replies; 37+ messages in thread
From: Joel Fernandes @ 2026-03-09 21:53 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter, rust-for-linux, nouveau, dri-devel, linux-kernel

On Fri, Feb 27, 2026 at 09:32:11PM +0900, Eliot Courtney wrote:
> Add general `flush_into_vec` function. Add `flush_into_kvvec`
> convenience wrapper alongside the existing `flush_into_kvec` function.
> This is generally useful but immediately used for e.g. holding RM
> control payloads, which can be large (~>=20 KiB).
> 
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>

Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>

thanks,

--
Joel Fernandes



> ---
>  drivers/gpu/nova-core/sbuffer.rs | 31 +++++++++++++++++++++++++++----
>  1 file changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/nova-core/sbuffer.rs b/drivers/gpu/nova-core/sbuffer.rs
> index 3a41d224c77a..38f8a8426521 100644
> --- a/drivers/gpu/nova-core/sbuffer.rs
> +++ b/drivers/gpu/nova-core/sbuffer.rs
> @@ -2,7 +2,13 @@
>  
>  use core::ops::Deref;
>  
> -use kernel::prelude::*;
> +use kernel::{
> +    alloc::{
> +        Allocator,
> +        KVec, //
> +    },
> +    prelude::*, //
> +};
>  
>  /// A buffer abstraction for discontiguous byte slices.
>  ///
> @@ -162,11 +168,14 @@ pub(crate) fn read_exact(&mut self, mut dst: &mut [u8]) -> Result {
>          Ok(())
>      }
>  
> -    /// Read all the remaining data into a [`KVec`].
> +    /// Read all the remaining data into a [`Vec`] with the given allocator.
>      ///
>      /// `self` will be empty after this operation.
> -    pub(crate) fn flush_into_kvec(&mut self, flags: kernel::alloc::Flags) -> Result<KVec<u8>> {
> -        let mut buf = KVec::<u8>::new();
> +    pub(crate) fn flush_into_vec<A: Allocator>(
> +        &mut self,
> +        flags: kernel::alloc::Flags,
> +    ) -> Result<Vec<u8, A>> {
> +        let mut buf = Vec::<u8, A>::new();
>  
>          if let Some(slice) = core::mem::take(&mut self.cur_slice) {
>              buf.extend_from_slice(slice, flags)?;
> @@ -177,6 +186,20 @@ pub(crate) fn flush_into_kvec(&mut self, flags: kernel::alloc::Flags) -> Result<
>  
>          Ok(buf)
>      }
> +
> +    /// Read all the remaining data into a [`KVec`].
> +    ///
> +    /// `self` will be empty after this operation.
> +    pub(crate) fn flush_into_kvec(&mut self, flags: kernel::alloc::Flags) -> Result<KVec<u8>> {
> +        self.flush_into_vec(flags)
> +    }
> +
> +    /// Read all the remaining data into a [`KVVec`].
> +    ///
> +    /// `self` will be empty after this operation.
> +    pub(crate) fn flush_into_kvvec(&mut self, flags: kernel::alloc::Flags) -> Result<KVVec<u8>> {
> +        self.flush_into_vec(flags)
> +    }
>  }
>  
>  /// Provides a way to get mutable slices of data to write into.
> 
> -- 
> 2.53.0
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/9] gpu: nova-core: generalize `flush_into_kvec` to `flush_into_vec`
  2026-02-27 12:32 ` [PATCH 6/9] gpu: nova-core: generalize `flush_into_kvec` to `flush_into_vec` Eliot Courtney
  2026-03-09 21:53   ` Joel Fernandes
@ 2026-03-09 21:57   ` Danilo Krummrich
  2026-03-09 22:01     ` Danilo Krummrich
  1 sibling, 1 reply; 37+ messages in thread
From: Danilo Krummrich @ 2026-03-09 21:57 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Alice Ryhl, Alexandre Courbot, David Airlie, Simona Vetter,
	rust-for-linux, nouveau, dri-devel, linux-kernel

On 2/27/2026 1:32 PM, Eliot Courtney wrote:
> Add general `flush_into_vec` function. Add `flush_into_kvvec`
> convenience wrapper alongside the existing `flush_into_kvec` function.
> This is generally useful but immediately used for e.g. holding RM
> control payloads, which can be large (~>=20 KiB).

Why not just always use KVVec? It also seems that the KVec variant is not used?

If there's no reason for having both, I'd also just call this into_vec().

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/9] gpu: nova-core: generalize `flush_into_kvec` to `flush_into_vec`
  2026-03-09 21:57   ` Danilo Krummrich
@ 2026-03-09 22:01     ` Danilo Krummrich
  2026-03-16 11:44       ` Eliot Courtney
  0 siblings, 1 reply; 37+ messages in thread
From: Danilo Krummrich @ 2026-03-09 22:01 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Alice Ryhl, Alexandre Courbot, David Airlie, Simona Vetter,
	rust-for-linux, nouveau, dri-devel, linux-kernel

On Mon Mar 9, 2026 at 10:57 PM CET, Danilo Krummrich wrote:
> On 2/27/2026 1:32 PM, Eliot Courtney wrote:
>> Add general `flush_into_vec` function. Add `flush_into_kvvec`
>> convenience wrapper alongside the existing `flush_into_kvec` function.
>> This is generally useful but immediately used for e.g. holding RM
>> control payloads, which can be large (~>=20 KiB).
>
> Why not just always use KVVec? It also seems that the KVec variant is not used?

(Besides its single usage in GspSequence, which wouldn't hurt to be a KVVec.)

> If there's no reason for having both, I'd also just call this into_vec().


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 7/9] gpu: nova-core: gsp: add RM control command infrastructure
  2026-02-27 12:32 ` [PATCH 7/9] gpu: nova-core: gsp: add RM control command infrastructure Eliot Courtney
  2026-03-02  8:00   ` Zhi Wang
@ 2026-03-09 22:08   ` Joel Fernandes
  2026-03-13 15:40   ` Danilo Krummrich
  2 siblings, 0 replies; 37+ messages in thread
From: Joel Fernandes @ 2026-03-09 22:08 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter, rust-for-linux, nouveau, dri-devel, linux-kernel

On Fri, 27 Feb 2026, Eliot Courtney wrote:

> +fn send_rm_control(
> +    cmdq: &Cmdq,
> +    bar: &Bar0,
> +    h_client: u32,
> +    h_object: u32,
> +    cmd: RmControlMsgFunction,
> +    params: &[u8],
> +) -> Result<KVVec<u8>> {
> +    let reply = cmdq.send_sync_command(bar, RmControl::new(h_client, h_object, cmd, params))?;

This series uses `send_sync_command`, but the cmdq locking series v3 [1]
renamed it to `send_command`? This series will need to be rebased and reposted.

[1]: https://lore.kernel.org/all/20260304-cmdq-locking-v3-0-a6314b708850@nvidia.com/

thanks,

--
Joel Fernandes


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 8/9] gpu: nova-core: gsp: add CE fault method buffer size bindings
  2026-02-27 12:32 ` [PATCH 8/9] gpu: nova-core: gsp: add CE fault method buffer size bindings Eliot Courtney
@ 2026-03-09 22:08   ` Joel Fernandes
  0 siblings, 0 replies; 37+ messages in thread
From: Joel Fernandes @ 2026-03-09 22:08 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter, rust-for-linux, nouveau, dri-devel, linux-kernel

On Fri, 27 Feb 2026, Eliot Courtney wrote:

> Add the bindings for CE fault method buffer size RM control RPC. This
> will be needed for channel allocation and is also a simple way to
> demonstrate the usage of the `send_rm_control` infrastructure for now.

Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>

--
Joel Fernandes


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 9/9] gpu: nova-core: gsp: add CeGetFaultMethodBufferSize RM control command
  2026-02-27 12:32 ` [PATCH 9/9] gpu: nova-core: gsp: add CeGetFaultMethodBufferSize RM control command Eliot Courtney
@ 2026-03-09 22:23   ` Joel Fernandes
  0 siblings, 0 replies; 37+ messages in thread
From: Joel Fernandes @ 2026-03-09 22:23 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter, rust-for-linux, nouveau, dri-devel, linux-kernel

On Fri, Feb 27, 2026 at 09:32:14PM +0900, Eliot Courtney wrote:
> Add `CeGetFaultMethodBufferSizeParams` which wraps the bindings.
> 
> Add `get_ce_fault_method_buffer_size` which sends the RM control RPC
> and returns the buffer size. This is needed for channel allocation.
> 
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
> ---
>  drivers/gpu/nova-core/gsp/fw/rm.rs       | 17 +++++++++++++++
>  drivers/gpu/nova-core/gsp/rm/commands.rs | 36 +++++++++++++++++++++++++++++---
>  2 files changed, 50 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/nova-core/gsp/fw/rm.rs b/drivers/gpu/nova-core/gsp/fw/rm.rs
> index 8bb7b11736b9..9f1e3546d39d 100644
> --- a/drivers/gpu/nova-core/gsp/fw/rm.rs
> +++ b/drivers/gpu/nova-core/gsp/fw/rm.rs
> @@ -80,3 +80,20 @@ unsafe impl FromBytes for GspRmControl {}
>  
>  // SAFETY: This struct contains no padding.
>  unsafe impl AsBytes for GspRmControl {}
> +
> +/// Wrapper for [`bindings::NV2080_CTRL_CE_GET_FAULT_METHOD_BUFFER_SIZE_PARAMS`].
> +#[derive(Zeroable)]

nit: This patch is not using zeroed(), so do you really need Zeroable here?
Also check other patches where you might have added this trait but not used it.

Other than this nit,
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>

thanks,

--
Joel Fernandes






> +#[repr(transparent)]
> +pub(crate) struct CeGetFaultMethodBufferSizeParams(
> +    bindings::NV2080_CTRL_CE_GET_FAULT_METHOD_BUFFER_SIZE_PARAMS,
> +);
> +
> +impl CeGetFaultMethodBufferSizeParams {
> +    /// Returns the CE fault method buffer size in bytes.
> +    pub(crate) fn size(&self) -> u32 {
> +        self.0.size
> +    }
> +}
> +
> +// SAFETY: This struct only contains integer types for which all bit patterns are valid.
> +unsafe impl FromBytes for CeGetFaultMethodBufferSizeParams {}
> diff --git a/drivers/gpu/nova-core/gsp/rm/commands.rs b/drivers/gpu/nova-core/gsp/rm/commands.rs
> index 16bcf88644db..1d045e6f1afb 100644
> --- a/drivers/gpu/nova-core/gsp/rm/commands.rs
> +++ b/drivers/gpu/nova-core/gsp/rm/commands.rs
> @@ -2,10 +2,14 @@
>  
>  use core::{
>      array,
> -    convert::Infallible, //
> +    convert::Infallible,
> +    mem::size_of, //
>  };
>  
> -use kernel::prelude::*;
> +use kernel::{
> +    prelude::*,
> +    transmute::FromBytes, //
> +};
>  
>  use crate::{
>      driver::Bar0,
> @@ -94,7 +98,6 @@ fn read(
>  }
>  
>  /// Sends an RM control command, checks the reply status, and returns the raw parameter bytes.
> -#[expect(dead_code)]
>  fn send_rm_control(
>      cmdq: &Cmdq,
>      bar: &Bar0,
> @@ -109,3 +112,30 @@ fn send_rm_control(
>  
>      Ok(reply.params)
>  }
> +
> +/// Sends the `CeGetFaultMethodBufferSize` RM control command and waits for its reply.
> +///
> +/// Returns the CE fault method buffer size in bytes.
> +#[expect(dead_code)]
> +pub(crate) fn get_ce_fault_method_buffer_size(
> +    cmdq: &Cmdq,
> +    bar: &Bar0,
> +    h_client: u32,
> +    h_subdevice: u32,
> +) -> Result<u32> {
> +    // Stack-allocate the request; CeGetFaultMethodBufferSizeParams is small (4 bytes).
> +    let req = [0u8; size_of::<CeGetFaultMethodBufferSizeParams>()];
> +
> +    let reply = send_rm_control(
> +        cmdq,
> +        bar,
> +        h_client,
> +        h_subdevice,
> +        RmControlMsgFunction::CeGetFaultMethodBufferSize,
> +        &req,
> +    )?;
> +
> +    let params = CeGetFaultMethodBufferSizeParams::from_bytes(&reply).ok_or(EINVAL)?;
> +
> +    Ok(params.size())
> +}
> 
> -- 
> 2.53.0
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles
  2026-03-09 21:22   ` Joel Fernandes
@ 2026-03-09 23:41     ` Joel Fernandes
  2026-03-10  0:06       ` John Hubbard
  0 siblings, 1 reply; 37+ messages in thread
From: Joel Fernandes @ 2026-03-09 23:41 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter, rust-for-linux, nouveau, dri-devel, linux-kernel



> On Mar 9, 2026, at 5:22 PM, Joel Fernandes <joelagnelf@nvidia.com> wrote:
> 
> On Fri, Feb 27, 2026 at 09:32:08PM +0900, Eliot Courtney wrote:
>> Expose the `hInternalClient` and `hInternalSubdevice` handles. These are
>> needed for RM control calls.
>> 
>> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
>> ---
>> drivers/gpu/nova-core/gsp/commands.rs    | 16 ++++++++++++++++
>> drivers/gpu/nova-core/gsp/fw/commands.rs | 10 ++++++++++
>> 2 files changed, 26 insertions(+)
>> 
>> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
>> index 4740cda0b51c..2cadfcaf9a8a 100644
>> --- a/drivers/gpu/nova-core/gsp/commands.rs
>> +++ b/drivers/gpu/nova-core/gsp/commands.rs
>> @@ -197,6 +197,8 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
>> /// The reply from the GSP to the [`GetGspInfo`] command.
>> pub(crate) struct GetGspStaticInfoReply {
>>     gpu_name: [u8; 64],
>> +    h_client: u32,
>> +    h_subdevice: u32,
> 
> I would rather have more descriptive names please. 'client_handle',
> 'subdevice_handle'. Also some explanation of what a client and a sub-device
> mean somewhere in the comments or documentation would be nice.

Also just checking if we can have repr wrappers around the u32 for clients /
handles.  These concepts are quite common in Nvidia drivers so we should
probably create new types for them.

And if we can please document the terminology, device, subset, clients handles
etc. and add new Documentation/ entries even.

Thoughts?

-- 
Joel Fernandes

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles
  2026-03-09 23:41     ` Joel Fernandes
@ 2026-03-10  0:06       ` John Hubbard
  2026-03-10  2:17         ` Joel Fernandes
  0 siblings, 1 reply; 37+ messages in thread
From: John Hubbard @ 2026-03-10  0:06 UTC (permalink / raw)
  To: Joel Fernandes, Eliot Courtney
  Cc: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, Simona Vetter,
	rust-for-linux, nouveau, dri-devel, linux-kernel

On 3/9/26 4:41 PM, Joel Fernandes wrote:
>> On Mar 9, 2026, at 5:22 PM, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>> On Fri, Feb 27, 2026 at 09:32:08PM +0900, Eliot Courtney wrote:
>>> Expose the `hInternalClient` and `hInternalSubdevice` handles. These are
>>> needed for RM control calls.
>>>
>>> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
>>> ---
>>> drivers/gpu/nova-core/gsp/commands.rs    | 16 ++++++++++++++++
>>> drivers/gpu/nova-core/gsp/fw/commands.rs | 10 ++++++++++
>>> 2 files changed, 26 insertions(+)
>>>
>>> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
>>> index 4740cda0b51c..2cadfcaf9a8a 100644
>>> --- a/drivers/gpu/nova-core/gsp/commands.rs
>>> +++ b/drivers/gpu/nova-core/gsp/commands.rs
>>> @@ -197,6 +197,8 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
>>> /// The reply from the GSP to the [`GetGspInfo`] command.
>>> pub(crate) struct GetGspStaticInfoReply {
>>>     gpu_name: [u8; 64],
>>> +    h_client: u32,
>>> +    h_subdevice: u32,
>>
>> I would rather have more descriptive names please. 'client_handle',

Maybe it's better to mirror the Open RM names, which are ancient and
well known in those circles. Changing them at this point is probably
going to result in a slightly worse situation, because there are
probably millions of lines of code out there that use the existing
nomenclature.

However...

>> 'subdevice_handle'. Also some explanation of what a client and a sub-device
>> mean somewhere in the comments or documentation would be nice.

Yes, although I expect you can simply refer to some well known pre-
existing documentation from NVIDAI for that!

> 
> Also just checking if we can have repr wrappers around the u32 for clients /
> handles.  These concepts are quite common in Nvidia drivers so we should
> probably create new types for them.
> 
> And if we can please document the terminology, device, subset, clients handles
> etc. and add new Documentation/ entries even.
> 
> Thoughts?
> 

This has already been done countless times by countless people I
think, and so we don't need to do it again. Just refer to existing
docs.

btw, as an aside:

I'm checking with our GSP firmware team to be sure, but my
understanding is that much of this is actually very temporary. Because
the GSP team does not want to continue on with this model in which
GSP has to maintain that kind of state: an internal hierarchy of
objects. Instead, they are hoping to move to an API in which nova
would directly refer to each object/item in GSP. And subdevice, in 
particular, is an old SLI term that no one wants to keep around
either. It was an ugly hack in Open RM that took more than a decade
to recover from, by moving the SLI concept out to user space.

So even though we should document what we're doing now, I would like
to also note that we suspect a certain amount of this will 
disappear, to be replaced with a somewhat simpler API, in the 
somewhat near future.

thanks,
-- 
John Hubbard

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles
  2026-03-10  0:06       ` John Hubbard
@ 2026-03-10  2:17         ` Joel Fernandes
  2026-03-10  2:29           ` John Hubbard
  2026-03-10  2:36           ` Alexandre Courbot
  0 siblings, 2 replies; 37+ messages in thread
From: Joel Fernandes @ 2026-03-10  2:17 UTC (permalink / raw)
  To: John Hubbard
  Cc: Eliot Courtney, Danilo Krummrich, Alice Ryhl, Alexandre Courbot,
	Simona Vetter, rust-for-linux, nouveau, dri-devel, linux-kernel

Hi John,

> On Mar 9, 2026, at 8:06 PM, John Hubbard <jhubbard@nvidia.com> wrote:
> 
> On 3/9/26 4:41 PM, Joel Fernandes wrote:
>>>> On Mar 9, 2026, at 5:22 PM, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>> On Fri, Feb 27, 2026 at 09:32:08PM +0900, Eliot Courtney wrote:
>>>> Expose the `hInternalClient` and `hInternalSubdevice` handles. These are
>>>> needed for RM control calls.
>>>> 
>>>> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
>>>> ---
>>>> drivers/gpu/nova-core/gsp/commands.rs    | 16 ++++++++++++++++
>>>> drivers/gpu/nova-core/gsp/fw/commands.rs | 10 ++++++++++
>>>> 2 files changed, 26 insertions(+)
>>>> 
>>>> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
>>>> index 4740cda0b51c..2cadfcaf9a8a 100644
>>>> --- a/drivers/gpu/nova-core/gsp/commands.rs
>>>> +++ b/drivers/gpu/nova-core/gsp/commands.rs
>>>> @@ -197,6 +197,8 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
>>>> /// The reply from the GSP to the [`GetGspInfo`] command.
>>>> pub(crate) struct GetGspStaticInfoReply {
>>>>    gpu_name: [u8; 64],
>>>> +    h_client: u32,
>>>> +    h_subdevice: u32,
>>> 
>>> I would rather have more descriptive names please. 'client_handle',
> 
> Maybe it's better to mirror the Open RM names, which are ancient and
> well known in those circles. Changing them at this point is probably
> going to result in a slightly worse situation, because there are
> probably millions of lines of code out there that use the existing
> nomenclature.

I have to disagree a bit here. Saying h_ in code is a bit meaningless:
there is no mention of the word "handle" anywhere near these fields.
h_ could mean "higher", "hardware", or any number of things. The only
reason I know it means "handle" is because of expertise with Nvidia
drivers. The `_handle` suffix is self-documenting; `h_` is not.

> 
> However...
> 
>>> 'subdevice_handle'. Also some explanation of what a client and a sub-device
>>> mean somewhere in the comments or documentation would be nice.
> 
> Yes, although I expect you can simply refer to some well known pre-
> existing documentation from NVIDAI for that!

I apologize but I am a bit concerned with this approach because it feels we
are drifting into black box dev without encouraging more code comments,
documentation and cleaner code.

We need to make the driver as readable and well documented as possible, we
do not want another Nouveau with magic numbers and magic variable names.

Very least I would expect at least one or two lines of code comments of
what is a handle, what is a client, what is an internal client handle
versus not. I guess I do not understand what is the hesitation?

Sure external documentation is good but to clarify, I am referring to a few
code comments. That's not much to ask right?

Elaborate documentation files in kernel can be optional but there is
probably no harm in citing external references from in-kernel docs too. But
again I was more concerned about code comments and variable names.

> 
>> 
>> Also just checking if we can have repr wrappers around the u32 for clients /
>> handles.  These concepts are quite common in Nvidia drivers so we should
>> probably create new types for them.
>> 
>> And if we can please document the terminology, device, subset, clients handles
>> etc. and add new Documentation/ entries even.
>> 
>> Thoughts?
>> 
> 
> This has already been done countless times by countless people I
> think, and so we don't need to do it again. Just refer to existing
> docs.

Not sure if you are referring to nova-core repr or docs, but the repr
technique to wrap raw integers is pretty common in the firmware layer of
nova-core.

> 
> btw, as an aside:
> 
> I'm checking with our GSP firmware team to be sure, but my
> understanding is that much of this is actually very temporary. Because
> the GSP team does not want to continue on with this model in which
> GSP has to maintain that kind of state: an internal hierarchy of
> objects. Instead, they are hoping to move to an API in which nova
> would directly refer to each object/item in GSP. And subdevice, in

Even if this is "temporary" we can't just say "Oh, I'll just add these
legacy things and badly write them, because they're going anyway at some
unpredictable point in the future". After all, this is the Linux kernel we
are talking about :-). The bar is quite high.

> particular, is an old SLI term that no one wants to keep around
> either. It was an ugly hack in Open RM that took more than a decade
> to recover from, by moving the SLI concept out to user space.
> 
> So even though we should document what we're doing now, I would like
> to also note that we suspect a certain amount of this will
> disappear, to be replaced with a somewhat simpler API, in the
> somewhat near future.

Sure, but client handles are a broader GPU driver concept even if this
particular one is GSP-internal. We are certainly going to need a rust type
to represent a client right? Other GPU drivers also have concept of
clients. The point is not that `hInternalClient` represents a GPU user
today, it may well be temporary as you note, but that using
`#[repr(transparent)]` new types for raw u32 handles costs nothing and
makes the code better and more readable. This pattern is already
well-established in nova-core itself: see `PackedRegistryEntry` for example
being a repr type. IMHO, there should be little reason that we need the
struct to have magic u32 numbers in Rust code for concepts like "handles".

All I am saying is let us think this through before just doing the shortcut
of using u32 for client handles, etc. Rust gives us rich types, lets use them.

thanks,

-- 
Joel Fernandes

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles
  2026-03-10  2:17         ` Joel Fernandes
@ 2026-03-10  2:29           ` John Hubbard
  2026-03-10 18:48             ` Joel Fernandes
  2026-03-10  2:36           ` Alexandre Courbot
  1 sibling, 1 reply; 37+ messages in thread
From: John Hubbard @ 2026-03-10  2:29 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Eliot Courtney, Danilo Krummrich, Alice Ryhl, Alexandre Courbot,
	Simona Vetter, rust-for-linux, nouveau, dri-devel, linux-kernel

On 3/9/26 7:17 PM, Joel Fernandes wrote:
> Hi John,
> 
>> On Mar 9, 2026, at 8:06 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>>
>> On 3/9/26 4:41 PM, Joel Fernandes wrote:
>>>>> On Mar 9, 2026, at 5:22 PM, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>>> On Fri, Feb 27, 2026 at 09:32:08PM +0900, Eliot Courtney wrote:
>>>>> Expose the `hInternalClient` and `hInternalSubdevice` handles. These are
>>>>> needed for RM control calls.
>>>>>
>>>>> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
>>>>> ---
>>>>> drivers/gpu/nova-core/gsp/commands.rs    | 16 ++++++++++++++++
>>>>> drivers/gpu/nova-core/gsp/fw/commands.rs | 10 ++++++++++
>>>>> 2 files changed, 26 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
>>>>> index 4740cda0b51c..2cadfcaf9a8a 100644
>>>>> --- a/drivers/gpu/nova-core/gsp/commands.rs
>>>>> +++ b/drivers/gpu/nova-core/gsp/commands.rs
>>>>> @@ -197,6 +197,8 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
>>>>> /// The reply from the GSP to the [`GetGspInfo`] command.
>>>>> pub(crate) struct GetGspStaticInfoReply {
>>>>>    gpu_name: [u8; 64],
>>>>> +    h_client: u32,
>>>>> +    h_subdevice: u32,
>>>>
>>>> I would rather have more descriptive names please. 'client_handle',
>>
>> Maybe it's better to mirror the Open RM names, which are ancient and
>> well known in those circles. Changing them at this point is probably
>> going to result in a slightly worse situation, because there are
>> probably millions of lines of code out there that use the existing
>> nomenclature.
> 
> I have to disagree a bit here. Saying h_ in code is a bit meaningless:
> there is no mention of the word "handle" anywhere near these fields.
> h_ could mean "higher", "hardware", or any number of things. The only
> reason I know it means "handle" is because of expertise with Nvidia
> drivers. The `_handle` suffix is self-documenting; `h_` is not.

Maybe if we write h_client in the code, and "handle..." in a comment
above it? Or the other way around. Just to make the connection to
the Open RM client code that is out there.

> 
>>
>> However...
>>
>>>> 'subdevice_handle'. Also some explanation of what a client and a sub-device
>>>> mean somewhere in the comments or documentation would be nice.
>>
>> Yes, although I expect you can simply refer to some well known pre-
>> existing documentation from NVIDAI for that!
> 
> I apologize but I am a bit concerned with this approach because it feels we
> are drifting into black box dev without encouraging more code comments,
> documentation and cleaner code.
> 
> We need to make the driver as readable and well documented as possible, we
> do not want another Nouveau with magic numbers and magic variable names.
> 
> Very least I would expect at least one or two lines of code comments of
> what is a handle, what is a client, what is an internal client handle
> versus not. I guess I do not understand what is the hesitation?
> 
> Sure external documentation is good but to clarify, I am referring to a few
> code comments. That's not much to ask right?
> 

Sure, a short amount of comments would help, that sounds good.


> Elaborate documentation files in kernel can be optional but there is
> probably no harm in citing external references from in-kernel docs too. But
> again I was more concerned about code comments and variable names.
> 
>>
>>>
>>> Also just checking if we can have repr wrappers around the u32 for clients /
>>> handles.  These concepts are quite common in Nvidia drivers so we should
>>> probably create new types for them.
>>>
>>> And if we can please document the terminology, device, subset, clients handles
>>> etc. and add new Documentation/ entries even.
>>>
>>> Thoughts?
>>>
>>
>> This has already been done countless times by countless people I
>> think, and so we don't need to do it again. Just refer to existing
>> docs.
> 
> Not sure if you are referring to nova-core repr or docs, but the repr
> technique to wrap raw integers is pretty common in the firmware layer of
> nova-core.
> 
>>
>> btw, as an aside:
>>
>> I'm checking with our GSP firmware team to be sure, but my
>> understanding is that much of this is actually very temporary. Because
>> the GSP team does not want to continue on with this model in which
>> GSP has to maintain that kind of state: an internal hierarchy of
>> objects. Instead, they are hoping to move to an API in which nova
>> would directly refer to each object/item in GSP. And subdevice, in
> 
> Even if this is "temporary" we can't just say "Oh, I'll just add these
> legacy things and badly write them, because they're going anyway at some
> unpredictable point in the future". After all, this is the Linux kernel we
> are talking about :-). The bar is quite high.

I am not saying any such thing.

> 
>> particular, is an old SLI term that no one wants to keep around
>> either. It was an ugly hack in Open RM that took more than a decade
>> to recover from, by moving the SLI concept out to user space.
>>
>> So even though we should document what we're doing now, I would like
>> to also note that we suspect a certain amount of this will
>> disappear, to be replaced with a somewhat simpler API, in the
>> somewhat near future.
> 
> Sure, but client handles are a broader GPU driver concept even if this
> particular one is GSP-internal. We are certainly going to need a rust type
> to represent a client right? Other GPU drivers also have concept of
> clients. The point is not that `hInternalClient` represents a GPU user
> today, it may well be temporary as you note, but that using
> `#[repr(transparent)]` new types for raw u32 handles costs nothing and
> makes the code better and more readable. This pattern is already
> well-established in nova-core itself: see `PackedRegistryEntry` for example
> being a repr type. IMHO, there should be little reason that we need the
> struct to have magic u32 numbers in Rust code for concepts like "handles".
> 

We will debate this when it shows up, perhaps I should not have 
mentioned it, other than to remind Eliot to make it easy to delete.

> All I am saying is let us think this through before just doing the shortcut
> of using u32 for client handles, etc. Rust gives us rich types, lets use them.
> 

ohh, that's a whole other topic and idea. I wasn't going into that,
but feel free to explore if Rust can make it better.

thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles
  2026-03-10  2:17         ` Joel Fernandes
  2026-03-10  2:29           ` John Hubbard
@ 2026-03-10  2:36           ` Alexandre Courbot
  2026-03-10  4:02             ` Eliot Courtney
  1 sibling, 1 reply; 37+ messages in thread
From: Alexandre Courbot @ 2026-03-10  2:36 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: John Hubbard, Eliot Courtney, Danilo Krummrich, Alice Ryhl,
	Simona Vetter, rust-for-linux, nouveau, dri-devel, linux-kernel

On Tue Mar 10, 2026 at 11:17 AM JST, Joel Fernandes wrote:
> Hi John,
>
>> On Mar 9, 2026, at 8:06 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>> 
>> On 3/9/26 4:41 PM, Joel Fernandes wrote:
>>>>> On Mar 9, 2026, at 5:22 PM, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>>> On Fri, Feb 27, 2026 at 09:32:08PM +0900, Eliot Courtney wrote:
>>>>> Expose the `hInternalClient` and `hInternalSubdevice` handles. These are
>>>>> needed for RM control calls.
>>>>> 
>>>>> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
>>>>> ---
>>>>> drivers/gpu/nova-core/gsp/commands.rs    | 16 ++++++++++++++++
>>>>> drivers/gpu/nova-core/gsp/fw/commands.rs | 10 ++++++++++
>>>>> 2 files changed, 26 insertions(+)
>>>>> 
>>>>> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
>>>>> index 4740cda0b51c..2cadfcaf9a8a 100644
>>>>> --- a/drivers/gpu/nova-core/gsp/commands.rs
>>>>> +++ b/drivers/gpu/nova-core/gsp/commands.rs
>>>>> @@ -197,6 +197,8 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
>>>>> /// The reply from the GSP to the [`GetGspInfo`] command.
>>>>> pub(crate) struct GetGspStaticInfoReply {
>>>>>    gpu_name: [u8; 64],
>>>>> +    h_client: u32,
>>>>> +    h_subdevice: u32,
>>>> 
>>>> I would rather have more descriptive names please. 'client_handle',
>> 
>> Maybe it's better to mirror the Open RM names, which are ancient and
>> well known in those circles. Changing them at this point is probably
>> going to result in a slightly worse situation, because there are
>> probably millions of lines of code out there that use the existing
>> nomenclature.
>
> I have to disagree a bit here. Saying h_ in code is a bit meaningless:
> there is no mention of the word "handle" anywhere near these fields.
> h_ could mean "higher", "hardware", or any number of things. The only
> reason I know it means "handle" is because of expertise with Nvidia
> drivers. The `_handle` suffix is self-documenting; `h_` is not.

I tend to agree with Joel that we should try to avoid NVisms when they
get in the way of clarity - that's what we did so far actually. We can
always mention the RM name of fields in the doccomments.

The only exception being generated bindings, but they reside in their
own module and are opaque to the rest of the driver.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles
  2026-03-10  2:36           ` Alexandre Courbot
@ 2026-03-10  4:02             ` Eliot Courtney
  2026-03-10 10:35               ` Danilo Krummrich
  0 siblings, 1 reply; 37+ messages in thread
From: Eliot Courtney @ 2026-03-10  4:02 UTC (permalink / raw)
  To: Alexandre Courbot, Joel Fernandes
  Cc: John Hubbard, Eliot Courtney, Danilo Krummrich, Alice Ryhl,
	Simona Vetter, rust-for-linux, nouveau, dri-devel, linux-kernel,
	dri-devel

On Tue Mar 10, 2026 at 11:36 AM JST, Alexandre Courbot wrote:
> On Tue Mar 10, 2026 at 11:17 AM JST, Joel Fernandes wrote:
>> Hi John,
>>
>>> On Mar 9, 2026, at 8:06 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>>> 
>>> On 3/9/26 4:41 PM, Joel Fernandes wrote:
>>>>>> On Mar 9, 2026, at 5:22 PM, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>>>> On Fri, Feb 27, 2026 at 09:32:08PM +0900, Eliot Courtney wrote:
>>>>>> Expose the `hInternalClient` and `hInternalSubdevice` handles. These are
>>>>>> needed for RM control calls.
>>>>>> 
>>>>>> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
>>>>>> ---
>>>>>> drivers/gpu/nova-core/gsp/commands.rs    | 16 ++++++++++++++++
>>>>>> drivers/gpu/nova-core/gsp/fw/commands.rs | 10 ++++++++++
>>>>>> 2 files changed, 26 insertions(+)
>>>>>> 
>>>>>> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
>>>>>> index 4740cda0b51c..2cadfcaf9a8a 100644
>>>>>> --- a/drivers/gpu/nova-core/gsp/commands.rs
>>>>>> +++ b/drivers/gpu/nova-core/gsp/commands.rs
>>>>>> @@ -197,6 +197,8 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
>>>>>> /// The reply from the GSP to the [`GetGspInfo`] command.
>>>>>> pub(crate) struct GetGspStaticInfoReply {
>>>>>>    gpu_name: [u8; 64],
>>>>>> +    h_client: u32,
>>>>>> +    h_subdevice: u32,
>>>>> 
>>>>> I would rather have more descriptive names please. 'client_handle',
>>> 
>>> Maybe it's better to mirror the Open RM names, which are ancient and
>>> well known in those circles. Changing them at this point is probably
>>> going to result in a slightly worse situation, because there are
>>> probably millions of lines of code out there that use the existing
>>> nomenclature.
>>
>> I have to disagree a bit here. Saying h_ in code is a bit meaningless:
>> there is no mention of the word "handle" anywhere near these fields.
>> h_ could mean "higher", "hardware", or any number of things. The only
>> reason I know it means "handle" is because of expertise with Nvidia
>> drivers. The `_handle` suffix is self-documenting; `h_` is not.
>
> I tend to agree with Joel that we should try to avoid NVisms when they
> get in the way of clarity - that's what we did so far actually. We can
> always mention the RM name of fields in the doccomments.
>
> The only exception being generated bindings, but they reside in their
> own module and are opaque to the rest of the driver.

Thanks everyone for your comments so far. I don't mind about the naming,
I can see both arguments. But we have two votes for different names for
h_client/h_subdevice/etc so far, so I'll go with that for now. If anyone
has any other suggested names keen to hear.

Having newtypes for client/device/subdevice/object is easy to do and
will help a lot with calling functions that take a bunch of these as
arguments so I think that is a great idea.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles
  2026-03-10  4:02             ` Eliot Courtney
@ 2026-03-10 10:35               ` Danilo Krummrich
  0 siblings, 0 replies; 37+ messages in thread
From: Danilo Krummrich @ 2026-03-10 10:35 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Alexandre Courbot, Joel Fernandes, John Hubbard, Alice Ryhl,
	Simona Vetter, rust-for-linux, nouveau, dri-devel, linux-kernel,
	dri-devel

On Tue Mar 10, 2026 at 5:02 AM CET, Eliot Courtney wrote:
> On Tue Mar 10, 2026 at 11:36 AM JST, Alexandre Courbot wrote:
>> On Tue Mar 10, 2026 at 11:17 AM JST, Joel Fernandes wrote:
>>>> On Mar 9, 2026, at 8:06 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>>>> On 3/9/26 4:41 PM, Joel Fernandes wrote:
>>>>>>> On Mar 9, 2026, at 5:22 PM, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>>>>> On Fri, Feb 27, 2026 at 09:32:08PM +0900, Eliot Courtney wrote:
>>>>>>> +    h_client: u32,
>>>>>>> +    h_subdevice: u32,
>>>>>> 
>>>>>> I would rather have more descriptive names please. 'client_handle',
>>>> 
>>>> Maybe it's better to mirror the Open RM names, which are ancient and
>>>> well known in those circles. Changing them at this point is probably
>>>> going to result in a slightly worse situation, because there are
>>>> probably millions of lines of code out there that use the existing
>>>> nomenclature.
>>>
>>> I have to disagree a bit here. Saying h_ in code is a bit meaningless:
>>> there is no mention of the word "handle" anywhere near these fields.
>>> h_ could mean "higher", "hardware", or any number of things. The only
>>> reason I know it means "handle" is because of expertise with Nvidia
>>> drivers. The `_handle` suffix is self-documenting; `h_` is not.
>>
>> I tend to agree with Joel that we should try to avoid NVisms when they
>> get in the way of clarity - that's what we did so far actually. We can
>> always mention the RM name of fields in the doccomments.
>>
>> The only exception being generated bindings, but they reside in their
>> own module and are opaque to the rest of the driver.
>
> Thanks everyone for your comments so far. I don't mind about the naming,
> I can see both arguments. But we have two votes for different names for
> h_client/h_subdevice/etc so far, so I'll go with that for now. If anyone
> has any other suggested names keen to hear.

Please don't mirror names from other drivers (regardless whether it is Open RM,
nouveau, etc.) just to be consistent with the naming of other drivers. If the
name is good, great, otherwise please pick what makes the most sense.

In this case h_client and h_subdevice aren't great names at all. I don't want to
establish a pattern where we prefix handles with 'h' to distinguish them from
other values.

There is no reason to have such a convention if we can just represent a
handle through a proper type (which has other advantages as well), i.e. we don't
need to resolve any ambiguity due to a primitive type.

For instance, we could have a gsp::Handle type and then this could just become:

	client: gsp::Handle,
	subdev: gsp::Handle,

If we want type safety to a point where we can't even confuse different types of
handles, we could give them a type state, so we don't have to re-implement the
same thing multiple times, e.g.


	client: gsp::Handle<Client>,
	subdev: gsp::Handle<Subdev>,

I.e. when a method expects a client handle, nothing else can be passed it.

- Danilo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles
  2026-03-10  2:29           ` John Hubbard
@ 2026-03-10 18:48             ` Joel Fernandes
  0 siblings, 0 replies; 37+ messages in thread
From: Joel Fernandes @ 2026-03-10 18:48 UTC (permalink / raw)
  To: John Hubbard, Eliot Courtney
  Cc: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, Simona Vetter,
	rust-for-linux, nouveau, dri-devel, linux-kernel



On 3/9/2026 10:29 PM, John Hubbard wrote:
> On 3/9/26 7:17 PM, Joel Fernandes wrote:
>> Hi John,
>>
>>> On Mar 9, 2026, at 8:06 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>>>
>>> On 3/9/26 4:41 PM, Joel Fernandes wrote:
>>>>>> On Mar 9, 2026, at 5:22 PM, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>>>> On Fri, Feb 27, 2026 at 09:32:08PM +0900, Eliot Courtney wrote:
>>>>>> Expose the `hInternalClient` and `hInternalSubdevice` handles. These are
>>>>>> needed for RM control calls.
>>>>>>
>>>>>> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
>>>>>> ---
>>>>>> drivers/gpu/nova-core/gsp/commands.rs    | 16 ++++++++++++++++
>>>>>> drivers/gpu/nova-core/gsp/fw/commands.rs | 10 ++++++++++
>>>>>> 2 files changed, 26 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
>>>>>> index 4740cda0b51c..2cadfcaf9a8a 100644
>>>>>> --- a/drivers/gpu/nova-core/gsp/commands.rs
>>>>>> +++ b/drivers/gpu/nova-core/gsp/commands.rs
>>>>>> @@ -197,6 +197,8 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
>>>>>> /// The reply from the GSP to the [`GetGspInfo`] command.
>>>>>> pub(crate) struct GetGspStaticInfoReply {
>>>>>>    gpu_name: [u8; 64],
>>>>>> +    h_client: u32,
>>>>>> +    h_subdevice: u32,
>>>>>
>>>>> I would rather have more descriptive names please. 'client_handle',
>>>
>>> Maybe it's better to mirror the Open RM names, which are ancient and
>>> well known in those circles. Changing them at this point is probably
>>> going to result in a slightly worse situation, because there are
>>> probably millions of lines of code out there that use the existing
>>> nomenclature.
>>
>> I have to disagree a bit here. Saying h_ in code is a bit meaningless:
>> there is no mention of the word "handle" anywhere near these fields.
>> h_ could mean "higher", "hardware", or any number of things. The only
>> reason I know it means "handle" is because of expertise with Nvidia
>> drivers. The `_handle` suffix is self-documenting; `h_` is not.
> 
> Maybe if we write h_client in the code, and "handle..." in a comment
> above it? Or the other way around. Just to make the connection to
> the Open RM client code that is out there.
> 
Sure, a descriptive name is better but if we're going with descriptive type
names instead, then perhaps a comment with a descriptive type name would be fine
with me too.

Here are also some sample comments that could be used (Eliot do confirm
accuracy, but this is from my notes/research):

A client is a handle that provides a namespace and lifetime boundary for GPU
resource access. All child objects (devices, memory, channels) are scoped to it
and freed when it's destroyed. Under a client, we have Device and Sub-Device
handles.

The Device/Sub-Device split comes from SLI (multiple GPUs acting as one), where
the Device broadcasts operations to all GPUs, while the Sub-Device targets a
single GPU. Today with single-GPU setups, the pair is still mandatory but always
created together as a 1:1 mapping to a single physical GPU.

>>> particular, is an old SLI term that no one wants to keep around
>>> either. It was an ugly hack in Open RM that took more than a decade
>>> to recover from, by moving the SLI concept out to user space.
>>>
>>> So even though we should document what we're doing now, I would like
>>> to also note that we suspect a certain amount of this will
>>> disappear, to be replaced with a somewhat simpler API, in the
>>> somewhat near future.
>>
>> Sure, but client handles are a broader GPU driver concept even if this
>> particular one is GSP-internal. We are certainly going to need a rust type
>> to represent a client right? Other GPU drivers also have concept of
>> clients. The point is not that `hInternalClient` represents a GPU user
>> today, it may well be temporary as you note, but that using
>> `#[repr(transparent)]` new types for raw u32 handles costs nothing and
>> makes the code better and more readable. This pattern is already
>> well-established in nova-core itself: see `PackedRegistryEntry` for example
>> being a repr type. IMHO, there should be little reason that we need the
>> struct to have magic u32 numbers in Rust code for concepts like "handles".
>>
> 
> We will debate this when it shows up, perhaps I should not have 
> mentioned it, other than to remind Eliot to make it easy to delete.
> 
>> All I am saying is let us think this through before just doing the shortcut
>> of using u32 for client handles, etc. Rust gives us rich types, lets use them.
>>
> 
> ohh, that's a whole other topic and idea. I wasn't going into that,
> but feel free to explore if Rust can make it better.
Oh ok, yeah it sounds like we are aligned on this with Eliot's other reply with
introducing new rich types for these, so we should be good there (with abundant
comments on the rich types). Will explore this further on my side as well.

Thanks!

--
Joel Fernandes








^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 7/9] gpu: nova-core: gsp: add RM control command infrastructure
  2026-02-27 12:32 ` [PATCH 7/9] gpu: nova-core: gsp: add RM control command infrastructure Eliot Courtney
  2026-03-02  8:00   ` Zhi Wang
  2026-03-09 22:08   ` Joel Fernandes
@ 2026-03-13 15:40   ` Danilo Krummrich
  2026-03-16 12:06     ` Eliot Courtney
  2 siblings, 1 reply; 37+ messages in thread
From: Danilo Krummrich @ 2026-03-13 15:40 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Alice Ryhl, Alexandre Courbot, David Airlie, Simona Vetter,
	rust-for-linux, nouveau, dri-devel, linux-kernel

On Fri Feb 27, 2026 at 1:32 PM CET, Eliot Courtney wrote:
> +/// Command for sending an RM control message to the GSP.
> +struct RmControl<'a> {
> +    h_client: u32,
> +    h_object: u32,
> +    cmd: RmControlMsgFunction,
> +    params: &'a [u8],
> +}

Please expand the documentation, especially the fields.

> +impl<'a> RmControl<'a> {
> +    /// Creates a new RM control command.
> +    fn new(h_client: u32, h_object: u32, cmd: RmControlMsgFunction, params: &'a [u8]) -> Self {
> +        Self {
> +            h_client,
> +            h_object,
> +            cmd,
> +            params,
> +        }
> +    }
> +}
> +
> +impl CommandToGsp for RmControl<'_> {
> +    const FUNCTION: MsgFunction = MsgFunction::GspRmControl;
> +    type Command = GspRmControl;
> +    type Reply = RmControlReply;
> +    type InitError = Infallible;
> +
> +    fn init(&self) -> impl Init<Self::Command, Self::InitError> {
> +        GspRmControl::new(
> +            self.h_client,
> +            self.h_object,
> +            self.cmd,
> +            self.params.len() as u32,
> +        )
> +    }
> +
> +    fn variable_payload_len(&self) -> usize {
> +        self.params.len()
> +    }
> +
> +    fn init_variable_payload(
> +        &self,
> +        dst: &mut SBufferIter<array::IntoIter<&mut [u8], 2>>,
> +    ) -> Result {
> +        dst.write_all(self.params)
> +    }
> +}
> +
> +/// Response from an RM control message.
> +pub(crate) struct RmControlReply {
> +    status: NvStatus,
> +    params: KVVec<u8>,
> +}
> +
> +impl MessageFromGsp for RmControlReply {
> +    const FUNCTION: MsgFunction = MsgFunction::GspRmControl;
> +    type Message = GspRmControl;
> +    type InitError = Error;
> +
> +    fn read(
> +        msg: &Self::Message,
> +        sbuffer: &mut SBufferIter<array::IntoIter<&[u8], 2>>,
> +    ) -> Result<Self, Self::InitError> {
> +        Ok(RmControlReply {
> +            status: msg.status(),
> +            params: sbuffer.flush_into_kvvec(GFP_KERNEL)?,
> +        })
> +    }
> +}
> +
> +/// Sends an RM control command, checks the reply status, and returns the raw parameter bytes.
> +#[expect(dead_code)]
> +fn send_rm_control(

Why isn't this a method of Cmdq?

> +    cmdq: &Cmdq,
> +    bar: &Bar0,
> +    h_client: u32,
> +    h_object: u32,
> +    cmd: RmControlMsgFunction,
> +    params: &[u8],
> +) -> Result<KVVec<u8>> {
> +    let reply = cmdq.send_sync_command(bar, RmControl::new(h_client, h_object, cmd, params))?;

Why not let the caller construct RmControl?

> +
> +    Result::from(reply.status)?;
> +
> +    Ok(reply.params)
> +}
>
> -- 
> 2.53.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 5/9] gpu: nova-core: gsp: add types for RM control RPCs
  2026-03-09 21:45   ` Joel Fernandes
@ 2026-03-16 11:42     ` Eliot Courtney
  0 siblings, 0 replies; 37+ messages in thread
From: Eliot Courtney @ 2026-03-16 11:42 UTC (permalink / raw)
  To: Joel Fernandes, Eliot Courtney
  Cc: Danilo Krummrich, Alice Ryhl, Alexandre Courbot, David Airlie,
	Simona Vetter, rust-for-linux, nouveau, dri-devel, linux-kernel,
	dri-devel

On Tue Mar 10, 2026 at 6:45 AM JST, Joel Fernandes wrote:
> On Fri, 27 Feb 2026, Eliot Courtney wrote:
> [..]
>> +/// Command code for RM control RPCs sent using [`MsgFunction::GspRmControl`].
>> +#[derive(Copy, Clone, Debug, PartialEq)]
>> +#[repr(u32)]
>> +pub(crate) enum RmControlMsgFunction {
>> +    /// Get the CE fault method buffer size.
>> +    CeGetFaultMethodBufferSize = bindings::NV2080_CTRL_CMD_CE_GET_FAULT_METHOD_BUFFER_SIZE,
>> +}
>> +
>> +impl TryFrom<u32> for RmControlMsgFunction {
>> +    type Error = kernel::error::Error;
>> +
>> +    fn try_from(value: u32) -> Result<Self> {
>
> Other similar impls in the driver carry a TODO[FPRI] comment. Please add one
> above this impl:
>
> // TODO[FPRI]: replace with `FromPrimitive`.
>
>> +        match value {
>> +            bindings::NV2080_CTRL_CMD_CE_GET_FAULT_METHOD_BUFFER_SIZE => {
>> +                Ok(Self::CeGetFaultMethodBufferSize)
>> +            }
>> +            _ => Err(EINVAL),
>> +        }
>> +    }
>> +}

Will do, thanks.

>
> Nit: the braces around the single-expression match arm are unnecessary:
>
>     match value {
>         bindings::NV2080_CTRL_CMD_CE_GET_FAULT_METHOD_BUFFER_SIZE =>
>             Ok(Self::CeGetFaultMethodBufferSize),
>         _ => Err(EINVAL),
>     }
>
> thanks,

This does not pass rustfmtcheck, probably because rustfmt wants braces
on multilines like this.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/9] gpu: nova-core: generalize `flush_into_kvec` to `flush_into_vec`
  2026-03-09 22:01     ` Danilo Krummrich
@ 2026-03-16 11:44       ` Eliot Courtney
  2026-03-16 12:21         ` Danilo Krummrich
  0 siblings, 1 reply; 37+ messages in thread
From: Eliot Courtney @ 2026-03-16 11:44 UTC (permalink / raw)
  To: Danilo Krummrich, Eliot Courtney
  Cc: Alice Ryhl, Alexandre Courbot, David Airlie, Simona Vetter,
	rust-for-linux, nouveau, dri-devel, linux-kernel, dri-devel

On Tue Mar 10, 2026 at 7:01 AM JST, Danilo Krummrich wrote:
> On Mon Mar 9, 2026 at 10:57 PM CET, Danilo Krummrich wrote:
>> On 2/27/2026 1:32 PM, Eliot Courtney wrote:
>>> Add general `flush_into_vec` function. Add `flush_into_kvvec`
>>> convenience wrapper alongside the existing `flush_into_kvec` function.
>>> This is generally useful but immediately used for e.g. holding RM
>>> control payloads, which can be large (~>=20 KiB).
>>
>> Why not just always use KVVec? It also seems that the KVec variant is not used?
>
> (Besides its single usage in GspSequence, which wouldn't hurt to be a KVVec.)
>
>> If there's no reason for having both, I'd also just call this into_vec().

I think always using KVVec should be fine, thanks!

For the naming, I think `read_to_vec` may be more conventional for this
-- `into_vec` implies consuming the object, but if we want to keep the
warning in `Cmdq::receive_msg` if not all the data is consumed we need
to take &mut self.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 7/9] gpu: nova-core: gsp: add RM control command infrastructure
  2026-03-13 15:40   ` Danilo Krummrich
@ 2026-03-16 12:06     ` Eliot Courtney
  0 siblings, 0 replies; 37+ messages in thread
From: Eliot Courtney @ 2026-03-16 12:06 UTC (permalink / raw)
  To: Danilo Krummrich, Eliot Courtney
  Cc: Alice Ryhl, Alexandre Courbot, David Airlie, Simona Vetter,
	rust-for-linux, nouveau, dri-devel, linux-kernel

On Sat Mar 14, 2026 at 12:40 AM JST, Danilo Krummrich wrote:
> On Fri Feb 27, 2026 at 1:32 PM CET, Eliot Courtney wrote:
>> +/// Command for sending an RM control message to the GSP.
>> +struct RmControl<'a> {
>> +    h_client: u32,
>> +    h_object: u32,
>> +    cmd: RmControlMsgFunction,
>> +    params: &'a [u8],
>> +}
>
> Please expand the documentation, especially the fields.

Will do. Will add an explanation of client/device/subdevice/object
plus fix all the names.

>> +/// Response from an RM control message.
>> +pub(crate) struct RmControlReply {
>> +    status: NvStatus,
>> +    params: KVVec<u8>,
>> +}
>> +
>> +impl MessageFromGsp for RmControlReply {
>> +    const FUNCTION: MsgFunction = MsgFunction::GspRmControl;
>> +    type Message = GspRmControl;
>> +    type InitError = Error;
>> +
>> +    fn read(
>> +        msg: &Self::Message,
>> +        sbuffer: &mut SBufferIter<array::IntoIter<&[u8], 2>>,
>> +    ) -> Result<Self, Self::InitError> {
>> +        Ok(RmControlReply {
>> +            status: msg.status(),
>> +            params: sbuffer.flush_into_kvvec(GFP_KERNEL)?,
>> +        })
>> +    }
>> +}
>> +
>> +/// Sends an RM control command, checks the reply status, and returns the raw parameter bytes.
>> +#[expect(dead_code)]
>> +fn send_rm_control(
>
> Why isn't this a method of Cmdq?

Because this can be fully implemented in terms of existing primitives
available on Cmdq - it doesn't need to know anything about Cmdq
internals (it's a protocol on top of the Cmdq, not at the same level
of abstraction), and it fits the existing pattern of adding helpers for
sending messages to/from GSP (e.g. `get_gsp_info`), so IMO it is nicer
to keep it out of Cmdq.

>
>> +    cmdq: &Cmdq,
>> +    bar: &Bar0,
>> +    h_client: u32,
>> +    h_object: u32,
>> +    cmd: RmControlMsgFunction,
>> +    params: &[u8],
>> +) -> Result<KVVec<u8>> {
>> +    let reply = cmdq.send_sync_command(bar, RmControl::new(h_client, h_object, cmd, params))?;
>
> Why not let the caller construct RmControl?

Yeah, that seems fine to me, as long as we don't make `RmControl`
public outside of this module (so each helper would construct it and
call `send_rm_control`). If `RmControl` is public then it could be
misused and sent directly via `Cmdq`, which makes it easier to forget
to check the reply status.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/9] gpu: nova-core: generalize `flush_into_kvec` to `flush_into_vec`
  2026-03-16 11:44       ` Eliot Courtney
@ 2026-03-16 12:21         ` Danilo Krummrich
  2026-03-17  1:55           ` Alexandre Courbot
  0 siblings, 1 reply; 37+ messages in thread
From: Danilo Krummrich @ 2026-03-16 12:21 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Alice Ryhl, Alexandre Courbot, David Airlie, Simona Vetter,
	rust-for-linux, nouveau, dri-devel, linux-kernel, dri-devel,
	Gary Guo

(Cc: Gary)

On Mon Mar 16, 2026 at 12:44 PM CET, Eliot Courtney wrote:
> On Tue Mar 10, 2026 at 7:01 AM JST, Danilo Krummrich wrote:
>> On Mon Mar 9, 2026 at 10:57 PM CET, Danilo Krummrich wrote:
>>> On 2/27/2026 1:32 PM, Eliot Courtney wrote:
>>>> Add general `flush_into_vec` function. Add `flush_into_kvvec`
>>>> convenience wrapper alongside the existing `flush_into_kvec` function.
>>>> This is generally useful but immediately used for e.g. holding RM
>>>> control payloads, which can be large (~>=20 KiB).
>>>
>>> Why not just always use KVVec? It also seems that the KVec variant is not used?
>>
>> (Besides its single usage in GspSequence, which wouldn't hurt to be a KVVec.)
>>
>>> If there's no reason for having both, I'd also just call this into_vec().
>
> I think always using KVVec should be fine, thanks!
>
> For the naming, I think `read_to_vec` may be more conventional for this
> -- `into_vec` implies consuming the object, but if we want to keep the
> warning in `Cmdq::receive_msg` if not all the data is consumed we need
> to take &mut self.

I had another look at this and especially how the SBuffer you refer to is used.
Unfortunately, the underlying code is broken.

driver_read_area() creates a reference to the whole DMA object, including the
area the GSP might concurrently write to. This is undefined behavior. See also
commit commit 0073a17b4666 ("gpu: nova-core: gsp: fix UB in DmaGspMem pointer
accessors"), where I fixed something similar.

Additionally, even if it would only create a reference to the part of the buffer
that can be considerd untouched by the GSP and hence suits for creating a
reference, driver_read_area() and all subsequent callers would still need to be
unsafe as they would need to promise to not keep the reference alive beyond GSP
accessing that memory region again.

(The situation is similar for driver_write_area().)

So, unfortunately, commit 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command
queue bindings and handling") seems broken in this aspect.

This needs to be fixed first, and I think we should probably create a copy in
driver_read_area() right away.

I don't want to merge any code that builds on top of this before we have sorted
this out.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/9] gpu: nova-core: generalize `flush_into_kvec` to `flush_into_vec`
  2026-03-16 12:21         ` Danilo Krummrich
@ 2026-03-17  1:55           ` Alexandre Courbot
  2026-03-17 10:49             ` Danilo Krummrich
  0 siblings, 1 reply; 37+ messages in thread
From: Alexandre Courbot @ 2026-03-17  1:55 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Eliot Courtney, Alice Ryhl, David Airlie, Simona Vetter,
	rust-for-linux, nouveau, dri-devel, linux-kernel, dri-devel,
	Gary Guo

On Mon Mar 16, 2026 at 9:21 PM JST, Danilo Krummrich wrote:
> (Cc: Gary)
>
> On Mon Mar 16, 2026 at 12:44 PM CET, Eliot Courtney wrote:
>> On Tue Mar 10, 2026 at 7:01 AM JST, Danilo Krummrich wrote:
>>> On Mon Mar 9, 2026 at 10:57 PM CET, Danilo Krummrich wrote:
>>>> On 2/27/2026 1:32 PM, Eliot Courtney wrote:
>>>>> Add general `flush_into_vec` function. Add `flush_into_kvvec`
>>>>> convenience wrapper alongside the existing `flush_into_kvec` function.
>>>>> This is generally useful but immediately used for e.g. holding RM
>>>>> control payloads, which can be large (~>=20 KiB).
>>>>
>>>> Why not just always use KVVec? It also seems that the KVec variant is not used?
>>>
>>> (Besides its single usage in GspSequence, which wouldn't hurt to be a KVVec.)
>>>
>>>> If there's no reason for having both, I'd also just call this into_vec().
>>
>> I think always using KVVec should be fine, thanks!
>>
>> For the naming, I think `read_to_vec` may be more conventional for this
>> -- `into_vec` implies consuming the object, but if we want to keep the
>> warning in `Cmdq::receive_msg` if not all the data is consumed we need
>> to take &mut self.
>
> I had another look at this and especially how the SBuffer you refer to is used.
> Unfortunately, the underlying code is broken.
>
> driver_read_area() creates a reference to the whole DMA object, including the
> area the GSP might concurrently write to. This is undefined behavior. See also
> commit commit 0073a17b4666 ("gpu: nova-core: gsp: fix UB in DmaGspMem pointer
> accessors"), where I fixed something similar.

We shouldn't be doing that - I think we are limited by the current
CoherentAllocation API though. But IIUC this is something that I/O
projections will allow us to handle properly?

>
> Additionally, even if it would only create a reference to the part of the buffer
> that can be considerd untouched by the GSP and hence suits for creating a
> reference, driver_read_area() and all subsequent callers would still need to be
> unsafe as they would need to promise to not keep the reference alive beyond GSP
> accessing that memory region again.

This is guaranteed by the inability to update the CPU read pointer for
as long as the slices exists.

To expand a bit: `driver_read_area` returns a slice to the area of the
DMA object that the GSP is guaranteed *not* to write into until the
driver updates the CPU read pointer.

This area is between the CPU read pointer (which signals the next bytes
the CPU has to read, and which the GSP won't cross) and the GSP write
pointer (i.e. the next page to be written by the GSP).

Everything in this zone is data that the GSP has already written but the
driver hasn't read yet at the time of the call.

The CPU read pointer cannot be updated for as long as the returned
slices exist - the slices hold a reference to the `DmaGspMem`, and
updating the read pointer requires a mutable reference to the same
`DmaGspMem`.

Meanwhile, the GSP can keep writing data while the slice exists but that
data will be past the area of the slice, and the GSP will never write
past the CPU read pointer.

So the data in the returned slices is guaranteed to be there at the time
of the call, and immutable for as long as the slices exist. Thus, they
can be provided by a safe method.

Unless we decide to not trust the GSP, but that would be opening a whole
new can of worms.

> I don't want to merge any code that builds on top of this before we have sorted
> this out.

If what I have written above is correct, then the fix should simply be
to use I/O projections to create properly-bounded references. Any more
immediate fix would need to be much more intrusive and require a
refactoring that is imho more risky than carrying on for a bit with the
current behavior.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/9] gpu: nova-core: generalize `flush_into_kvec` to `flush_into_vec`
  2026-03-17  1:55           ` Alexandre Courbot
@ 2026-03-17 10:49             ` Danilo Krummrich
  2026-03-17 13:41               ` Alexandre Courbot
  0 siblings, 1 reply; 37+ messages in thread
From: Danilo Krummrich @ 2026-03-17 10:49 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Eliot Courtney, Alice Ryhl, David Airlie, Simona Vetter,
	rust-for-linux, nouveau, dri-devel, linux-kernel, dri-devel,
	Gary Guo

On Tue Mar 17, 2026 at 2:55 AM CET, Alexandre Courbot wrote:
> We shouldn't be doing that - I think we are limited by the current
> CoherentAllocation API though. But IIUC this is something that I/O
> projections will allow us to handle properly?

Why do we need projections to avoid UB here? driver_read_area() already even
peeks into the firmware abstraction layer, which is where MsgqData technically
belongs into (despite being trivial).

	let gsp_mem = &unsafe { self.0.as_slice(0, 1) }.unwrap()[0];
	let data = &gsp_mem.gspq.msgq.data;

Why do we need I/O projections to do raw pointer arithmetic where creating a
reference is UB?

(Eventually, we want to use IoView of course, as this is a textbook example of
what I proposed IoSlice for.)

Another option in the meantime would be / have been to use dma_read!() and
extract (copy) the data right away in driver_read_area(), which I'd probably
prefer over raw pointer arithmetic.

But in any case, this can (and should) be fixed even without IoView.

Besides that, nothing prevents us doing the same thing I did for gsp_write_ptr()
in the meantime to not break out of the firmware abstraction layer.

> This is guaranteed by the inability to update the CPU read pointer for
> as long as the slices exists.

Fair enough.

> Unless we decide to not trust the GSP, but that would be opening a whole
> new can of worms.

I thought about this as well, and I think it's fine. The safety comment within
the function has to justify why the device won't access the memory. If the
device does so regardless, it's simply a bug.

>> I don't want to merge any code that builds on top of this before we have sorted
>> this out.
>
> If what I have written above is correct, then the fix should simply be
> to use I/O projections to create properly-bounded references.

I still don't think we need I/O projections for a reasonable fix and I also
don't agree that we should keep UB until new features land.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/9] gpu: nova-core: generalize `flush_into_kvec` to `flush_into_vec`
  2026-03-17 10:49             ` Danilo Krummrich
@ 2026-03-17 13:41               ` Alexandre Courbot
  2026-03-17 14:12                 ` Danilo Krummrich
  0 siblings, 1 reply; 37+ messages in thread
From: Alexandre Courbot @ 2026-03-17 13:41 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Eliot Courtney, Alice Ryhl, David Airlie, Simona Vetter,
	rust-for-linux, nouveau, dri-devel, linux-kernel, dri-devel,
	Gary Guo

On Tue Mar 17, 2026 at 7:49 PM JST, Danilo Krummrich wrote:
> On Tue Mar 17, 2026 at 2:55 AM CET, Alexandre Courbot wrote:
>> We shouldn't be doing that - I think we are limited by the current
>> CoherentAllocation API though. But IIUC this is something that I/O
>> projections will allow us to handle properly?
>
> Why do we need projections to avoid UB here? driver_read_area() already even
> peeks into the firmware abstraction layer, which is where MsgqData technically
> belongs into (despite being trivial).
>
> 	let gsp_mem = &unsafe { self.0.as_slice(0, 1) }.unwrap()[0];
> 	let data = &gsp_mem.gspq.msgq.data;
>
> Why do we need I/O projections to do raw pointer arithmetic where creating a
> reference is UB?
>
> (Eventually, we want to use IoView of course, as this is a textbook example of
> what I proposed IoSlice for.)

Limiting the amount of `unsafe`s, but I guess we can live with that as
this is going to be short-term anyway.

>
> Another option in the meantime would be / have been to use dma_read!() and
> extract (copy) the data right away in driver_read_area(), which I'd probably
> prefer over raw pointer arithmetic.

I'd personally like to keep the current "no-copy" approach as it
implements the right reference discipline (i.e. you need a mutable
reference to update the read pointer, which cannot be done if the buffer
is read by the driver) and moving to copy semantics would open a window
of opportunity to mess with that balance further (on top of requiring
bigger code changes that will be temporary).

>
> But in any case, this can (and should) be fixed even without IoView.
>
> Besides that, nothing prevents us doing the same thing I did for gsp_write_ptr()
> in the meantime to not break out of the firmware abstraction layer.
>
>> This is guaranteed by the inability to update the CPU read pointer for
>> as long as the slices exists.
>
> Fair enough.
>
>> Unless we decide to not trust the GSP, but that would be opening a whole
>> new can of worms.
>
> I thought about this as well, and I think it's fine. The safety comment within
> the function has to justify why the device won't access the memory. If the
> device does so regardless, it's simply a bug.
>
>>> I don't want to merge any code that builds on top of this before we have sorted
>>> this out.
>>
>> If what I have written above is correct, then the fix should simply be
>> to use I/O projections to create properly-bounded references.
>
> I still don't think we need I/O projections for a reasonable fix and I also
> don't agree that we should keep UB until new features land.

I have the following (modulo missing safety comments) to fix
`driver_read_area` - does it look acceptable to you? If so I'll go
ahead and fix `driver_write_area` as well.

diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index efa1aab1568f..3bddb5a2923f 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -296,24 +296,53 @@ fn driver_write_area_size(&self) -> usize {
         let tx = self.gsp_write_ptr() as usize;
         let rx = self.cpu_read_ptr() as usize;

+        // Pointer to the start of the GSP message queue.
+        //
         // SAFETY:
-        // - The `CoherentAllocation` contains exactly one object.
-        // - We will only access the driver-owned part of the shared memory.
-        // - Per the safety statement of the function, no concurrent access will be performed.
-        let gsp_mem = &unsafe { self.0.as_slice(0, 1) }.unwrap()[0];
-        let data = &gsp_mem.gspq.msgq.data;
+        // - `self.0` contains exactly one element.
+        // - `gspq.msgq.data[0]` is within the bounds of that element.
+        let data = unsafe { &raw const (*self.0.start_ptr()).gspq.msgq.data[0] };
+
+        // Safety/Panic comments to be referenced by the code below.
+        //
+        // SAFETY[1]:
+        // - `data` contains `MSGQ_NUM_PAGES` elements.
+        // - The area starting at `rx` and ending at `tx - 1` modulo `MSGQ_NUM_PAGES`,
+        //   inclusive, belongs to the driver for reading and is not accessed concurrently by
+        //   the GSP.
+        //
+        // PANIC[1]:
+        // - Per the invariant of `cpu_read_ptr`, `rx < MSGQ_NUM_PAGES`.
+        // - Per the invariant of `gsp_write_ptr`, `tx < MSGQ_NUM_PAGES`.

-        // The area starting at `rx` and ending at `tx - 1` modulo MSGQ_NUM_PAGES, inclusive,
-        // belongs to the driver for reading.
-        // PANIC:
-        // - per the invariant of `cpu_read_ptr`, `rx < MSGQ_NUM_PAGES`
-        // - per the invariant of `gsp_write_ptr`, `tx < MSGQ_NUM_PAGES`
         if rx <= tx {
             // The area is contiguous.
-            (&data[rx..tx], &[])
+            (
+                // SAFETY: See SAFETY[1].
+                //
+                // PANIC:
+                // - See PANIC[1].
+                // - Per the branch test, `rx <= tx`.
+                unsafe { core::slice::from_raw_parts(data.add(rx), tx - rx) },
+                &[],
+            )
         } else {
             // The area is discontiguous.
-            (&data[rx..], &data[..tx])
+            (
+                // SAFETY: See SAFETY[1].
+                //
+                // PANIC: See PANIC[1].
+                unsafe {
+                    core::slice::from_raw_parts(
+                        data.add(rx),
+                        num::u32_as_usize(MSGQ_NUM_PAGES) - rx,
+                    )
+                },
+                // SAFETY: See SAFETY[1].
+                //
+                // PANIC: See PANIC[1].
+                unsafe { core::slice::from_raw_parts(data, tx) },
+            )
         }
     }

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/9] gpu: nova-core: generalize `flush_into_kvec` to `flush_into_vec`
  2026-03-17 13:41               ` Alexandre Courbot
@ 2026-03-17 14:12                 ` Danilo Krummrich
  2026-03-18  1:52                   ` Alexandre Courbot
  0 siblings, 1 reply; 37+ messages in thread
From: Danilo Krummrich @ 2026-03-17 14:12 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Eliot Courtney, Alice Ryhl, David Airlie, Simona Vetter,
	rust-for-linux, nouveau, dri-devel, linux-kernel, dri-devel,
	Gary Guo

On Tue Mar 17, 2026 at 2:41 PM CET, Alexandre Courbot wrote:
> On Tue Mar 17, 2026 at 7:49 PM JST, Danilo Krummrich wrote:
>> On Tue Mar 17, 2026 at 2:55 AM CET, Alexandre Courbot wrote:
>>> We shouldn't be doing that - I think we are limited by the current
>>> CoherentAllocation API though. But IIUC this is something that I/O
>>> projections will allow us to handle properly?
>>
>> Why do we need projections to avoid UB here? driver_read_area() already even
>> peeks into the firmware abstraction layer, which is where MsgqData technically
>> belongs into (despite being trivial).
>>
>> 	let gsp_mem = &unsafe { self.0.as_slice(0, 1) }.unwrap()[0];
>> 	let data = &gsp_mem.gspq.msgq.data;
>>
>> Why do we need I/O projections to do raw pointer arithmetic where creating a
>> reference is UB?
>>
>> (Eventually, we want to use IoView of course, as this is a textbook example of
>> what I proposed IoSlice for.)
>
> Limiting the amount of `unsafe`s, but I guess we can live with that as
> this is going to be short-term anyway.

Of course it is going to be better with IoSlice, but limiting the number of
unsafe calls regardless is a bit pointless if the "safe" ones can cause
undefined behavior. :)

>> Another option in the meantime would be / have been to use dma_read!() and
>> extract (copy) the data right away in driver_read_area(), which I'd probably
>> prefer over raw pointer arithmetic.
>
> I'd personally like to keep the current "no-copy" approach as it
> implements the right reference discipline (i.e. you need a mutable
> reference to update the read pointer, which cannot be done if the buffer
> is read by the driver) and moving to copy semantics would open a window
> of opportunity to mess with that balance further (on top of requiring
> bigger code changes that will be temporary).

I don't even know if we want them to be temporary, i.e. we can copy right away
and IoSlice would still be an improvement in order to make the copy in the first
place.

Also, you say "no-copy", but that's not true, we do copy eventually. In fact,
the whole point of this patch is to copy this buffer into a KVVec.

So, why not copy it right away with dma_read!() (later replaced with an IoSlice
copy) and then process it further?

I am also very sceptical of the "holding on to the reference prevents the read
pointer update" argument. Once we have a copy, there is no need not to update
the read pointer anymore in the first place, no?

>> But in any case, this can (and should) be fixed even without IoView.
>>
>> Besides that, nothing prevents us doing the same thing I did for gsp_write_ptr()
>> in the meantime to not break out of the firmware abstraction layer.
>>
>>> This is guaranteed by the inability to update the CPU read pointer for
>>> as long as the slices exists.
>>
>> Fair enough.
>>
>>> Unless we decide to not trust the GSP, but that would be opening a whole
>>> new can of worms.
>>
>> I thought about this as well, and I think it's fine. The safety comment within
>> the function has to justify why the device won't access the memory. If the
>> device does so regardless, it's simply a bug.
>>
>>>> I don't want to merge any code that builds on top of this before we have sorted
>>>> this out.
>>>
>>> If what I have written above is correct, then the fix should simply be
>>> to use I/O projections to create properly-bounded references.
>>
>> I still don't think we need I/O projections for a reasonable fix and I also
>> don't agree that we should keep UB until new features land.
>
> I have the following (modulo missing safety comments) to fix
> `driver_read_area` - does it look acceptable to you? If so I'll go
> ahead and fix `driver_write_area` as well.

Not pretty (which is of course not on you :), but looks correct.

I still feel like we should just copy right away, as mentioned above.

> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> index efa1aab1568f..3bddb5a2923f 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -296,24 +296,53 @@ fn driver_write_area_size(&self) -> usize {
>          let tx = self.gsp_write_ptr() as usize;
>          let rx = self.cpu_read_ptr() as usize;
>
> +        // Pointer to the start of the GSP message queue.
> +        //
>          // SAFETY:
> -        // - The `CoherentAllocation` contains exactly one object.
> -        // - We will only access the driver-owned part of the shared memory.
> -        // - Per the safety statement of the function, no concurrent access will be performed.
> -        let gsp_mem = &unsafe { self.0.as_slice(0, 1) }.unwrap()[0];
> -        let data = &gsp_mem.gspq.msgq.data;
> +        // - `self.0` contains exactly one element.
> +        // - `gspq.msgq.data[0]` is within the bounds of that element.
> +        let data = unsafe { &raw const (*self.0.start_ptr()).gspq.msgq.data[0] };
> +
> +        // Safety/Panic comments to be referenced by the code below.
> +        //
> +        // SAFETY[1]:
> +        // - `data` contains `MSGQ_NUM_PAGES` elements.
> +        // - The area starting at `rx` and ending at `tx - 1` modulo `MSGQ_NUM_PAGES`,
> +        //   inclusive, belongs to the driver for reading and is not accessed concurrently by
> +        //   the GSP.
> +        //
> +        // PANIC[1]:
> +        // - Per the invariant of `cpu_read_ptr`, `rx < MSGQ_NUM_PAGES`.
> +        // - Per the invariant of `gsp_write_ptr`, `tx < MSGQ_NUM_PAGES`.
>
> -        // The area starting at `rx` and ending at `tx - 1` modulo MSGQ_NUM_PAGES, inclusive,
> -        // belongs to the driver for reading.
> -        // PANIC:
> -        // - per the invariant of `cpu_read_ptr`, `rx < MSGQ_NUM_PAGES`
> -        // - per the invariant of `gsp_write_ptr`, `tx < MSGQ_NUM_PAGES`
>          if rx <= tx {
>              // The area is contiguous.
> -            (&data[rx..tx], &[])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                //
> +                // PANIC:
> +                // - See PANIC[1].
> +                // - Per the branch test, `rx <= tx`.
> +                unsafe { core::slice::from_raw_parts(data.add(rx), tx - rx) },
> +                &[],
> +            )
>          } else {
>              // The area is discontiguous.
> -            (&data[rx..], &data[..tx])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                //
> +                // PANIC: See PANIC[1].
> +                unsafe {
> +                    core::slice::from_raw_parts(
> +                        data.add(rx),
> +                        num::u32_as_usize(MSGQ_NUM_PAGES) - rx,
> +                    )
> +                },
> +                // SAFETY: See SAFETY[1].
> +                //
> +                // PANIC: See PANIC[1].
> +                unsafe { core::slice::from_raw_parts(data, tx) },
> +            )
>          }
>      }


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/9] gpu: nova-core: generalize `flush_into_kvec` to `flush_into_vec`
  2026-03-17 14:12                 ` Danilo Krummrich
@ 2026-03-18  1:52                   ` Alexandre Courbot
  0 siblings, 0 replies; 37+ messages in thread
From: Alexandre Courbot @ 2026-03-18  1:52 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Eliot Courtney, Alice Ryhl, David Airlie, Simona Vetter,
	rust-for-linux, nouveau, dri-devel, linux-kernel, dri-devel,
	Gary Guo

On Tue Mar 17, 2026 at 11:12 PM JST, Danilo Krummrich wrote:
> On Tue Mar 17, 2026 at 2:41 PM CET, Alexandre Courbot wrote:
>> On Tue Mar 17, 2026 at 7:49 PM JST, Danilo Krummrich wrote:
>>> On Tue Mar 17, 2026 at 2:55 AM CET, Alexandre Courbot wrote:
>>>> We shouldn't be doing that - I think we are limited by the current
>>>> CoherentAllocation API though. But IIUC this is something that I/O
>>>> projections will allow us to handle properly?
>>>
>>> Why do we need projections to avoid UB here? driver_read_area() already even
>>> peeks into the firmware abstraction layer, which is where MsgqData technically
>>> belongs into (despite being trivial).
>>>
>>> 	let gsp_mem = &unsafe { self.0.as_slice(0, 1) }.unwrap()[0];
>>> 	let data = &gsp_mem.gspq.msgq.data;
>>>
>>> Why do we need I/O projections to do raw pointer arithmetic where creating a
>>> reference is UB?
>>>
>>> (Eventually, we want to use IoView of course, as this is a textbook example of
>>> what I proposed IoSlice for.)
>>
>> Limiting the amount of `unsafe`s, but I guess we can live with that as
>> this is going to be short-term anyway.
>
> Of course it is going to be better with IoSlice, but limiting the number of
> unsafe calls regardless is a bit pointless if the "safe" ones can cause
> undefined behavior. :)

No arguing. :)

>
>>> Another option in the meantime would be / have been to use dma_read!() and
>>> extract (copy) the data right away in driver_read_area(), which I'd probably
>>> prefer over raw pointer arithmetic.
>>
>> I'd personally like to keep the current "no-copy" approach as it
>> implements the right reference discipline (i.e. you need a mutable
>> reference to update the read pointer, which cannot be done if the buffer
>> is read by the driver) and moving to copy semantics would open a window
>> of opportunity to mess with that balance further (on top of requiring
>> bigger code changes that will be temporary).
>
> I don't even know if we want them to be temporary, i.e. we can copy right away
> and IoSlice would still be an improvement in order to make the copy in the first
> place.
>
> Also, you say "no-copy", but that's not true, we do copy eventually. In fact,
> the whole point of this patch is to copy this buffer into a KVVec.
>
> So, why not copy it right away with dma_read!() (later replaced with an IoSlice
> copy) and then process it further?

So as you said we are already making one copy: `receive_msg` for
instance does not return any reference to the command buffer, but an
object created from the slices returned by `driver_read_area`. What you
are suggesting would introduce another copy in the command queue
internals, for no tangible benefit as it would not allow us to
advance the read pointer sooner anyway.

Because even if we did that extra copy using `dma_read`, we would still
need to make sure the read pointer doesn't move as we are doing it, lest
we return corrupted messages - so the situation would be more or less
the same, and the `&mut self` required to advance the read pointer would
still be needed to shield us from doing it inadvertently.

In addition, we need space to hold that additional copy. It doesn't have
a fixed size, and can cover several memory pages. Does that mean we
would need to allocate a KVec or KVVec for every message received, and
drop it even before `receive_msg` returns?

The additional code to do that allocation would likely be larger than
the fix I proposed below, and we know it is very temporary and will be
replaced by mostly safe code soon. I would understand the trade-off of
an extra copy if this resulted in simpler code, but I don't think that
would be the case here.

>
> I am also very sceptical of the "holding on to the reference prevents the read
> pointer update" argument. Once we have a copy, there is no need not to update
> the read pointer anymore in the first place, no?

Correct, and that's exactly what `receive_msg` does - as soon as it has
created the object to return from the slices, it advances the read
pointer. So the read pointer is already updated when `receive_msg`
returns, and that "holding on to the reference" thing is purely for
internal safety.

Introducing another copy won't allow us to update it sooner - copying
the bytes into a KVec is not faster than processing the slices' data
into the final returned value. Actually it may even be slower if that
involves a dynamic allocation.

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2026-03-18  1:52 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-27 12:32 [PATCH 0/9] gpu: nova-core: gsp: add RM control command infrastructure Eliot Courtney
2026-02-27 12:32 ` [PATCH 1/9] gpu: nova-core: gsp: add NV_STATUS error code bindings Eliot Courtney
2026-02-27 12:32 ` [PATCH 2/9] gpu: nova-core: gsp: add NvStatus enum for RM control errors Eliot Courtney
2026-02-27 12:32 ` [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles Eliot Courtney
2026-03-09 21:22   ` Joel Fernandes
2026-03-09 23:41     ` Joel Fernandes
2026-03-10  0:06       ` John Hubbard
2026-03-10  2:17         ` Joel Fernandes
2026-03-10  2:29           ` John Hubbard
2026-03-10 18:48             ` Joel Fernandes
2026-03-10  2:36           ` Alexandre Courbot
2026-03-10  4:02             ` Eliot Courtney
2026-03-10 10:35               ` Danilo Krummrich
2026-02-27 12:32 ` [PATCH 4/9] gpu: nova-core: gsp: add RM control RPC structure binding Eliot Courtney
2026-02-27 12:32 ` [PATCH 5/9] gpu: nova-core: gsp: add types for RM control RPCs Eliot Courtney
2026-03-09 21:45   ` Joel Fernandes
2026-03-16 11:42     ` Eliot Courtney
2026-02-27 12:32 ` [PATCH 6/9] gpu: nova-core: generalize `flush_into_kvec` to `flush_into_vec` Eliot Courtney
2026-03-09 21:53   ` Joel Fernandes
2026-03-09 21:57   ` Danilo Krummrich
2026-03-09 22:01     ` Danilo Krummrich
2026-03-16 11:44       ` Eliot Courtney
2026-03-16 12:21         ` Danilo Krummrich
2026-03-17  1:55           ` Alexandre Courbot
2026-03-17 10:49             ` Danilo Krummrich
2026-03-17 13:41               ` Alexandre Courbot
2026-03-17 14:12                 ` Danilo Krummrich
2026-03-18  1:52                   ` Alexandre Courbot
2026-02-27 12:32 ` [PATCH 7/9] gpu: nova-core: gsp: add RM control command infrastructure Eliot Courtney
2026-03-02  8:00   ` Zhi Wang
2026-03-09 22:08   ` Joel Fernandes
2026-03-13 15:40   ` Danilo Krummrich
2026-03-16 12:06     ` Eliot Courtney
2026-02-27 12:32 ` [PATCH 8/9] gpu: nova-core: gsp: add CE fault method buffer size bindings Eliot Courtney
2026-03-09 22:08   ` Joel Fernandes
2026-02-27 12:32 ` [PATCH 9/9] gpu: nova-core: gsp: add CeGetFaultMethodBufferSize RM control command Eliot Courtney
2026-03-09 22:23   ` Joel Fernandes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox