dri-devel.lists.freedesktop.org archive mirror
* [PATCH 0/2] drm/nouveau: ZCULL support
@ 2025-03-12 21:36 Mel Henning
  2025-03-12 21:36 ` [PATCH 1/2] drm/nouveau: Add DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO Mel Henning
  2025-03-12 21:36 ` [PATCH 2/2] drm/nouveau: DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER Mel Henning
  0 siblings, 2 replies; 21+ messages in thread
From: Mel Henning @ 2025-03-12 21:36 UTC (permalink / raw)
  To: Karol Herbst, Lyude Paul, Danilo Krummrich, Faith Ekstrand,
	dri-devel, nouveau
  Cc: Mel Henning

These patches add kernel-side support for using the ZCULL hardware in NVIDIA
GPUs. ZCULL aims to save memory bandwidth by performing an early, approximate
depth test, similar to hierarchical Z on AMD cards. The series adds two new
ioctls on nouveau devices, which are currently only supported when running on
GSP firmware.

Corresponding userspace changes for NVK are available here:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33861
or, if gitlab is down, mirrored here:
https://codeberg.org/mhenning/mesa/commits/branch/zcull3

This series is also available in git:
https://gitlab.freedesktop.org/mhenning/linux/-/commits/zcull2?ref_type=heads

As these are my first kernel patches, I ask for your patience during the
review process.

Mel Henning (2):
  drm/nouveau: Add DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO
  drm/nouveau: DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER

 drivers/gpu/drm/nouveau/include/nvif/if0020.h |   6 +
 .../drm/nouveau/include/nvkm/core/device.h    |   6 +
 .../gpu/drm/nouveau/include/nvkm/engine/gr.h  |   1 +
 .../sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h | 122 ++++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_abi16.c       |  61 +++++++++
 drivers/gpu/drm/nouveau/nouveau_abi16.h       |   2 +
 drivers/gpu/drm/nouveau/nouveau_drm.c         |   2 +
 .../gpu/drm/nouveau/nvkm/engine/fifo/chan.c   |   8 ++
 .../gpu/drm/nouveau/nvkm/engine/fifo/chan.h   |   2 +
 .../gpu/drm/nouveau/nvkm/engine/fifo/r535.c   |  26 ++++
 .../gpu/drm/nouveau/nvkm/engine/fifo/uchan.c  |  17 +++
 drivers/gpu/drm/nouveau/nvkm/engine/gr/r535.c |  34 +++++
 include/uapi/drm/nouveau_drm.h                |  35 +++++
 13 files changed, 322 insertions(+)

-- 
2.48.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/2] drm/nouveau: Add DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO
  2025-03-12 21:36 [PATCH 0/2] drm/nouveau: ZCULL support Mel Henning
@ 2025-03-12 21:36 ` Mel Henning
  2025-03-20 18:18   ` Danilo Krummrich
  2025-03-12 21:36 ` [PATCH 2/2] drm/nouveau: DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER Mel Henning
  1 sibling, 1 reply; 21+ messages in thread
From: Mel Henning @ 2025-03-12 21:36 UTC (permalink / raw)
  To: Karol Herbst, Lyude Paul, Danilo Krummrich, Faith Ekstrand,
	dri-devel, nouveau
  Cc: Mel Henning

Userspace needs this information to set up zcull correctly.

Signed-off-by: Mel Henning <mhenning@darkrefraction.com>
---
 .../drm/nouveau/include/nvkm/core/device.h    |  6 ++
 .../sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h | 69 +++++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_abi16.c       | 15 ++++
 drivers/gpu/drm/nouveau/nouveau_abi16.h       |  1 +
 drivers/gpu/drm/nouveau/nouveau_drm.c         |  1 +
 drivers/gpu/drm/nouveau/nvkm/engine/gr/r535.c | 34 +++++++++
 include/uapi/drm/nouveau_drm.h                | 19 +++++
 7 files changed, 145 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/include/nvkm/core/device.h b/drivers/gpu/drm/nouveau/include/nvkm/core/device.h
index 46afb877a296..d1742ff1cf6b 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/core/device.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/core/device.h
@@ -3,6 +3,9 @@
 #define __NVKM_DEVICE_H__
 #include <core/oclass.h>
 #include <core/intr.h>
+
+#include "uapi/drm/nouveau_drm.h"
+
 enum nvkm_subdev_type;
 
 enum nvkm_device_type {
@@ -72,6 +75,9 @@ struct nvkm_device {
 		bool armed;
 		bool legacy_done;
 	} intr;
+
+	bool has_zcull_info;
+	struct drm_nouveau_get_zcull_info zcull_info;
 };
 
 struct nvkm_subdev *nvkm_device_subdev(struct nvkm_device *, int type, int inst);
diff --git a/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h b/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h
index 59f8895bc5d7..01884b926c9c 100644
--- a/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h
+++ b/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h
@@ -26,6 +26,75 @@
  * DEALINGS IN THE SOFTWARE.
  */
 
+/**
+ * NV2080_CTRL_CMD_GR_GET_ZCULL_INFO
+ *
+ * This command is used to query the RM for zcull information that the
+ * driver will need to allocate and manage the zcull regions.
+ *
+ *   widthAlignPixels
+ *     This parameter returns the width alignment restrictions in pixels
+ *     used to adjust a surface for proper aliquot coverage (typically
+ *     #TPC's * 16).
+ *
+ *   heightAlignPixels
+ *     This parameter returns the height alignment restrictions in pixels
+ *     used to adjust a surface for proper aliquot coverage (typically 32).
+ *
+ *   pixelSquaresByAliquots
+ *     This parameter returns the pixel area covered by an aliquot
+ *     (typically #Zcull_banks * 16 * 16).
+ *
+ *   aliquotTotal
+ *     This parameter returns the total aliquot pool available in HW.
+ *
+ *   zcullRegionByteMultiplier
+ *     This parameter returns the multiplier used to convert aliquots in a region
+ *     to the number of bytes required to save/restore them.
+ *
+ *   zcullRegionHeaderSize
+ *     This parameter returns the region header size which is required to be
+ *     allocated and accounted for in any save/restore operation on a region.
+ *
+ *   zcullSubregionHeaderSize
+ *     This parameter returns the subregion header size which is required to be
+ *     allocated and accounted for in any save/restore operation on a region.
+ *
+ *   subregionCount
+ *     This parameter returns the subregion count.
+ *
+ *   subregionWidthAlignPixels
+ *     This parameter returns the subregion width alignment restrictions in
+ *     pixels used to adjust a surface for proper aliquot coverage
+ *     (typically #TPC's * 16).
+ *
+ *   subregionHeightAlignPixels
+ *     This parameter returns the subregion height alignment restrictions in
+ *     pixels used to adjust a surface for proper aliquot coverage
+ *     (typically 62).
+ *
+ *   The callee should compute the size of a zcull region as follows.
+ *     (numBytes = aliquots * zcullRegionByteMultiplier +
+ *                 zcullRegionHeaderSize + zcullSubregionHeaderSize)
+ */
+#define NV2080_CTRL_CMD_GR_GET_ZCULL_INFO            (0x20801206U) /* finn: Evaluated from "(FINN_NV20_SUBDEVICE_0_GR_INTERFACE_ID << 8) | NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS_MESSAGE_ID" */
+
+#define NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS_SUBREGION_SUPPORTED
+#define NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS_MESSAGE_ID (0x6U)
+
+typedef struct NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS {
+    NvU32 widthAlignPixels;
+    NvU32 heightAlignPixels;
+    NvU32 pixelSquaresByAliquots;
+    NvU32 aliquotTotal;
+    NvU32 zcullRegionByteMultiplier;
+    NvU32 zcullRegionHeaderSize;
+    NvU32 zcullSubregionHeaderSize;
+    NvU32 subregionCount;
+    NvU32 subregionWidthAlignPixels;
+    NvU32 subregionHeightAlignPixels;
+} NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS;
+
 typedef enum NV2080_CTRL_CMD_GR_CTXSW_PREEMPTION_BIND_BUFFERS {
     NV2080_CTRL_CMD_GR_CTXSW_PREEMPTION_BIND_BUFFERS_MAIN = 0,
     NV2080_CTRL_CMD_GR_CTXSW_PREEMPTION_BIND_BUFFERS_SPILL = 1,
diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c b/drivers/gpu/drm/nouveau/nouveau_abi16.c
index 2a0617e5fe2a..981abea97546 100644
--- a/drivers/gpu/drm/nouveau/nouveau_abi16.c
+++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c
@@ -333,6 +333,21 @@ nouveau_abi16_ioctl_getparam(ABI16_IOCTL_ARGS)
 	return 0;
 }
 
+int
+nouveau_abi16_ioctl_get_zcull_info(ABI16_IOCTL_ARGS)
+{
+	struct nouveau_drm *drm = nouveau_drm(dev);
+	struct nvkm_device *device = drm->nvkm;
+	struct drm_nouveau_get_zcull_info *out = data;
+
+	if (device->has_zcull_info) {
+		*out = device->zcull_info;
+		return 0;
+	} else {
+		return -ENODEV;
+	}
+}
+
 int
 nouveau_abi16_ioctl_channel_alloc(ABI16_IOCTL_ARGS)
 {
diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.h b/drivers/gpu/drm/nouveau/nouveau_abi16.h
index af6b4e1cefd2..134b3ab58719 100644
--- a/drivers/gpu/drm/nouveau/nouveau_abi16.h
+++ b/drivers/gpu/drm/nouveau/nouveau_abi16.h
@@ -6,6 +6,7 @@
 	struct drm_device *dev, void *data, struct drm_file *file_priv
 
 int nouveau_abi16_ioctl_getparam(ABI16_IOCTL_ARGS);
+int nouveau_abi16_ioctl_get_zcull_info(ABI16_IOCTL_ARGS);
 int nouveau_abi16_ioctl_channel_alloc(ABI16_IOCTL_ARGS);
 int nouveau_abi16_ioctl_channel_free(ABI16_IOCTL_ARGS);
 int nouveau_abi16_ioctl_grobj_alloc(ABI16_IOCTL_ARGS);
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 107f63f08bd9..4c4168b50e60 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -1244,6 +1244,7 @@ nouveau_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GROBJ_ALLOC, nouveau_abi16_ioctl_grobj_alloc, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(NOUVEAU_NOTIFIEROBJ_ALLOC, nouveau_abi16_ioctl_notifierobj_alloc, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GPUOBJ_FREE, nouveau_abi16_ioctl_gpuobj_free, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(NOUVEAU_GET_ZCULL_INFO, nouveau_abi16_ioctl_get_zcull_info, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(NOUVEAU_SVM_INIT, nouveau_svmm_init, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(NOUVEAU_SVM_BIND, nouveau_svmm_bind, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_NEW, nouveau_gem_ioctl_new, DRM_RENDER_ALLOW),
diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/gr/r535.c b/drivers/gpu/drm/nouveau/nvkm/engine/gr/r535.c
index f4bed3eb1ec2..6669f2b1f492 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/r535.c
@@ -34,6 +34,7 @@
 #include <nvrm/535.113.01/common/sdk/nvidia/inc/alloc/alloc_channel.h>
 #include <nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl0080/ctrl0080fifo.h>
 #include <nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gpu.h>
+#include <nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h>
 #include <nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080internal.h>
 #include <nvrm/535.113.01/nvidia/generated/g_kernel_channel_nvoc.h>
 
@@ -244,6 +245,8 @@ static int
 r535_gr_oneinit(struct nvkm_gr *base)
 {
 	NV2080_CTRL_INTERNAL_STATIC_GR_GET_CONTEXT_BUFFERS_INFO_PARAMS *info;
+	NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS *zcull_info;
+	u32 zcull_ctxsw_size, zcull_ctxsw_align;
 	struct r535_gr *gr = container_of(base, typeof(*gr), base);
 	struct nvkm_subdev *subdev = &gr->base.engine.subdev;
 	struct nvkm_device *device = subdev->device;
@@ -418,6 +421,11 @@ r535_gr_oneinit(struct nvkm_gr *base)
 		}
 	}
 
+	zcull_ctxsw_size = info->engineContextBuffersInfo[0]
+		.engine[NV0080_CTRL_FIFO_GET_ENGINE_CONTEXT_PROPERTIES_ENGINE_ID_GRAPHICS_ZCULL].size;
+	zcull_ctxsw_align = info->engineContextBuffersInfo[0]
+		.engine[NV0080_CTRL_FIFO_GET_ENGINE_CONTEXT_PROPERTIES_ENGINE_ID_GRAPHICS_ZCULL].alignment;
+
 	nvkm_gsp_rm_ctrl_done(&gsp->internal.device.subdevice, info);
 
 	/* Promote golden context to RM. */
@@ -450,6 +458,32 @@ r535_gr_oneinit(struct nvkm_gr *base)
 		}
 	}
 
+	zcull_info = nvkm_gsp_rm_ctrl_rd(&gsp->internal.device.subdevice,
+					 NV2080_CTRL_CMD_GR_GET_ZCULL_INFO,
+					 sizeof(*zcull_info));
+	if (WARN_ON(IS_ERR(zcull_info))) {
+		ret = PTR_ERR(zcull_info);
+		goto done;
+	}
+
+	device->has_zcull_info = true;
+
+	device->zcull_info.width_align_pixels = zcull_info->widthAlignPixels;
+	device->zcull_info.height_align_pixels = zcull_info->heightAlignPixels;
+	device->zcull_info.pixel_squares_by_aliquots = zcull_info->pixelSquaresByAliquots;
+	device->zcull_info.aliquot_total = zcull_info->aliquotTotal;
+	device->zcull_info.zcull_region_byte_multiplier = zcull_info->zcullRegionByteMultiplier;
+	device->zcull_info.zcull_region_header_size = zcull_info->zcullRegionHeaderSize;
+	device->zcull_info.zcull_subregion_header_size = zcull_info->zcullSubregionHeaderSize;
+	device->zcull_info.subregion_count = zcull_info->subregionCount;
+	device->zcull_info.subregion_width_align_pixels = zcull_info->subregionWidthAlignPixels;
+	device->zcull_info.subregion_height_align_pixels = zcull_info->subregionHeightAlignPixels;
+
+	device->zcull_info.ctxsw_size = zcull_ctxsw_size;
+	device->zcull_info.ctxsw_align = zcull_ctxsw_align;
+
+	nvkm_gsp_rm_ctrl_done(&gsp->internal.device.subdevice, zcull_info);
+
 done:
 	nvkm_gsp_rm_free(&golden.chan);
 	for (int i = gr->ctxbuf_nr - 1; i >= 0; i--)
diff --git a/include/uapi/drm/nouveau_drm.h b/include/uapi/drm/nouveau_drm.h
index dd87f8f30793..33361784eb4e 100644
--- a/include/uapi/drm/nouveau_drm.h
+++ b/include/uapi/drm/nouveau_drm.h
@@ -432,6 +432,22 @@ struct drm_nouveau_exec {
 	__u64 push_ptr;
 };
 
+struct drm_nouveau_get_zcull_info {
+	__u32 width_align_pixels;
+	__u32 height_align_pixels;
+	__u32 pixel_squares_by_aliquots;
+	__u32 aliquot_total;
+	__u32 zcull_region_byte_multiplier;
+	__u32 zcull_region_header_size;
+	__u32 zcull_subregion_header_size;
+	__u32 subregion_count;
+	__u32 subregion_width_align_pixels;
+	__u32 subregion_height_align_pixels;
+
+	__u32 ctxsw_size;
+	__u32 ctxsw_align;
+};
+
 #define DRM_NOUVEAU_GETPARAM           0x00
 #define DRM_NOUVEAU_SETPARAM           0x01 /* deprecated */
 #define DRM_NOUVEAU_CHANNEL_ALLOC      0x02
@@ -445,6 +461,7 @@ struct drm_nouveau_exec {
 #define DRM_NOUVEAU_VM_INIT            0x10
 #define DRM_NOUVEAU_VM_BIND            0x11
 #define DRM_NOUVEAU_EXEC               0x12
+#define DRM_NOUVEAU_GET_ZCULL_INFO     0x13
 #define DRM_NOUVEAU_GEM_NEW            0x40
 #define DRM_NOUVEAU_GEM_PUSHBUF        0x41
 #define DRM_NOUVEAU_GEM_CPU_PREP       0x42
@@ -513,6 +530,8 @@ struct drm_nouveau_svm_bind {
 #define DRM_IOCTL_NOUVEAU_VM_INIT            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_INIT, struct drm_nouveau_vm_init)
 #define DRM_IOCTL_NOUVEAU_VM_BIND            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_BIND, struct drm_nouveau_vm_bind)
 #define DRM_IOCTL_NOUVEAU_EXEC               DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_EXEC, struct drm_nouveau_exec)
+
+#define DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO     DRM_IOR (DRM_COMMAND_BASE + DRM_NOUVEAU_GET_ZCULL_INFO, struct drm_nouveau_get_zcull_info)
 #if defined(__cplusplus)
 }
 #endif
-- 
2.48.1



* [PATCH 2/2] drm/nouveau: DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER
  2025-03-12 21:36 [PATCH 0/2] drm/nouveau: ZCULL support Mel Henning
  2025-03-12 21:36 ` [PATCH 1/2] drm/nouveau: Add DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO Mel Henning
@ 2025-03-12 21:36 ` Mel Henning
  2025-03-20 18:34   ` Danilo Krummrich
  1 sibling, 1 reply; 21+ messages in thread
From: Mel Henning @ 2025-03-12 21:36 UTC (permalink / raw)
  To: Karol Herbst, Lyude Paul, Danilo Krummrich, Faith Ekstrand,
	dri-devel, nouveau
  Cc: Mel Henning

Add an ioctl that binds the zcull context switch buffer for a channel so
userspace can manage it. The ioctl also supports unbinding the buffer,
which simplifies device teardown on the userspace side.

Signed-off-by: Mel Henning <mhenning@darkrefraction.com>
---
 drivers/gpu/drm/nouveau/include/nvif/if0020.h |  6 +++
 .../gpu/drm/nouveau/include/nvkm/engine/gr.h  |  1 +
 .../sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h | 53 +++++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_abi16.c       | 46 ++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_abi16.h       |  1 +
 drivers/gpu/drm/nouveau/nouveau_drm.c         |  1 +
 .../gpu/drm/nouveau/nvkm/engine/fifo/chan.c   |  8 +++
 .../gpu/drm/nouveau/nvkm/engine/fifo/chan.h   |  2 +
 .../gpu/drm/nouveau/nvkm/engine/fifo/r535.c   | 26 +++++++++
 .../gpu/drm/nouveau/nvkm/engine/fifo/uchan.c  | 17 ++++++
 include/uapi/drm/nouveau_drm.h                | 16 ++++++
 11 files changed, 177 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/include/nvif/if0020.h b/drivers/gpu/drm/nouveau/include/nvif/if0020.h
index 085e0ae8a450..78be9689b3c5 100644
--- a/drivers/gpu/drm/nouveau/include/nvif/if0020.h
+++ b/drivers/gpu/drm/nouveau/include/nvif/if0020.h
@@ -42,4 +42,10 @@ union nvif_chan_event_args {
 		__u8 type;
 	} v0;
 };
+
+#define NVIF_CHAN_MTHD_SET_ZCULL_CTXSW_BUFFER 0x00000000
+
+struct nvif_chan_mthd_set_zcull_ctxsw_buffer {
+	u64 addr;
+};
 #endif
diff --git a/drivers/gpu/drm/nouveau/include/nvkm/engine/gr.h b/drivers/gpu/drm/nouveau/include/nvkm/engine/gr.h
index 8145796ffc61..7646b0d9a9ed 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/engine/gr.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/engine/gr.h
@@ -9,6 +9,7 @@ struct nvkm_gr {
 };
 
 u64 nvkm_gr_units(struct nvkm_gr *);
+int nvkm_gr_set_zcull_ctxsw_buffer(struct nvkm_gr *gr, struct nvkm_chan *chan, u64 addr);
 int nvkm_gr_tlb_flush(struct nvkm_gr *);
 int nvkm_gr_ctxsw_pause(struct nvkm_device *);
 int nvkm_gr_ctxsw_resume(struct nvkm_device *);
diff --git a/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h b/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h
index 01884b926c9c..36fee8bf59be 100644
--- a/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h
+++ b/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h
@@ -26,6 +26,11 @@
  * DEALINGS IN THE SOFTWARE.
  */
 
+/* valid zcullMode values */
+#define NV2080_CTRL_CTXSW_ZCULL_MODE_GLOBAL          (0x00000000U)
+#define NV2080_CTRL_CTXSW_ZCULL_MODE_NO_CTXSW        (0x00000001U)
+#define NV2080_CTRL_CTXSW_ZCULL_MODE_SEPARATE_BUFFER (0x00000002U)
+
 /**
  * NV2080_CTRL_CMD_GR_GET_ZCULL_INFO
  *
@@ -95,6 +100,54 @@ typedef struct NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS {
     NvU32 subregionHeightAlignPixels;
 } NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS;
 
+
+/*
+ * NV2080_CTRL_CMD_GR_CTXSW_ZCULL_BIND
+ *
+ * This command is used to set the zcull context switch mode and virtual address
+ * for the specified channel. A value of NV_ERR_NOT_SUPPORTED is
+ * returned if the target channel does not support zcull context switch mode
+ * changes.
+ *
+ *   hClient
+ *     This parameter specifies the client handle of the client
+ *     that owns the zcull context buffer. This field must match
+ *     the hClient used in the control call for non-kernel clients.
+ *   hChannel
+ *     This parameter specifies the channel handle of
+ *     the channel that is to have its zcull context switch mode changed.
+ *   vMemPtr
+ *     This parameter specifies the 64 bit virtual address
+ *     for the allocated zcull context buffer.
+ *   zcullMode
+ *     This parameter specifies the new zcull context switch mode.
+ *     Legal values for this parameter include:
+ *       NV2080_CTRL_GR_SET_CTXSW_ZCULL_MODE_GLOBAL
+ *         This mode is the normal zcull operation where it is not
+ *         context switched and there is one set of globally shared
+ *         zcull memory and tables.  This mode is only supported as
+ *         long as all channels use this mode.
+ *       NV2080_CTRL_GR_SET_CTXSW_ZCULL_MODE_NO_CTXSW
+ *         This mode causes the zcull tables to be reset on a context
+ *         switch, but the zcull buffer will not be saved/restored.
+ *       NV2080_CTRL_GR_SET_CTXSW_ZCULL_MODE_SEPARATE_BUFFER
+ *         This mode will cause the zcull buffers and tables to be
+ *         saved/restored on context switches.  If a share channel
+ *         ID is given (shareChID), then the 2 channels will share
+ *         the zcull context buffers.
+ */
+#define NV2080_CTRL_CMD_GR_CTXSW_ZCULL_BIND        (0x20801208U) /* finn: Evaluated from "(FINN_NV20_SUBDEVICE_0_GR_INTERFACE_ID << 8) | NV2080_CTRL_GR_CTXSW_ZCULL_BIND_PARAMS_MESSAGE_ID" */
+
+#define NV2080_CTRL_GR_CTXSW_ZCULL_BIND_PARAMS_MESSAGE_ID (0x8U)
+
+typedef struct NV2080_CTRL_GR_CTXSW_ZCULL_BIND_PARAMS {
+    NvHandle hClient;
+    NvHandle hChannel;
+    NV_DECLARE_ALIGNED(NvU64 vMemPtr, 8);
+    NvU32    zcullMode;
+} NV2080_CTRL_GR_CTXSW_ZCULL_BIND_PARAMS;
+/* valid zcullMode values same as above NV2080_CTRL_CTXSW_ZCULL_MODE */
+
 typedef enum NV2080_CTRL_CMD_GR_CTXSW_PREEMPTION_BIND_BUFFERS {
     NV2080_CTRL_CMD_GR_CTXSW_PREEMPTION_BIND_BUFFERS_MAIN = 0,
     NV2080_CTRL_CMD_GR_CTXSW_PREEMPTION_BIND_BUFFERS_SPILL = 1,
diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c b/drivers/gpu/drm/nouveau/nouveau_abi16.c
index 981abea97546..07f8215e61bd 100644
--- a/drivers/gpu/drm/nouveau/nouveau_abi16.c
+++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c
@@ -27,6 +27,7 @@
 #include <nvif/ioctl.h>
 #include <nvif/class.h>
 #include <nvif/cl0002.h>
+#include <nvif/if0020.h>
 #include <nvif/unpack.h>
 
 #include "nouveau_drv.h"
@@ -348,6 +349,51 @@ nouveau_abi16_ioctl_get_zcull_info(ABI16_IOCTL_ARGS)
 	}
 }
 
+int
+nouveau_abi16_ioctl_set_zcull_ctxsw_buffer(ABI16_IOCTL_ARGS)
+{
+	struct nouveau_abi16 *abi16 = nouveau_abi16_get(file_priv);
+	struct nouveau_abi16_chan *chan16;
+	struct nouveau_channel *chan = NULL;
+	struct nouveau_drm *drm = nouveau_drm(dev);
+	struct nvkm_device *device = drm->nvkm;
+	struct drm_nouveau_set_zcull_ctxsw_buffer *req = data;
+	int ret = 0;
+
+	if (unlikely(!device->has_zcull_info))
+		return nouveau_abi16_put(abi16, -ENODEV);
+
+	if (unlikely(req->pad))
+		return nouveau_abi16_put(abi16, -EINVAL);
+
+	if (unlikely(!abi16))
+		return nouveau_abi16_put(abi16, -ENOMEM);
+
+	/* abi16 locks already */
+	list_for_each_entry(chan16, &abi16->channels, head) {
+		if (chan16->chan->chid == req->channel) {
+			chan = chan16->chan;
+			break;
+		}
+	}
+
+	if (!chan)
+		return nouveau_abi16_put(abi16, -ENOENT);
+
+	if (unlikely(atomic_read(&chan->killed)))
+		return nouveau_abi16_put(abi16, -ENODEV);
+
+	struct nvif_chan_mthd_set_zcull_ctxsw_buffer args = {
+		.addr = req->addr,
+	};
+
+	ret = nvif_mthd(&chan->user, NVIF_CHAN_MTHD_SET_ZCULL_CTXSW_BUFFER,
+			&args, sizeof(args));
+
+	return nouveau_abi16_put(abi16, ret);
+}
+
+
 int
 nouveau_abi16_ioctl_channel_alloc(ABI16_IOCTL_ARGS)
 {
diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.h b/drivers/gpu/drm/nouveau/nouveau_abi16.h
index 134b3ab58719..2ac0842aee88 100644
--- a/drivers/gpu/drm/nouveau/nouveau_abi16.h
+++ b/drivers/gpu/drm/nouveau/nouveau_abi16.h
@@ -7,6 +7,7 @@
 
 int nouveau_abi16_ioctl_getparam(ABI16_IOCTL_ARGS);
 int nouveau_abi16_ioctl_get_zcull_info(ABI16_IOCTL_ARGS);
+int nouveau_abi16_ioctl_set_zcull_ctxsw_buffer(ABI16_IOCTL_ARGS);
 int nouveau_abi16_ioctl_channel_alloc(ABI16_IOCTL_ARGS);
 int nouveau_abi16_ioctl_channel_free(ABI16_IOCTL_ARGS);
 int nouveau_abi16_ioctl_grobj_alloc(ABI16_IOCTL_ARGS);
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 4c4168b50e60..352620023a01 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -1245,6 +1245,7 @@ nouveau_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(NOUVEAU_NOTIFIEROBJ_ALLOC, nouveau_abi16_ioctl_notifierobj_alloc, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GPUOBJ_FREE, nouveau_abi16_ioctl_gpuobj_free, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GET_ZCULL_INFO, nouveau_abi16_ioctl_get_zcull_info, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(NOUVEAU_SET_ZCULL_CTXSW_BUFFER, nouveau_abi16_ioctl_set_zcull_ctxsw_buffer, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(NOUVEAU_SVM_INIT, nouveau_svmm_init, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(NOUVEAU_SVM_BIND, nouveau_svmm_bind, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_NEW, nouveau_gem_ioctl_new, DRM_RENDER_ALLOW),
diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/chan.c b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/chan.c
index 7d4716dcd512..0460cc3286d9 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/chan.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/chan.c
@@ -345,6 +345,14 @@ nvkm_chan_get_chid(struct nvkm_engine *engine, int id, unsigned long *pirqflags)
 	return NULL;
 }
 
+int
+nvkm_chan_set_zcull_ctxsw_buffer(struct nvkm_chan *chan, u64 addr)
+{
+	if (chan->func->set_zcull_ctxsw_buffer)
+		return chan->func->set_zcull_ctxsw_buffer(chan, addr);
+	return -ENODEV;
+}
+
 int
 nvkm_chan_new_(const struct nvkm_chan_func *func, struct nvkm_runl *runl, int runq,
 	       struct nvkm_cgrp *cgrp, const char *name, bool priv, u32 devm, struct nvkm_vmm *vmm,
diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/chan.h b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/chan.h
index 013682a709d5..5c9380d39468 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/chan.h
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/chan.h
@@ -54,6 +54,7 @@ struct nvkm_chan_func {
 	void (*stop)(struct nvkm_chan *);
 	void (*preempt)(struct nvkm_chan *);
 	u32 (*doorbell_handle)(struct nvkm_chan *);
+	int (*set_zcull_ctxsw_buffer)(struct nvkm_chan *, u64 addr);
 };
 
 int nvkm_chan_new_(const struct nvkm_chan_func *, struct nvkm_runl *, int runq, struct nvkm_cgrp *,
@@ -73,6 +74,7 @@ int nvkm_chan_cctx_get(struct nvkm_chan *, struct nvkm_engn *, struct nvkm_cctx
 		       struct nvkm_client * /*TODO: remove need for this */);
 void nvkm_chan_cctx_put(struct nvkm_chan *, struct nvkm_cctx **);
 void nvkm_chan_cctx_bind(struct nvkm_chan *, struct nvkm_engn *, struct nvkm_cctx *);
+int nvkm_chan_set_zcull_ctxsw_buffer(struct nvkm_chan *chan, u64 addr);
 
 #define CHAN_PRCLI(c,l,p,f,a...) CGRP_PRINT((c)->cgrp, l, p, "%04x:[%s]"f, (c)->id, (c)->name, ##a)
 #define CHAN_PRINT(c,l,p,f,a...) CGRP_PRINT((c)->cgrp, l, p, "%04x:"f, (c)->id, ##a)
diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c
index 3454c7d29502..11d5c5769f59 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c
@@ -39,6 +39,7 @@
 #include <nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080ce.h>
 #include <nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080fifo.h>
 #include <nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gpu.h>
+#include <nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h>
 #include <nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080internal.h>
 #include <nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrla06f/ctrla06fgpfifo.h>
 #include <nvrm/535.113.01/nvidia/generated/g_kernel_channel_nvoc.h>
@@ -328,6 +329,30 @@ r535_chan_id_get(struct nvkm_chan *chan, struct nvkm_memory *muserd, u64 ouserd)
 	return ret;
 }
 
+static int
+r535_chan_set_zcull_ctxsw_buffer(struct nvkm_chan *chan, u64 addr)
+{
+	NV2080_CTRL_GR_CTXSW_ZCULL_BIND_PARAMS *ctrl;
+
+	ctrl = nvkm_gsp_rm_ctrl_get(&chan->vmm->rm.device.subdevice,
+		NV2080_CTRL_CMD_GR_CTXSW_ZCULL_BIND, sizeof(*ctrl));
+
+	if (WARN_ON(IS_ERR(ctrl)))
+		return PTR_ERR(ctrl);
+
+	ctrl->hClient = chan->vmm->rm.client.object.handle;
+	ctrl->hChannel = chan->rm.object.handle;
+	if (addr) {
+		ctrl->vMemPtr = addr;
+		ctrl->zcullMode = NV2080_CTRL_CTXSW_ZCULL_MODE_SEPARATE_BUFFER;
+	} else {
+		ctrl->vMemPtr = 0;
+		ctrl->zcullMode = NV2080_CTRL_CTXSW_ZCULL_MODE_NO_CTXSW;
+	}
+
+	return nvkm_gsp_rm_ctrl_wr(&chan->vmm->rm.device.subdevice, ctrl);
+}
+
 static const struct nvkm_chan_func
 r535_chan = {
 	.id_get = r535_chan_id_get,
@@ -338,6 +363,7 @@ r535_chan = {
 	.start = r535_chan_start,
 	.stop = r535_chan_stop,
 	.doorbell_handle = r535_chan_doorbell_handle,
+	.set_zcull_ctxsw_buffer = r535_chan_set_zcull_ctxsw_buffer,
 };
 
 static const struct nvkm_cgrp_func
diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/uchan.c b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/uchan.c
index 9e56bcc166ed..44c18a77cf10 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/fifo/uchan.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/fifo/uchan.c
@@ -31,6 +31,7 @@
 #include <subdev/mmu.h>
 #include <engine/dma.h>
 
+#include <nvkm/engine/gr.h>
 #include <nvif/if0020.h>
 
 struct nvkm_uchan {
@@ -268,6 +269,21 @@ nvkm_uchan_map(struct nvkm_object *object, void *argv, u32 argc,
 	return 0;
 }
 
+static int
+nvkm_uchan_mthd(struct nvkm_object *object, u32 mthd, void *argv, u32 argc)
+{
+	struct nvkm_chan *chan = nvkm_uchan(object)->chan;
+	struct nvif_chan_mthd_set_zcull_ctxsw_buffer *args = argv;
+
+	if (mthd != NVIF_CHAN_MTHD_SET_ZCULL_CTXSW_BUFFER)
+		return -ENOTTY;
+
+	if (argc < sizeof(*args))
+		return -EINVAL;
+
+	return nvkm_chan_set_zcull_ctxsw_buffer(chan, args->addr);
+}
+
 static int
 nvkm_uchan_fini(struct nvkm_object *object, bool suspend)
 {
@@ -312,6 +328,7 @@ nvkm_uchan = {
 	.dtor = nvkm_uchan_dtor,
 	.init = nvkm_uchan_init,
 	.fini = nvkm_uchan_fini,
+	.mthd = nvkm_uchan_mthd,
 	.map = nvkm_uchan_map,
 	.sclass = nvkm_uchan_sclass,
 	.uevent = nvkm_uchan_uevent,
diff --git a/include/uapi/drm/nouveau_drm.h b/include/uapi/drm/nouveau_drm.h
index 33361784eb4e..e9638f4dd7e6 100644
--- a/include/uapi/drm/nouveau_drm.h
+++ b/include/uapi/drm/nouveau_drm.h
@@ -448,6 +448,20 @@ struct drm_nouveau_get_zcull_info {
 	__u32 ctxsw_align;
 };
 
+struct drm_nouveau_set_zcull_ctxsw_buffer {
+	/**
+	 * @addr: The virtual address for the buffer, or zero to unbind
+	 */
+	__u64 addr;
+
+	/**
+	 * @channel: the channel to set the buffer on
+	 */
+	__u32 channel;
+
+	__u32 pad;
+};
+
 #define DRM_NOUVEAU_GETPARAM           0x00
 #define DRM_NOUVEAU_SETPARAM           0x01 /* deprecated */
 #define DRM_NOUVEAU_CHANNEL_ALLOC      0x02
@@ -462,6 +476,7 @@ struct drm_nouveau_get_zcull_info {
 #define DRM_NOUVEAU_VM_BIND            0x11
 #define DRM_NOUVEAU_EXEC               0x12
 #define DRM_NOUVEAU_GET_ZCULL_INFO     0x13
+#define DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER 0x14
 #define DRM_NOUVEAU_GEM_NEW            0x40
 #define DRM_NOUVEAU_GEM_PUSHBUF        0x41
 #define DRM_NOUVEAU_GEM_CPU_PREP       0x42
@@ -532,6 +547,7 @@ struct drm_nouveau_svm_bind {
 #define DRM_IOCTL_NOUVEAU_EXEC               DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_EXEC, struct drm_nouveau_exec)
 
 #define DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO     DRM_IOR (DRM_COMMAND_BASE + DRM_NOUVEAU_GET_ZCULL_INFO, struct drm_nouveau_get_zcull_info)
+#define DRM_IOCTL_NOUVEAU_SET_ZCULL_CTXSW_BUFFER  DRM_IOW (DRM_COMMAND_BASE + DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER, struct drm_nouveau_set_zcull_ctxsw_buffer)
 #if defined(__cplusplus)
 }
 #endif
-- 
2.48.1



* Re: [PATCH 1/2] drm/nouveau: Add DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO
  2025-03-12 21:36 ` [PATCH 1/2] drm/nouveau: Add DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO Mel Henning
@ 2025-03-20 18:18   ` Danilo Krummrich
  2025-03-20 18:37     ` Danilo Krummrich
                       ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Danilo Krummrich @ 2025-03-20 18:18 UTC (permalink / raw)
  To: Mel Henning; +Cc: Karol Herbst, Lyude Paul, Faith Ekstrand, dri-devel, nouveau

Hi Mel,

On Wed, Mar 12, 2025 at 05:36:14PM -0400, Mel Henning wrote:
> Userspace needs this information to set up zcull correctly.

This is a very brief motivation for the commit, please also describe what the
commit does using imperative form.

> 
> Signed-off-by: Mel Henning <mhenning@darkrefraction.com>
> ---
>  .../drm/nouveau/include/nvkm/core/device.h    |  6 ++
>  .../sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h | 69 +++++++++++++++++++
>  drivers/gpu/drm/nouveau/nouveau_abi16.c       | 15 ++++
>  drivers/gpu/drm/nouveau/nouveau_abi16.h       |  1 +
>  drivers/gpu/drm/nouveau/nouveau_drm.c         |  1 +
>  drivers/gpu/drm/nouveau/nvkm/engine/gr/r535.c | 34 +++++++++
>  include/uapi/drm/nouveau_drm.h                | 19 +++++
>  7 files changed, 145 insertions(+)
> 
> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/core/device.h b/drivers/gpu/drm/nouveau/include/nvkm/core/device.h
> index 46afb877a296..d1742ff1cf6b 100644
> --- a/drivers/gpu/drm/nouveau/include/nvkm/core/device.h
> +++ b/drivers/gpu/drm/nouveau/include/nvkm/core/device.h
> @@ -3,6 +3,9 @@
>  #define __NVKM_DEVICE_H__
>  #include <core/oclass.h>
>  #include <core/intr.h>
> +
> +#include "uapi/drm/nouveau_drm.h"
> +
>  enum nvkm_subdev_type;
>  
>  enum nvkm_device_type {
> @@ -72,6 +75,9 @@ struct nvkm_device {
>  		bool armed;
>  		bool legacy_done;
>  	} intr;
> +
> +	bool has_zcull_info;
> +	struct drm_nouveau_get_zcull_info zcull_info;

This is bypassing the nvif layer entirely. I think you should store the contents
of struct NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS in struct r535_gr and have an
nvif interface to access the data.

>  };
>  
>  struct nvkm_subdev *nvkm_device_subdev(struct nvkm_device *, int type, int inst);
> diff --git a/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h b/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h
> index 59f8895bc5d7..01884b926c9c 100644
> --- a/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h
> +++ b/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h
> @@ -26,6 +26,75 @@
>   * DEALINGS IN THE SOFTWARE.
>   */
>  
> +/**
> + * NV2080_CTRL_CMD_GR_GET_ZCULL_INFO
> + *
> + * This command is used to query the RM for zcull information that the
> + * driver will need to allocate and manage the zcull regions.
> + *
> + *   widthAlignPixels
> + *     This parameter returns the width alignment restrictions in pixels
> + *     used to adjust a surface for proper aliquot coverage (typically
> + *     #TPC's * 16).
> + *
> + *   heightAlignPixels
> + *     This parameter returns the height alignment restrictions in pixels
> + *     used to adjust a surface for proper aliquot coverage (typically 32).
> + *
> + *   pixelSquaresByAliquots
> + *     This parameter returns the pixel area covered by an aliquot
> + *     (typically #Zcull_banks * 16 * 16).
> + *
> + *   aliquotTotal
> + *     This parameter returns the total aliquot pool available in HW.
> + *
> + *   zcullRegionByteMultiplier
> + *     This parameter returns multiplier used to convert aliquots in a region
> + *     to the number of bytes required to save/restore them.
> + *
> + *   zcullRegionHeaderSize
> + *     This parameter returns the region header size which is required to be
> + *     allocated and accounted for in any save/restore operation on a region.
> + *
> + *   zcullSubregionHeaderSize
> + *     This parameter returns the subregion header size which is required to be
> + *     allocated and accounted for in any save/restore operation on a region.
> + *
> + *   subregionCount
> + *     This parameter returns the subregion count.
> + *
> + *   subregionWidthAlignPixels
> + *     This parameter returns the subregion width alignment restrictions in
> + *     pixels used to adjust a surface for proper aliquot coverage
> + *     (typically #TPC's * 16).
> + *
> + *   subregionHeightAlignPixels
> + *     This parameter returns the subregion height alignment restrictions in
> + *     pixels used to adjust a surface for proper aliquot coverage
> + *     (typically 62).
> + *
> + *   The callee should compute the size of a zcull region as follows.
> + *     (numBytes = aliquots * zcullRegionByteMultiplier +
> + *                 zcullRegionHeaderSize + zcullSubregionHeaderSize)
> + */
> +#define NV2080_CTRL_CMD_GR_GET_ZCULL_INFO            (0x20801206U) /* finn: Evaluated from "(FINN_NV20_SUBDEVICE_0_GR_INTERFACE_ID << 8) | NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS_MESSAGE_ID" */
> +
> +#define NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS_SUBREGION_SUPPORTED
> +#define NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS_MESSAGE_ID (0x6U)
> +
> +typedef struct NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS {
> +    NvU32 widthAlignPixels;
> +    NvU32 heightAlignPixels;
> +    NvU32 pixelSquaresByAliquots;
> +    NvU32 aliquotTotal;
> +    NvU32 zcullRegionByteMultiplier;
> +    NvU32 zcullRegionHeaderSize;
> +    NvU32 zcullSubregionHeaderSize;
> +    NvU32 subregionCount;
> +    NvU32 subregionWidthAlignPixels;
> +    NvU32 subregionHeightAlignPixels;
> +} NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS;
> +
>  typedef enum NV2080_CTRL_CMD_GR_CTXSW_PREEMPTION_BIND_BUFFERS {
>      NV2080_CTRL_CMD_GR_CTXSW_PREEMPTION_BIND_BUFFERS_MAIN = 0,
>      NV2080_CTRL_CMD_GR_CTXSW_PREEMPTION_BIND_BUFFERS_SPILL = 1,
> diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c b/drivers/gpu/drm/nouveau/nouveau_abi16.c
> index 2a0617e5fe2a..981abea97546 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_abi16.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c
> @@ -333,6 +333,21 @@ nouveau_abi16_ioctl_getparam(ABI16_IOCTL_ARGS)
>  	return 0;
>  }
>  
> +int
> +nouveau_abi16_ioctl_get_zcull_info(ABI16_IOCTL_ARGS)
> +{
> +	struct nouveau_drm *drm = nouveau_drm(dev);
> +	struct nvkm_device *device = drm->nvkm;
> +	struct drm_nouveau_get_zcull_info *out = data;
> +
> +	if (device->has_zcull_info) {
> +		*out = device->zcull_info;

This needs copy_to_user().

> +		return 0;
> +	} else {
> +		return -ENODEV;
> +	}
> +}
> +
>  int
>  nouveau_abi16_ioctl_channel_alloc(ABI16_IOCTL_ARGS)
>  {
> diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.h b/drivers/gpu/drm/nouveau/nouveau_abi16.h
> index af6b4e1cefd2..134b3ab58719 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_abi16.h
> +++ b/drivers/gpu/drm/nouveau/nouveau_abi16.h
> @@ -6,6 +6,7 @@
>  	struct drm_device *dev, void *data, struct drm_file *file_priv
>  
>  int nouveau_abi16_ioctl_getparam(ABI16_IOCTL_ARGS);
> +int nouveau_abi16_ioctl_get_zcull_info(ABI16_IOCTL_ARGS);
>  int nouveau_abi16_ioctl_channel_alloc(ABI16_IOCTL_ARGS);
>  int nouveau_abi16_ioctl_channel_free(ABI16_IOCTL_ARGS);
>  int nouveau_abi16_ioctl_grobj_alloc(ABI16_IOCTL_ARGS);
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index 107f63f08bd9..4c4168b50e60 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -1244,6 +1244,7 @@ nouveau_ioctls[] = {
>  	DRM_IOCTL_DEF_DRV(NOUVEAU_GROBJ_ALLOC, nouveau_abi16_ioctl_grobj_alloc, DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(NOUVEAU_NOTIFIEROBJ_ALLOC, nouveau_abi16_ioctl_notifierobj_alloc, DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(NOUVEAU_GPUOBJ_FREE, nouveau_abi16_ioctl_gpuobj_free, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(NOUVEAU_GET_ZCULL_INFO, nouveau_abi16_ioctl_get_zcull_info, DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(NOUVEAU_SVM_INIT, nouveau_svmm_init, DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(NOUVEAU_SVM_BIND, nouveau_svmm_bind, DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_NEW, nouveau_gem_ioctl_new, DRM_RENDER_ALLOW),
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/gr/r535.c b/drivers/gpu/drm/nouveau/nvkm/engine/gr/r535.c
> index f4bed3eb1ec2..6669f2b1f492 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/r535.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/r535.c
> @@ -34,6 +34,7 @@
>  #include <nvrm/535.113.01/common/sdk/nvidia/inc/alloc/alloc_channel.h>
>  #include <nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl0080/ctrl0080fifo.h>
>  #include <nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gpu.h>
> +#include <nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h>
>  #include <nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080internal.h>
>  #include <nvrm/535.113.01/nvidia/generated/g_kernel_channel_nvoc.h>
>  
> @@ -244,6 +245,8 @@ static int
>  r535_gr_oneinit(struct nvkm_gr *base)
>  {
>  	NV2080_CTRL_INTERNAL_STATIC_GR_GET_CONTEXT_BUFFERS_INFO_PARAMS *info;
> +	NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS *zcull_info;
> +	u32 zcull_ctxsw_size, zcull_ctxsw_align;
>  	struct r535_gr *gr = container_of(base, typeof(*gr), base);
>  	struct nvkm_subdev *subdev = &gr->base.engine.subdev;
>  	struct nvkm_device *device = subdev->device;
> @@ -418,6 +421,11 @@ r535_gr_oneinit(struct nvkm_gr *base)
>  		}
>  	}
>  
> +	zcull_ctxsw_size = info->engineContextBuffersInfo[0]
> +		.engine[NV0080_CTRL_FIFO_GET_ENGINE_CONTEXT_PROPERTIES_ENGINE_ID_GRAPHICS_ZCULL].size;
> +	zcull_ctxsw_align = info->engineContextBuffersInfo[0]
> +		.engine[NV0080_CTRL_FIFO_GET_ENGINE_CONTEXT_PROPERTIES_ENGINE_ID_GRAPHICS_ZCULL].alignment;
> +
>  	nvkm_gsp_rm_ctrl_done(&gsp->internal.device.subdevice, info);
>  
>  	/* Promote golden context to RM. */
> @@ -450,6 +458,32 @@ r535_gr_oneinit(struct nvkm_gr *base)
>  		}
>  	}
>  
> +	zcull_info = nvkm_gsp_rm_ctrl_rd(&gsp->internal.device.subdevice,
> +					 NV2080_CTRL_CMD_GR_GET_ZCULL_INFO,
> +					 sizeof(*zcull_info));
> +	if (WARN_ON(IS_ERR(zcull_info))) {
> +		ret = PTR_ERR(zcull_info);
> +		goto done;
> +	}
> +
> +	device->has_zcull_info = true;
> +
> +	device->zcull_info.width_align_pixels = zcull_info->widthAlignPixels;
> +	device->zcull_info.height_align_pixels = zcull_info->heightAlignPixels;
> +	device->zcull_info.pixel_squares_by_aliquots = zcull_info->pixelSquaresByAliquots;
> +	device->zcull_info.aliquot_total = zcull_info->aliquotTotal;
> +	device->zcull_info.zcull_region_byte_multiplier = zcull_info->zcullRegionByteMultiplier;
> +	device->zcull_info.zcull_region_header_size = zcull_info->zcullRegionHeaderSize;
> +	device->zcull_info.zcull_subregion_header_size = zcull_info->zcullSubregionHeaderSize;
> +	device->zcull_info.subregion_count = zcull_info->subregionCount;
> +	device->zcull_info.subregion_width_align_pixels = zcull_info->subregionWidthAlignPixels;
> +	device->zcull_info.subregion_height_align_pixels = zcull_info->subregionHeightAlignPixels;
> +
> +	device->zcull_info.ctxsw_size = zcull_ctxsw_size;
> +	device->zcull_info.ctxsw_align = zcull_ctxsw_align;
> +
> +	nvkm_gsp_rm_ctrl_done(&gsp->internal.device.subdevice, zcull_info);

Please move this to a separate function.

> +
>  done:
>  	nvkm_gsp_rm_free(&golden.chan);
>  	for (int i = gr->ctxbuf_nr - 1; i >= 0; i--)
> diff --git a/include/uapi/drm/nouveau_drm.h b/include/uapi/drm/nouveau_drm.h
> index dd87f8f30793..33361784eb4e 100644
> --- a/include/uapi/drm/nouveau_drm.h
> +++ b/include/uapi/drm/nouveau_drm.h

Please do the uAPI change in a separate commit.

> @@ -432,6 +432,22 @@ struct drm_nouveau_exec {
>  	__u64 push_ptr;
>  };
>  
> +struct drm_nouveau_get_zcull_info {

Please add some documentation to this structure.

> +	__u32 width_align_pixels;
> +	__u32 height_align_pixels;
> +	__u32 pixel_squares_by_aliquots;
> +	__u32 aliquot_total;
> +	__u32 zcull_region_byte_multiplier;
> +	__u32 zcull_region_header_size;
> +	__u32 zcull_subregion_header_size;
> +	__u32 subregion_count;
> +	__u32 subregion_width_align_pixels;
> +	__u32 subregion_height_align_pixels;
> +
> +	__u32 ctxsw_size;
> +	__u32 ctxsw_align;
> +};

What if this ever changes between hardware revisions or firmware versions?

> +
>  #define DRM_NOUVEAU_GETPARAM           0x00
>  #define DRM_NOUVEAU_SETPARAM           0x01 /* deprecated */
>  #define DRM_NOUVEAU_CHANNEL_ALLOC      0x02
> @@ -445,6 +461,7 @@ struct drm_nouveau_exec {
>  #define DRM_NOUVEAU_VM_INIT            0x10
>  #define DRM_NOUVEAU_VM_BIND            0x11
>  #define DRM_NOUVEAU_EXEC               0x12
> +#define DRM_NOUVEAU_GET_ZCULL_INFO     0x13
>  #define DRM_NOUVEAU_GEM_NEW            0x40
>  #define DRM_NOUVEAU_GEM_PUSHBUF        0x41
>  #define DRM_NOUVEAU_GEM_CPU_PREP       0x42
> @@ -513,6 +530,8 @@ struct drm_nouveau_svm_bind {
>  #define DRM_IOCTL_NOUVEAU_VM_INIT            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_INIT, struct drm_nouveau_vm_init)
>  #define DRM_IOCTL_NOUVEAU_VM_BIND            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_BIND, struct drm_nouveau_vm_bind)
>  #define DRM_IOCTL_NOUVEAU_EXEC               DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_EXEC, struct drm_nouveau_exec)
> +
> +#define DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO     DRM_IOR (DRM_COMMAND_BASE + DRM_NOUVEAU_GET_ZCULL_INFO, struct drm_nouveau_get_zcull_info)
>  #if defined(__cplusplus)
>  }
>  #endif
> -- 
> 2.48.1
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] drm/nouveau: DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER
  2025-03-12 21:36 ` [PATCH 2/2] drm/nouveau: DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER Mel Henning
@ 2025-03-20 18:34   ` Danilo Krummrich
  2025-03-21 23:00     ` M Henning
  0 siblings, 1 reply; 21+ messages in thread
From: Danilo Krummrich @ 2025-03-20 18:34 UTC (permalink / raw)
  To: Mel Henning; +Cc: Karol Herbst, Lyude Paul, Faith Ekstrand, dri-devel, nouveau

On Wed, Mar 12, 2025 at 05:36:15PM -0400, Mel Henning wrote:
> diff --git a/include/uapi/drm/nouveau_drm.h b/include/uapi/drm/nouveau_drm.h

Same here, please split the uAPI change in a separate commit.

> index 33361784eb4e..e9638f4dd7e6 100644
> --- a/include/uapi/drm/nouveau_drm.h
> +++ b/include/uapi/drm/nouveau_drm.h
> @@ -448,6 +448,20 @@ struct drm_nouveau_get_zcull_info {
>  	__u32 ctxsw_align;
>  };
>  
> +struct drm_nouveau_set_zcull_ctxsw_buffer {
> +	/**
> +	 * @ptr: The virtual address for the buffer, or null to bind nothing
> +	 */
> +	__u64 addr;

What is this buffer? Is this a GEM object backed buffer? How is it mapped?

> +
> +	/**
> +	 * @channel: the channel to set the buffer on
> +	 */
> +	__u32 channel;
> +
> +	__u32 pad;
> +};
> +
>  #define DRM_NOUVEAU_GETPARAM           0x00
>  #define DRM_NOUVEAU_SETPARAM           0x01 /* deprecated */
>  #define DRM_NOUVEAU_CHANNEL_ALLOC      0x02
> @@ -462,6 +476,7 @@ struct drm_nouveau_get_zcull_info {
>  #define DRM_NOUVEAU_VM_BIND            0x11
>  #define DRM_NOUVEAU_EXEC               0x12
>  #define DRM_NOUVEAU_GET_ZCULL_INFO     0x13
> +#define DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER 0x14
>  #define DRM_NOUVEAU_GEM_NEW            0x40
>  #define DRM_NOUVEAU_GEM_PUSHBUF        0x41
>  #define DRM_NOUVEAU_GEM_CPU_PREP       0x42
> @@ -532,6 +547,7 @@ struct drm_nouveau_svm_bind {
>  #define DRM_IOCTL_NOUVEAU_EXEC               DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_EXEC, struct drm_nouveau_exec)
>  
>  #define DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO     DRM_IOR (DRM_COMMAND_BASE + DRM_NOUVEAU_GET_ZCULL_INFO, struct drm_nouveau_get_zcull_info)
> +#define DRM_IOCTL_NOUVEAU_SET_ZCULL_CTXSW_BUFFER  DRM_IOW (DRM_COMMAND_BASE + DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER, struct drm_nouveau_set_zcull_ctxsw_buffer)
>  #if defined(__cplusplus)
>  }
>  #endif
> -- 
> 2.48.1
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] drm/nouveau: Add DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO
  2025-03-20 18:18   ` Danilo Krummrich
@ 2025-03-20 18:37     ` Danilo Krummrich
  2025-03-20 19:57     ` Ben Skeggs
  2025-03-21 22:06     ` M Henning
  2 siblings, 0 replies; 21+ messages in thread
From: Danilo Krummrich @ 2025-03-20 18:37 UTC (permalink / raw)
  To: Mel Henning; +Cc: Karol Herbst, Lyude Paul, Faith Ekstrand, dri-devel, nouveau

On Thu, Mar 20, 2025 at 07:18:13PM +0100, Danilo Krummrich wrote:
> Hi Mel,
> 
> On Wed, Mar 12, 2025 at 05:36:14PM -0400, Mel Henning wrote:
> > +int
> > +nouveau_abi16_ioctl_get_zcull_info(ABI16_IOCTL_ARGS)
> > +{
> > +	struct nouveau_drm *drm = nouveau_drm(dev);
> > +	struct nvkm_device *device = drm->nvkm;
> > +	struct drm_nouveau_get_zcull_info *out = data;
> > +
> > +	if (device->has_zcull_info) {
> > +		*out = device->zcull_info;
> 
> This needs copy_to_user().

Sorry, had the wrong context, drm_ioctl() should already take care of that.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] drm/nouveau: Add DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO
  2025-03-20 18:18   ` Danilo Krummrich
  2025-03-20 18:37     ` Danilo Krummrich
@ 2025-03-20 19:57     ` Ben Skeggs
  2025-03-20 20:01       ` Danilo Krummrich
  2025-03-21 22:06     ` M Henning
  2 siblings, 1 reply; 21+ messages in thread
From: Ben Skeggs @ 2025-03-20 19:57 UTC (permalink / raw)
  To: Danilo Krummrich, Mel Henning
  Cc: Karol Herbst, Lyude Paul, Faith Ekstrand, dri-devel, nouveau

On 21/3/25 04:18, Danilo Krummrich wrote:

> Hi Mel,
>
> On Wed, Mar 12, 2025 at 05:36:14PM -0400, Mel Henning wrote:
>> Userspace needs this information to set up zcull correctly.
> This is a very brief motivation for the commit, please also describe what the
> commit does using imperative form.
>
>> Signed-off-by: Mel Henning <mhenning@darkrefraction.com>
>> ---
>>   .../drm/nouveau/include/nvkm/core/device.h    |  6 ++
>>   .../sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h | 69 +++++++++++++++++++
>>   drivers/gpu/drm/nouveau/nouveau_abi16.c       | 15 ++++
>>   drivers/gpu/drm/nouveau/nouveau_abi16.h       |  1 +
>>   drivers/gpu/drm/nouveau/nouveau_drm.c         |  1 +
>>   drivers/gpu/drm/nouveau/nvkm/engine/gr/r535.c | 34 +++++++++
>>   include/uapi/drm/nouveau_drm.h                | 19 +++++
>>   7 files changed, 145 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/core/device.h b/drivers/gpu/drm/nouveau/include/nvkm/core/device.h
>> index 46afb877a296..d1742ff1cf6b 100644
>> --- a/drivers/gpu/drm/nouveau/include/nvkm/core/device.h
>> +++ b/drivers/gpu/drm/nouveau/include/nvkm/core/device.h
>> @@ -3,6 +3,9 @@
>>   #define __NVKM_DEVICE_H__
>>   #include <core/oclass.h>
>>   #include <core/intr.h>
>> +
>> +#include "uapi/drm/nouveau_drm.h"
>> +
>>   enum nvkm_subdev_type;
>>   
>>   enum nvkm_device_type {
>> @@ -72,6 +75,9 @@ struct nvkm_device {
>>   		bool armed;
>>   		bool legacy_done;
>>   	} intr;
>> +
>> +	bool has_zcull_info;
>> +	struct drm_nouveau_get_zcull_info zcull_info;
> This is bypassing the nvif layer entirely. I think you should store the contents
> of struct NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS in struct r535_gr and have an
> nvif interface to access the data.

I agree here, though nvkm_gr would be a better choice for a couple of 
reasons, not least that it's possible (and should be reasonably trivial) 
to support this on earlier GPUs - should someone desire to do so at a 
later point.

Ben.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] drm/nouveau: Add DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO
  2025-03-20 19:57     ` Ben Skeggs
@ 2025-03-20 20:01       ` Danilo Krummrich
  2025-03-25 23:40         ` M Henning
  0 siblings, 1 reply; 21+ messages in thread
From: Danilo Krummrich @ 2025-03-20 20:01 UTC (permalink / raw)
  To: Ben Skeggs
  Cc: Mel Henning, Karol Herbst, Lyude Paul, Faith Ekstrand, dri-devel,
	nouveau

On Fri, Mar 21, 2025 at 05:57:55AM +1000, Ben Skeggs wrote:
> On 21/3/25 04:18, Danilo Krummrich wrote:
> 
> > Hi Mel,
> > 
> > On Wed, Mar 12, 2025 at 05:36:14PM -0400, Mel Henning wrote:
> 
> > > @@ -72,6 +75,9 @@ struct nvkm_device {
> > >   		bool armed;
> > >   		bool legacy_done;
> > >   	} intr;
> > > +
> > > +	bool has_zcull_info;
> > > +	struct drm_nouveau_get_zcull_info zcull_info;
> > This is bypassing the nvif layer entirely. I think you should store the contents
> > of struct NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS in struct r535_gr and have an
> > nvif interface to access the data.
> 
> I agree here, though nvkm_gr would be a better choice for a couple of
> reasons, not least that it's possible (and should be reasonably trivial) to
> support this on earlier GPUs - should someone desire to at a later point.

I agree, if the interface is stable enough -- I don't know whether this is prone
to change or not.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] drm/nouveau: Add DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO
  2025-03-20 18:18   ` Danilo Krummrich
  2025-03-20 18:37     ` Danilo Krummrich
  2025-03-20 19:57     ` Ben Skeggs
@ 2025-03-21 22:06     ` M Henning
  2025-03-27 13:51       ` Danilo Krummrich
  2 siblings, 1 reply; 21+ messages in thread
From: M Henning @ 2025-03-21 22:06 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Karol Herbst, Lyude Paul, Faith Ekstrand, dri-devel, nouveau

On Thu, Mar 20, 2025 at 2:18 PM Danilo Krummrich <dakr@kernel.org> wrote:
>
> Hi Mel,

Hi, thanks for the review.

> On Wed, Mar 12, 2025 at 05:36:14PM -0400, Mel Henning wrote:
> > Userspace needs this information to set up zcull correctly.
>
> This is a very brief motivation for the commit, please also describe what the
> commit does using imperative form.

Sure, I guess I'll move some of the motivation that I wrote in the
intro email for this series into this commit message.

> >
> > Signed-off-by: Mel Henning <mhenning@darkrefraction.com>
> > ---
> >  .../drm/nouveau/include/nvkm/core/device.h    |  6 ++
> >  .../sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h | 69 +++++++++++++++++++
> >  drivers/gpu/drm/nouveau/nouveau_abi16.c       | 15 ++++
> >  drivers/gpu/drm/nouveau/nouveau_abi16.h       |  1 +
> >  drivers/gpu/drm/nouveau/nouveau_drm.c         |  1 +
> >  drivers/gpu/drm/nouveau/nvkm/engine/gr/r535.c | 34 +++++++++
> >  include/uapi/drm/nouveau_drm.h                | 19 +++++
> >  7 files changed, 145 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/nouveau/include/nvkm/core/device.h b/drivers/gpu/drm/nouveau/include/nvkm/core/device.h
> > index 46afb877a296..d1742ff1cf6b 100644
> > --- a/drivers/gpu/drm/nouveau/include/nvkm/core/device.h
> > +++ b/drivers/gpu/drm/nouveau/include/nvkm/core/device.h
> > @@ -3,6 +3,9 @@
> >  #define __NVKM_DEVICE_H__
> >  #include <core/oclass.h>
> >  #include <core/intr.h>
> > +
> > +#include "uapi/drm/nouveau_drm.h"
> > +
> >  enum nvkm_subdev_type;
> >
> >  enum nvkm_device_type {
> > @@ -72,6 +75,9 @@ struct nvkm_device {
> >               bool armed;
> >               bool legacy_done;
> >       } intr;
> > +
> > +     bool has_zcull_info;
> > +     struct drm_nouveau_get_zcull_info zcull_info;
>
> This is bypassing the nvif layer entirely. I think you should store the contents
> of struct NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS in struct r535_gr and have an
> nvif interface to access the data.

Sure. I think my main tripping point here is: do we re-declare the
struct another time? It feels annoying to re-define it for each layer,
and I don't suppose we can expose the CTRL2080 or drm structures
directly in the NVIF layer.

> >  };
> >
> >  struct nvkm_subdev *nvkm_device_subdev(struct nvkm_device *, int type, int inst);
> > diff --git a/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h b/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h
> > index 59f8895bc5d7..01884b926c9c 100644
> > --- a/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h
> > +++ b/drivers/gpu/drm/nouveau/include/nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h
> > @@ -26,6 +26,75 @@
> >   * DEALINGS IN THE SOFTWARE.
> >   */
> >
> > +/**
> > + * NV2080_CTRL_CMD_GR_GET_ZCULL_INFO
> > + *
> > + * This command is used to query the RM for zcull information that the
> > + * driver will need to allocate and manage the zcull regions.
> > + *
> > + *   widthAlignPixels
> > + *     This parameter returns the width alignment restrictions in pixels
> > + *     used to adjust a surface for proper aliquot coverage (typically
> > + *     #TPC's * 16).
> > + *
> > + *   heightAlignPixels
> > + *     This parameter returns the height alignment restrictions in pixels
> > + *     used to adjust a surface for proper aliquot coverage (typically 32).
> > + *
> > + *   pixelSquaresByAliquots
> > + *     This parameter returns the pixel area covered by an aliquot
> > + *     (typically #Zcull_banks * 16 * 16).
> > + *
> > + *   aliquotTotal
> > + *     This parameter returns the total aliquot pool available in HW.
> > + *
> > + *   zcullRegionByteMultiplier
> > + *     This parameter returns multiplier used to convert aliquots in a region
> > + *     to the number of bytes required to save/restore them.
> > + *
> > + *   zcullRegionHeaderSize
> > + *     This parameter returns the region header size which is required to be
> > + *     allocated and accounted for in any save/restore operation on a region.
> > + *
> > + *   zcullSubregionHeaderSize
> > + *     This parameter returns the subregion header size which is required to be
> > + *     allocated and accounted for in any save/restore operation on a region.
> > + *
> > + *   subregionCount
> > + *     This parameter returns the subregion count.
> > + *
> > + *   subregionWidthAlignPixels
> > + *     This parameter returns the subregion width alignment restrictions in
> > + *     pixels used to adjust a surface for proper aliquot coverage
> > + *     (typically #TPC's * 16).
> > + *
> > + *   subregionHeightAlignPixels
> > + *     This parameter returns the subregion height alignment restrictions in
> > + *     pixels used to adjust a surface for proper aliquot coverage
> > + *     (typically 62).
> > + *
> > + *   The callee should compute the size of a zcull region as follows.
> > + *     (numBytes = aliquots * zcullRegionByteMultiplier +
> > + *                 zcullRegionHeaderSize + zcullSubregionHeaderSize)
> > + */
> > +#define NV2080_CTRL_CMD_GR_GET_ZCULL_INFO            (0x20801206U) /* finn: Evaluated from "(FINN_NV20_SUBDEVICE_0_GR_INTERFACE_ID << 8) | NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS_MESSAGE_ID" */
> > +
> > +#define NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS_SUBREGION_SUPPORTED
> > +#define NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS_MESSAGE_ID (0x6U)
> > +
> > +typedef struct NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS {
> > +    NvU32 widthAlignPixels;
> > +    NvU32 heightAlignPixels;
> > +    NvU32 pixelSquaresByAliquots;
> > +    NvU32 aliquotTotal;
> > +    NvU32 zcullRegionByteMultiplier;
> > +    NvU32 zcullRegionHeaderSize;
> > +    NvU32 zcullSubregionHeaderSize;
> > +    NvU32 subregionCount;
> > +    NvU32 subregionWidthAlignPixels;
> > +    NvU32 subregionHeightAlignPixels;
> > +} NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS;
> > +
> >  typedef enum NV2080_CTRL_CMD_GR_CTXSW_PREEMPTION_BIND_BUFFERS {
> >      NV2080_CTRL_CMD_GR_CTXSW_PREEMPTION_BIND_BUFFERS_MAIN = 0,
> >      NV2080_CTRL_CMD_GR_CTXSW_PREEMPTION_BIND_BUFFERS_SPILL = 1,
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c b/drivers/gpu/drm/nouveau/nouveau_abi16.c
> > index 2a0617e5fe2a..981abea97546 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_abi16.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c
> > @@ -333,6 +333,21 @@ nouveau_abi16_ioctl_getparam(ABI16_IOCTL_ARGS)
> >       return 0;
> >  }
> >
> > +int
> > +nouveau_abi16_ioctl_get_zcull_info(ABI16_IOCTL_ARGS)
> > +{
> > +     struct nouveau_drm *drm = nouveau_drm(dev);
> > +     struct nvkm_device *device = drm->nvkm;
> > +     struct drm_nouveau_get_zcull_info *out = data;
> > +
> > +     if (device->has_zcull_info) {
> > +             *out = device->zcull_info;
>
> This needs copy_to_user().
>
> > +             return 0;
> > +     } else {
> > +             return -ENODEV;
> > +     }
> > +}
> > +
> >  int
> >  nouveau_abi16_ioctl_channel_alloc(ABI16_IOCTL_ARGS)
> >  {
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.h b/drivers/gpu/drm/nouveau/nouveau_abi16.h
> > index af6b4e1cefd2..134b3ab58719 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_abi16.h
> > +++ b/drivers/gpu/drm/nouveau/nouveau_abi16.h
> > @@ -6,6 +6,7 @@
> >       struct drm_device *dev, void *data, struct drm_file *file_priv
> >
> >  int nouveau_abi16_ioctl_getparam(ABI16_IOCTL_ARGS);
> > +int nouveau_abi16_ioctl_get_zcull_info(ABI16_IOCTL_ARGS);
> >  int nouveau_abi16_ioctl_channel_alloc(ABI16_IOCTL_ARGS);
> >  int nouveau_abi16_ioctl_channel_free(ABI16_IOCTL_ARGS);
> >  int nouveau_abi16_ioctl_grobj_alloc(ABI16_IOCTL_ARGS);
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> > index 107f63f08bd9..4c4168b50e60 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> > @@ -1244,6 +1244,7 @@ nouveau_ioctls[] = {
> >       DRM_IOCTL_DEF_DRV(NOUVEAU_GROBJ_ALLOC, nouveau_abi16_ioctl_grobj_alloc, DRM_RENDER_ALLOW),
> >       DRM_IOCTL_DEF_DRV(NOUVEAU_NOTIFIEROBJ_ALLOC, nouveau_abi16_ioctl_notifierobj_alloc, DRM_RENDER_ALLOW),
> >       DRM_IOCTL_DEF_DRV(NOUVEAU_GPUOBJ_FREE, nouveau_abi16_ioctl_gpuobj_free, DRM_RENDER_ALLOW),
> > +     DRM_IOCTL_DEF_DRV(NOUVEAU_GET_ZCULL_INFO, nouveau_abi16_ioctl_get_zcull_info, DRM_RENDER_ALLOW),
> >       DRM_IOCTL_DEF_DRV(NOUVEAU_SVM_INIT, nouveau_svmm_init, DRM_RENDER_ALLOW),
> >       DRM_IOCTL_DEF_DRV(NOUVEAU_SVM_BIND, nouveau_svmm_bind, DRM_RENDER_ALLOW),
> >       DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_NEW, nouveau_gem_ioctl_new, DRM_RENDER_ALLOW),
> > diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/gr/r535.c b/drivers/gpu/drm/nouveau/nvkm/engine/gr/r535.c
> > index f4bed3eb1ec2..6669f2b1f492 100644
> > --- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/r535.c
> > +++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/r535.c
> > @@ -34,6 +34,7 @@
> >  #include <nvrm/535.113.01/common/sdk/nvidia/inc/alloc/alloc_channel.h>
> >  #include <nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl0080/ctrl0080fifo.h>
> >  #include <nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gpu.h>
> > +#include <nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h>
> >  #include <nvrm/535.113.01/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080internal.h>
> >  #include <nvrm/535.113.01/nvidia/generated/g_kernel_channel_nvoc.h>
> >
> > @@ -244,6 +245,8 @@ static int
> >  r535_gr_oneinit(struct nvkm_gr *base)
> >  {
> >       NV2080_CTRL_INTERNAL_STATIC_GR_GET_CONTEXT_BUFFERS_INFO_PARAMS *info;
> > +     NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS *zcull_info;
> > +     u32 zcull_ctxsw_size, zcull_ctxsw_align;
> >       struct r535_gr *gr = container_of(base, typeof(*gr), base);
> >       struct nvkm_subdev *subdev = &gr->base.engine.subdev;
> >       struct nvkm_device *device = subdev->device;
> > @@ -418,6 +421,11 @@ r535_gr_oneinit(struct nvkm_gr *base)
> >               }
> >       }
> >
> > +     zcull_ctxsw_size = info->engineContextBuffersInfo[0]
> > +             .engine[NV0080_CTRL_FIFO_GET_ENGINE_CONTEXT_PROPERTIES_ENGINE_ID_GRAPHICS_ZCULL].size;
> > +     zcull_ctxsw_align = info->engineContextBuffersInfo[0]
> > +             .engine[NV0080_CTRL_FIFO_GET_ENGINE_CONTEXT_PROPERTIES_ENGINE_ID_GRAPHICS_ZCULL].alignment;
> > +
> >       nvkm_gsp_rm_ctrl_done(&gsp->internal.device.subdevice, info);
> >
> >       /* Promote golden context to RM. */
> > @@ -450,6 +458,32 @@ r535_gr_oneinit(struct nvkm_gr *base)
> >               }
> >       }
> >
> > +     zcull_info = nvkm_gsp_rm_ctrl_rd(&gsp->internal.device.subdevice,
> > +                                      NV2080_CTRL_CMD_GR_GET_ZCULL_INFO,
> > +                                      sizeof(*zcull_info));
> > +     if (WARN_ON(IS_ERR(zcull_info))) {
> > +             ret = PTR_ERR(zcull_info);
> > +             goto done;
> > +     }
> > +
> > +     device->has_zcull_info = true;
> > +
> > +     device->zcull_info.width_align_pixels = zcull_info->widthAlignPixels;
> > +     device->zcull_info.height_align_pixels = zcull_info->heightAlignPixels;
> > +     device->zcull_info.pixel_squares_by_aliquots = zcull_info->pixelSquaresByAliquots;
> > +     device->zcull_info.aliquot_total = zcull_info->aliquotTotal;
> > +     device->zcull_info.zcull_region_byte_multiplier = zcull_info->zcullRegionByteMultiplier;
> > +     device->zcull_info.zcull_region_header_size = zcull_info->zcullRegionHeaderSize;
> > +     device->zcull_info.zcull_subregion_header_size = zcull_info->zcullSubregionHeaderSize;
> > +     device->zcull_info.subregion_count = zcull_info->subregionCount;
> > +     device->zcull_info.subregion_width_align_pixels = zcull_info->subregionWidthAlignPixels;
> > +     device->zcull_info.subregion_height_align_pixels = zcull_info->subregionHeightAlignPixels;
> > +
> > +     device->zcull_info.ctxsw_size = zcull_ctxsw_size;
> > +     device->zcull_info.ctxsw_align = zcull_ctxsw_align;
> > +
> > +     nvkm_gsp_rm_ctrl_done(&gsp->internal.device.subdevice, zcull_info);
>
> Please move this to a separate function.
>
> > +
> >  done:
> >       nvkm_gsp_rm_free(&golden.chan);
> >       for (int i = gr->ctxbuf_nr - 1; i >= 0; i--)
> > diff --git a/include/uapi/drm/nouveau_drm.h b/include/uapi/drm/nouveau_drm.h
> > index dd87f8f30793..33361784eb4e 100644
> > --- a/include/uapi/drm/nouveau_drm.h
> > +++ b/include/uapi/drm/nouveau_drm.h
>
> Please do the uAPI change in a separate commit.
>
> > @@ -432,6 +432,22 @@ struct drm_nouveau_exec {
> >       __u64 push_ptr;
> >  };
> >
> > +struct drm_nouveau_get_zcull_info {
>
> Please add some documentation to this structure.

Okay. I guess I'll mostly copy the docs from the CTRL2080 structure.

> > +     __u32 width_align_pixels;
> > +     __u32 height_align_pixels;
> > +     __u32 pixel_squares_by_aliquots;
> > +     __u32 aliquot_total;
> > +     __u32 zcull_region_byte_multiplier;
> > +     __u32 zcull_region_header_size;
> > +     __u32 zcull_subregion_header_size;
> > +     __u32 subregion_count;
> > +     __u32 subregion_width_align_pixels;
> > +     __u32 subregion_height_align_pixels;
> > +
> > +     __u32 ctxsw_size;
> > +     __u32 ctxsw_align;
> > +};
>
> What if this ever changes between hardware revisions or firmware versions?

There was some previous discussion of that here:
https://gitlab.freedesktop.org/mesa/mesa/-/issues/12596#note_2796853

From what I can tell, this structure hasn't really changed since
FERMI_C (circa 2011), so I'm not too worried about it changing on us
too quickly. When it does change, we have the option of appending more
members to this struct in the usual way, or if the change is more
fundamental we can return an error from this ioctl and add a new
interface. Userspace needs to handle an error from this ioctl
gracefully anyway since whether it works or not depends on the gpu
generation and what firmware is loaded right now.
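
[Editorial note: for reference, the region-size computation described in the
ctrl2080gr.h comment quoted above can be sketched in userspace terms. Field
names follow the proposed drm_nouveau_get_zcull_info uAPI struct; this is an
illustrative sketch, not part of the patch, and the values are placeholders.]

```c
#include <stdint.h>

/*
 * Sketch of the zcull region size computation from the comment in
 * ctrl2080gr.h:
 *   numBytes = aliquots * zcullRegionByteMultiplier +
 *              zcullRegionHeaderSize + zcullSubregionHeaderSize
 * Field names follow the proposed drm_nouveau_get_zcull_info uAPI struct;
 * the struct is trimmed to the fields the formula needs.
 */
struct zcull_info_sketch {
	uint32_t zcull_region_byte_multiplier;
	uint32_t zcull_region_header_size;
	uint32_t zcull_subregion_header_size;
};

static uint64_t
zcull_region_bytes(const struct zcull_info_sketch *info, uint32_t aliquots)
{
	/* Widen before multiplying so large aliquot counts cannot overflow. */
	return (uint64_t)aliquots * info->zcull_region_byte_multiplier +
	       info->zcull_region_header_size +
	       info->zcull_subregion_header_size;
}
```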

> > +
> >  #define DRM_NOUVEAU_GETPARAM           0x00
> >  #define DRM_NOUVEAU_SETPARAM           0x01 /* deprecated */
> >  #define DRM_NOUVEAU_CHANNEL_ALLOC      0x02
> > @@ -445,6 +461,7 @@ struct drm_nouveau_exec {
> >  #define DRM_NOUVEAU_VM_INIT            0x10
> >  #define DRM_NOUVEAU_VM_BIND            0x11
> >  #define DRM_NOUVEAU_EXEC               0x12
> > +#define DRM_NOUVEAU_GET_ZCULL_INFO     0x13
> >  #define DRM_NOUVEAU_GEM_NEW            0x40
> >  #define DRM_NOUVEAU_GEM_PUSHBUF        0x41
> >  #define DRM_NOUVEAU_GEM_CPU_PREP       0x42
> > @@ -513,6 +530,8 @@ struct drm_nouveau_svm_bind {
> >  #define DRM_IOCTL_NOUVEAU_VM_INIT            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_INIT, struct drm_nouveau_vm_init)
> >  #define DRM_IOCTL_NOUVEAU_VM_BIND            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_VM_BIND, struct drm_nouveau_vm_bind)
> >  #define DRM_IOCTL_NOUVEAU_EXEC               DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_EXEC, struct drm_nouveau_exec)
> > +
> > +#define DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO     DRM_IOR (DRM_COMMAND_BASE + DRM_NOUVEAU_GET_ZCULL_INFO, struct drm_nouveau_get_zcull_info)
> >  #if defined(__cplusplus)
> >  }
> >  #endif
> > --
> > 2.48.1
> >



* Re: [PATCH 2/2] drm/nouveau: DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER
  2025-03-20 18:34   ` Danilo Krummrich
@ 2025-03-21 23:00     ` M Henning
  2025-03-27 13:58       ` Danilo Krummrich
  0 siblings, 1 reply; 21+ messages in thread
From: M Henning @ 2025-03-21 23:00 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Karol Herbst, Lyude Paul, Faith Ekstrand, dri-devel, nouveau

This is a pointer in the gpu's virtual address space. It must be
aligned according to ctxsw_align and be at least ctxsw_size bytes
(where those values come from the drm_nouveau_get_zcull_info
structure). I'll change the description to say that much.

Yes, this is GEM-backed. I'm actually not entirely sure what the
requirements are here, since this part is reverse-engineered. I think
NOUVEAU_GEM_DOMAIN_VRAM and NOUVEAU_GEM_DOMAIN_GART are both okay. The
proprietary driver allocates this buffer using
NV_ESC_RM_VID_HEAP_CONTROL and sets attr = NVOS32_ATTR_LOCATION_ANY |
NVOS32_ATTR_PAGE_SIZE_BIG | NVOS32_ATTR_PHYSICALITY_CONTIGUOUS, attr2
= NVOS32_ATTR2_GPU_CACHEABLE_YES | NVOS32_ATTR2_ZBC_PREFER_NO_ZBC.
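
[Editorial note: the alignment and size constraints described above can be
sketched as a userspace-side validity check. This assumes ctxsw_align is a
power of two, as firmware-reported alignments typically are; the function and
parameter names are illustrative, not part of the patch.]

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the constraints on the zcull context-switch buffer described
 * above: the GPU virtual address must be aligned to ctxsw_align and the
 * mapping must span at least ctxsw_size bytes. Assumes ctxsw_align is a
 * non-zero power of two.
 */
static bool
zcull_ctxsw_buffer_ok(uint64_t gpu_addr, uint64_t mapped_size,
		      uint32_t ctxsw_size, uint32_t ctxsw_align)
{
	if (gpu_addr & (ctxsw_align - 1ULL))
		return false;	/* misaligned GPU VA */
	return mapped_size >= ctxsw_size;
}
```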

On Thu, Mar 20, 2025 at 2:34 PM Danilo Krummrich <dakr@kernel.org> wrote:
>
> On Wed, Mar 12, 2025 at 05:36:15PM -0400, Mel Henning wrote:
> > diff --git a/include/uapi/drm/nouveau_drm.h b/include/uapi/drm/nouveau_drm.h
>
> Same here, please split the uAPI change in a separate commit.
>
> > index 33361784eb4e..e9638f4dd7e6 100644
> > --- a/include/uapi/drm/nouveau_drm.h
> > +++ b/include/uapi/drm/nouveau_drm.h
> > @@ -448,6 +448,20 @@ struct drm_nouveau_get_zcull_info {
> >       __u32 ctxsw_align;
> >  };
> >
> > +struct drm_nouveau_set_zcull_ctxsw_buffer {
> > +     /**
> > +      * @ptr: The virtual address for the buffer, or null to bind nothing
> > +      */
> > +     __u64 addr;
>
> What is this buffer? Is this a GEM object backed buffer? How is it mapped?
>
> > +
> > +     /**
> > +      * @channel: the channel to set the buffer on
> > +      */
> > +     __u32 channel;
> > +
> > +     __u32 pad;
> > +};
> > +
> >  #define DRM_NOUVEAU_GETPARAM           0x00
> >  #define DRM_NOUVEAU_SETPARAM           0x01 /* deprecated */
> >  #define DRM_NOUVEAU_CHANNEL_ALLOC      0x02
> > @@ -462,6 +476,7 @@ struct drm_nouveau_get_zcull_info {
> >  #define DRM_NOUVEAU_VM_BIND            0x11
> >  #define DRM_NOUVEAU_EXEC               0x12
> >  #define DRM_NOUVEAU_GET_ZCULL_INFO     0x13
> > +#define DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER 0x14
> >  #define DRM_NOUVEAU_GEM_NEW            0x40
> >  #define DRM_NOUVEAU_GEM_PUSHBUF        0x41
> >  #define DRM_NOUVEAU_GEM_CPU_PREP       0x42
> > @@ -532,6 +547,7 @@ struct drm_nouveau_svm_bind {
> >  #define DRM_IOCTL_NOUVEAU_EXEC               DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_EXEC, struct drm_nouveau_exec)
> >
> >  #define DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO     DRM_IOR (DRM_COMMAND_BASE + DRM_NOUVEAU_GET_ZCULL_INFO, struct drm_nouveau_get_zcull_info)
> > +#define DRM_IOCTL_NOUVEAU_SET_ZCULL_CTXSW_BUFFER  DRM_IOW (DRM_COMMAND_BASE + DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER, struct drm_nouveau_set_zcull_ctxsw_buffer)
> >  #if defined(__cplusplus)
> >  }
> >  #endif
> > --
> > 2.48.1
> >


* Re: [PATCH 1/2] drm/nouveau: Add DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO
  2025-03-20 20:01       ` Danilo Krummrich
@ 2025-03-25 23:40         ` M Henning
  2025-03-27 12:56           ` Danilo Krummrich
  0 siblings, 1 reply; 21+ messages in thread
From: M Henning @ 2025-03-25 23:40 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Ben Skeggs, Karol Herbst, Lyude Paul, Faith Ekstrand, dri-devel,
	nouveau

Okay, that sounds reasonable since I don't expect this to change very quickly.

Since I don't fully understand, is the suggestion here to:
1) add the interface as a function on nvkm_gr using the nvkm_gr_func
vtable and store the actual data on r535_gr
or
2) add the interface to NVIF (which IF?) and store the actual data on nvkm_gr
?

(Sorry, I don't understand how these layers are intended to fit together.)

On Thu, Mar 20, 2025 at 4:02 PM Danilo Krummrich <dakr@kernel.org> wrote:
>
> On Fri, Mar 21, 2025 at 05:57:55AM +1000, Ben Skeggs wrote:
> > On 21/3/25 04:18, Danilo Krummrich wrote:
> >
> > > Hi Mel,
> > >
> > > On Wed, Mar 12, 2025 at 05:36:14PM -0400, Mel Henning wrote:
> >
> > > > @@ -72,6 +75,9 @@ struct nvkm_device {
> > > >                   bool armed;
> > > >                   bool legacy_done;
> > > >           } intr;
> > > > +
> > > > + bool has_zcull_info;
> > > > + struct drm_nouveau_get_zcull_info zcull_info;
> > > This is bypassing the nvif layer entirely. I think you should store the contents
> > > of struct NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS in struct r535_gr and have an
> > > nvif interface to access the data.
> >
> > I agree here, though nvkm_gr would be a better choice for a couple of
> > reasons, not least that it's possible (and should be reasonably trivial) to
> > support this on earlier GPUs - should someone desire to at a later point.
>
> I agree, if the interface is stable enough -- I don't know whether this is prone
> to change or not.


* Re: [PATCH 1/2] drm/nouveau: Add DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO
  2025-03-25 23:40         ` M Henning
@ 2025-03-27 12:56           ` Danilo Krummrich
  2025-03-27 18:26             ` M Henning
  0 siblings, 1 reply; 21+ messages in thread
From: Danilo Krummrich @ 2025-03-27 12:56 UTC (permalink / raw)
  To: M Henning
  Cc: Ben Skeggs, Karol Herbst, Lyude Paul, Faith Ekstrand, dri-devel,
	nouveau

On Tue, Mar 25, 2025 at 07:40:56PM -0400, M Henning wrote:
> Okay, that sounds reasonable since I don't expect this to change very quickly.
> 
> Since I don't fully understand, is the suggestion here to:
> 1) add the interface as a function on nvkm_gr using the nvkm_gr_func
> vtable and store the actual data on r535_gr
> or
> 2) add the interface to NVIF (which IF?) and store the actual data on nvkm_gr
> ?

I think we want both.

1) I think the suggestion was to store the data directly in nvkm_gr, however the
   structure is indeed specific to r535, so I think, unfortunately, we need the
   vtable and store that data in r535_gr.

2) Yes, this should be passed through nvif. Unfortunately, I think it would need
   to be a whole new one (similar to the fifo one).

Maybe Ben can provide you some additional pointers on this? Maybe he can
suggest a shortcut, since he has patches queued to simplify the whole interface.

> 
> (Sorry, I don't understand how these layers are intended to fit together.))
> 
> On Thu, Mar 20, 2025 at 4:02 PM Danilo Krummrich <dakr@kernel.org> wrote:
> >
> > On Fri, Mar 21, 2025 at 05:57:55AM +1000, Ben Skeggs wrote:
> > > On 21/3/25 04:18, Danilo Krummrich wrote:
> > >
> > > > Hi Mel,
> > > >
> > > > On Wed, Mar 12, 2025 at 05:36:14PM -0400, Mel Henning wrote:
> > >
> > > > > @@ -72,6 +75,9 @@ struct nvkm_device {
> > > > >                   bool armed;
> > > > >                   bool legacy_done;
> > > > >           } intr;
> > > > > +
> > > > > + bool has_zcull_info;
> > > > > + struct drm_nouveau_get_zcull_info zcull_info;
> > > > This is bypassing the nvif layer entirely. I think you should store the contents
> > > > of struct NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS in struct r535_gr and have an
> > > > nvif interface to access the data.
> > >
> > > I agree here, though nvkm_gr would be a better choice for a couple of
> > > reasons, not least that it's possible (and should be reasonably trivial) to
> > > support this on earlier GPUs - should someone desire to at a later point.
> >
> > I agree, if the interface is stable enough -- I don't know whether this is prone
> > to change or not.


* Re: [PATCH 1/2] drm/nouveau: Add DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO
  2025-03-21 22:06     ` M Henning
@ 2025-03-27 13:51       ` Danilo Krummrich
  2025-03-27 18:03         ` M Henning
  0 siblings, 1 reply; 21+ messages in thread
From: Danilo Krummrich @ 2025-03-27 13:51 UTC (permalink / raw)
  To: M Henning; +Cc: Karol Herbst, Lyude Paul, Faith Ekstrand, dri-devel, nouveau

On Fri, Mar 21, 2025 at 06:06:34PM -0400, M Henning wrote:
> On Thu, Mar 20, 2025 at 2:18 PM Danilo Krummrich <dakr@kernel.org> wrote:
> > On Wed, Mar 12, 2025 at 05:36:14PM -0400, Mel Henning wrote:
> > > +     __u32 width_align_pixels;
> > > +     __u32 height_align_pixels;
> > > +     __u32 pixel_squares_by_aliquots;
> > > +     __u32 aliquot_total;
> > > +     __u32 zcull_region_byte_multiplier;
> > > +     __u32 zcull_region_header_size;
> > > +     __u32 zcull_subregion_header_size;
> > > +     __u32 subregion_count;
> > > +     __u32 subregion_width_align_pixels;
> > > +     __u32 subregion_height_align_pixels;
> > > +
> > > +     __u32 ctxsw_size;
> > > +     __u32 ctxsw_align;
> > > +};
> >
> > What if this ever changes between hardware revisions or firmware versions?
> 
> There was some previous discussion of that here:
> https://gitlab.freedesktop.org/mesa/mesa/-/issues/12596#note_2796853
> 
> From what I can tell, this structure hasn't really changed since
> FERMI_C (circa 2011), so I'm not too worried about it changing on us
> too quickly. When it does change, we have the option of appending more
> members to this struct in the usual way, or if the change is more
> fundamental we can return an error from this ioctl and add a new
> interface. Userspace needs to handle an error from this ioctl
> gracefully anyway since whether it works or not depends on the gpu
> generation and what firmware is loaded right now.

We could also define it as

	struct drm_nouveau_get_zcull_info {
		__u32 version;
		__u32 _pad;

		union {
			struct drm_nouveau_get_zcull_info_v1 info;
		}
	}

just to be safe.
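
[Editorial note: to illustrate how userspace might consume such a versioned
layout, a minimal sketch of the dispatch-on-version idea follows. The type and
field names are hypothetical, mirroring the union suggested above; only the
version check is the point.]

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical v1 payload, mirroring the union member suggested above. */
struct drm_nouveau_get_zcull_info_v1 {
	uint32_t aliquot_total;
	/* remaining fields omitted for brevity */
};

struct drm_nouveau_get_zcull_info_versioned {
	uint32_t version;
	uint32_t _pad;
	union {
		struct drm_nouveau_get_zcull_info_v1 v1;
	} u;
};

/*
 * Dispatch on the version the kernel filled in; unknown versions are
 * reported as unsupported rather than misinterpreted.
 */
static const struct drm_nouveau_get_zcull_info_v1 *
zcull_info_v1(const struct drm_nouveau_get_zcull_info_versioned *out)
{
	return out->version == 1 ? &out->u.v1 : NULL;
}
```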


* Re: [PATCH 2/2] drm/nouveau: DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER
  2025-03-21 23:00     ` M Henning
@ 2025-03-27 13:58       ` Danilo Krummrich
  2025-03-27 19:01         ` M Henning
  0 siblings, 1 reply; 21+ messages in thread
From: Danilo Krummrich @ 2025-03-27 13:58 UTC (permalink / raw)
  To: M Henning; +Cc: Karol Herbst, Lyude Paul, Faith Ekstrand, dri-devel, nouveau

On Fri, Mar 21, 2025 at 07:00:57PM -0400, M Henning wrote:
> This is a pointer in the gpu's virtual address space. It must be
> aligned according to ctxsw_align and be at least ctxsw_size bytes
> (where those values come from the nouveau_abi16_ioctl_get_zcull_info
> structure). I'll change the description to say that much.
> 
> Yes, this is GEM-backed. I'm actually not entirely sure what the
> requirements are here, since this part is reverse-engineered. I think
> NOUVEAU_GEM_DOMAIN_VRAM and NOUVEAU_GEM_DOMAIN_GART are both okay. The
> proprietary driver allocates this buffer using
> NV_ESC_RM_VID_HEAP_CONTROL and sets attr = NVOS32_ATTR_LOCATION_ANY |
> NVOS32_ATTR_PAGE_SIZE_BIG | NVOS32_ATTR_PHYSICALITY_CONTIGUOUS, attr2
> = NVOS32_ATTR2_GPU_CACHEABLE_YES | NVOS32_ATTR2_ZBC_PREFER_NO_ZBC.

(Please do not top post.)

What I mean is how do you map the backing GEM into the GPU's virtual address
space? Since it's bound to a channel, I assume that it must be ensured it's
properly mapped when work is pushed to the channel. Is it mapped through
VM_BIND?

> 
> On Thu, Mar 20, 2025 at 2:34 PM Danilo Krummrich <dakr@kernel.org> wrote:
> >
> > On Wed, Mar 12, 2025 at 05:36:15PM -0400, Mel Henning wrote:
> > > diff --git a/include/uapi/drm/nouveau_drm.h b/include/uapi/drm/nouveau_drm.h
> >
> > Same here, please split the uAPI change in a separate commit.
> >
> > > index 33361784eb4e..e9638f4dd7e6 100644
> > > --- a/include/uapi/drm/nouveau_drm.h
> > > +++ b/include/uapi/drm/nouveau_drm.h
> > > @@ -448,6 +448,20 @@ struct drm_nouveau_get_zcull_info {
> > >       __u32 ctxsw_align;
> > >  };
> > >
> > > +struct drm_nouveau_set_zcull_ctxsw_buffer {
> > > +     /**
> > > +      * @ptr: The virtual address for the buffer, or null to bind nothing
> > > +      */
> > > +     __u64 addr;
> >
> > What is this buffer? Is this a GEM object backed buffer? How is it mapped?
> >
> > > +
> > > +     /**
> > > +      * @channel: the channel to set the buffer on
> > > +      */
> > > +     __u32 channel;
> > > +
> > > +     __u32 pad;
> > > +};
> > > +
> > >  #define DRM_NOUVEAU_GETPARAM           0x00
> > >  #define DRM_NOUVEAU_SETPARAM           0x01 /* deprecated */
> > >  #define DRM_NOUVEAU_CHANNEL_ALLOC      0x02
> > > @@ -462,6 +476,7 @@ struct drm_nouveau_get_zcull_info {
> > >  #define DRM_NOUVEAU_VM_BIND            0x11
> > >  #define DRM_NOUVEAU_EXEC               0x12
> > >  #define DRM_NOUVEAU_GET_ZCULL_INFO     0x13
> > > +#define DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER 0x14
> > >  #define DRM_NOUVEAU_GEM_NEW            0x40
> > >  #define DRM_NOUVEAU_GEM_PUSHBUF        0x41
> > >  #define DRM_NOUVEAU_GEM_CPU_PREP       0x42
> > > @@ -532,6 +547,7 @@ struct drm_nouveau_svm_bind {
> > >  #define DRM_IOCTL_NOUVEAU_EXEC               DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_EXEC, struct drm_nouveau_exec)
> > >
> > >  #define DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO     DRM_IOR (DRM_COMMAND_BASE + DRM_NOUVEAU_GET_ZCULL_INFO, struct drm_nouveau_get_zcull_info)
> > > +#define DRM_IOCTL_NOUVEAU_SET_ZCULL_CTXSW_BUFFER  DRM_IOW (DRM_COMMAND_BASE + DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER, struct drm_nouveau_set_zcull_ctxsw_buffer)
> > >  #if defined(__cplusplus)
> > >  }
> > >  #endif
> > > --
> > > 2.48.1
> > >


* Re: [PATCH 1/2] drm/nouveau: Add DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO
  2025-03-27 13:51       ` Danilo Krummrich
@ 2025-03-27 18:03         ` M Henning
  2025-03-28 11:09           ` Danilo Krummrich
  0 siblings, 1 reply; 21+ messages in thread
From: M Henning @ 2025-03-27 18:03 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Karol Herbst, Lyude Paul, Faith Ekstrand, dri-devel, nouveau

On Thu, Mar 27, 2025 at 9:51 AM Danilo Krummrich <dakr@kernel.org> wrote:
>
> On Fri, Mar 21, 2025 at 06:06:34PM -0400, M Henning wrote:
> > On Thu, Mar 20, 2025 at 2:18 PM Danilo Krummrich <dakr@kernel.org> wrote:
> > > On Wed, Mar 12, 2025 at 05:36:14PM -0400, Mel Henning wrote:
> > > > +     __u32 width_align_pixels;
> > > > +     __u32 height_align_pixels;
> > > > +     __u32 pixel_squares_by_aliquots;
> > > > +     __u32 aliquot_total;
> > > > +     __u32 zcull_region_byte_multiplier;
> > > > +     __u32 zcull_region_header_size;
> > > > +     __u32 zcull_subregion_header_size;
> > > > +     __u32 subregion_count;
> > > > +     __u32 subregion_width_align_pixels;
> > > > +     __u32 subregion_height_align_pixels;
> > > > +
> > > > +     __u32 ctxsw_size;
> > > > +     __u32 ctxsw_align;
> > > > +};
> > >
> > > What if this ever changes between hardware revisions or firmware versions?
> >
> > There was some previous discussion of that here:
> > https://gitlab.freedesktop.org/mesa/mesa/-/issues/12596#note_2796853
> >
> > From what I can tell, this structure hasn't really changed since
> > FERMI_C (circa 2011), so I'm not too worried about it changing on us
> > too quickly. When it does change, we have the option of appending more
> > members to this struct in the usual way, or if the change is more
> > fundamental we can return an error from this ioctl and add a new
> > interface. Userspace needs to handle an error from this ioctl
> > gracefully anyway since whether it works or not depends on the gpu
> > generation and what firmware is loaded right now.
>
> We could also define it as
>
>         struct drm_nouveau_get_zcull_info {
>                 __u32 version;
>                 __u32 _pad;
>
>                 union {
>                         struct drm_nouveau_get_zcull_info_v1 info;
>                 }
>         }
>
> just to be safe.

We can do that, although I don't see any other drm drivers using a
similar pattern anywhere.


* Re: [PATCH 1/2] drm/nouveau: Add DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO
  2025-03-27 12:56           ` Danilo Krummrich
@ 2025-03-27 18:26             ` M Henning
  2025-03-28 11:04               ` Danilo Krummrich
  0 siblings, 1 reply; 21+ messages in thread
From: M Henning @ 2025-03-27 18:26 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Ben Skeggs, Karol Herbst, Lyude Paul, Faith Ekstrand, dri-devel,
	nouveau

On Thu, Mar 27, 2025 at 8:56 AM Danilo Krummrich <dakr@kernel.org> wrote:
>
> On Tue, Mar 25, 2025 at 07:40:56PM -0400, M Henning wrote:
> > Okay, that sounds reasonable since I don't expect this to change very quickly.
> >
> > Since I don't fully understand, is the suggestion here to:
> > 1) add the interface as a function on nvkm_gr using the nvkm_gr_func
> > vtable and store the actual data on r535_gr
> > or
> > 2) add the interface to NVIF (which IF?) and store the actual data on nvkm_gr
> > ?
>
> I think we want both.
>
> 1) I think the suggestion was to store the data directly in nvkm_gr, however the
>    structure is indeed specific to r535, so I think, unfortunately, we need the
>    vtable and store that data in r535_gr.

Well, NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS is r535-specific, but we
need to convert it into a common structure and combine it with info
from NV0080_CTRL_FIFO_GET_ENGINE_CONTEXT_PROPERTIES at some point, so
I think it makes sense to do that conversion+combination before
storing it on any structure. In that case, maybe we store the
structure on nvkm_gr directly during r535_gr_oneinit and then the call
to get the info only goes through NVIF?


> 2) Yes, this should be passed through nvif. Unfortunately, I think it would need
>    to be a whole new one (similar to the fifo one).
>
> Maybe Ben can provide you some additional pointers on this? Maybe he can
> suggest a shortcut, since he has patches queued to simplify the whole interface.
>
> >
> > (Sorry, I don't understand how these layers are intended to fit together.)
> >
> > On Thu, Mar 20, 2025 at 4:02 PM Danilo Krummrich <dakr@kernel.org> wrote:
> > >
> > > On Fri, Mar 21, 2025 at 05:57:55AM +1000, Ben Skeggs wrote:
> > > > On 21/3/25 04:18, Danilo Krummrich wrote:
> > > >
> > > > > Hi Mel,
> > > > >
> > > > > On Wed, Mar 12, 2025 at 05:36:14PM -0400, Mel Henning wrote:
> > > >
> > > > > > @@ -72,6 +75,9 @@ struct nvkm_device {
> > > > > >                   bool armed;
> > > > > >                   bool legacy_done;
> > > > > >           } intr;
> > > > > > +
> > > > > > + bool has_zcull_info;
> > > > > > + struct drm_nouveau_get_zcull_info zcull_info;
> > > > > This is bypassing the nvif layer entirely. I think you should store the contents
> > > > > of struct NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS in struct r535_gr and have an
> > > > > nvif interface to access the data.
> > > >
> > > > I agree here, though nvkm_gr would be a better choice for a couple of
> > > > reasons, not least that it's possible (and should be reasonably trivial) to
> > > > support this on earlier GPUs - should someone desire to at a later point.
> > >
> > > I agree, if the interface is stable enough -- I don't know whether this is prone
> > > to change or not.


* Re: [PATCH 2/2] drm/nouveau: DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER
  2025-03-27 13:58       ` Danilo Krummrich
@ 2025-03-27 19:01         ` M Henning
  2025-03-28 11:48           ` Danilo Krummrich
  0 siblings, 1 reply; 21+ messages in thread
From: M Henning @ 2025-03-27 19:01 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Karol Herbst, Lyude Paul, Faith Ekstrand, dri-devel, nouveau

On Thu, Mar 27, 2025 at 9:58 AM Danilo Krummrich <dakr@kernel.org> wrote:
>
> On Fri, Mar 21, 2025 at 07:00:57PM -0400, M Henning wrote:
> > This is a pointer in the gpu's virtual address space. It must be
> > aligned according to ctxsw_align and be at least ctxsw_size bytes
> > (where those values come from the nouveau_abi16_ioctl_get_zcull_info
> > structure). I'll change the description to say that much.
> >
> > Yes, this is GEM-backed. I'm actually not entirely sure what the
> > requirements are here, since this part is reverse-engineered. I think
> > NOUVEAU_GEM_DOMAIN_VRAM and NOUVEAU_GEM_DOMAIN_GART are both okay. The
> > proprietary driver allocates this buffer using
> > NV_ESC_RM_VID_HEAP_CONTROL and sets attr = NVOS32_ATTR_LOCATION_ANY |
> > NVOS32_ATTR_PAGE_SIZE_BIG | NVOS32_ATTR_PHYSICALITY_CONTIGUOUS, attr2
> > = NVOS32_ATTR2_GPU_CACHEABLE_YES | NVOS32_ATTR2_ZBC_PREFER_NO_ZBC.
>
> (Please do not top post.)
>
> What I mean is how do you map the backing GEM into the GPU's virtual address
> space? Since it's bound to a channel, I assume that it must be ensured it's
> properly mapped when work is pushed to the channel. Is it mapped through
> VM_BIND?

Yes. The userspace code for this is here:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33861/diffs?commit_id=0c4baab863730f9fc8b417834ffcbb400f11d617
It calls into the usual function for driver internal allocations
(nvkmd_dev_alloc_mem) which calls VM_BIND internally.

I don't understand: why is this line of questioning important?

> >
> > On Thu, Mar 20, 2025 at 2:34 PM Danilo Krummrich <dakr@kernel.org> wrote:
> > >
> > > On Wed, Mar 12, 2025 at 05:36:15PM -0400, Mel Henning wrote:
> > > > diff --git a/include/uapi/drm/nouveau_drm.h b/include/uapi/drm/nouveau_drm.h
> > >
> > > Same here, please split the uAPI change in a separate commit.
> > >
> > > > index 33361784eb4e..e9638f4dd7e6 100644
> > > > --- a/include/uapi/drm/nouveau_drm.h
> > > > +++ b/include/uapi/drm/nouveau_drm.h
> > > > @@ -448,6 +448,20 @@ struct drm_nouveau_get_zcull_info {
> > > >       __u32 ctxsw_align;
> > > >  };
> > > >
> > > > +struct drm_nouveau_set_zcull_ctxsw_buffer {
> > > > +     /**
> > > > +      * @addr: The virtual address for the buffer, or null to bind nothing
> > > > +      */
> > > > +     __u64 addr;
> > >
> > > What is this buffer? Is this a GEM object backed buffer? How is it mapped?
> > >
> > > > +
> > > > +     /**
> > > > +      * @channel: the channel to set the buffer on
> > > > +      */
> > > > +     __u32 channel;
> > > > +
> > > > +     __u32 pad;
> > > > +};
> > > > +
> > > >  #define DRM_NOUVEAU_GETPARAM           0x00
> > > >  #define DRM_NOUVEAU_SETPARAM           0x01 /* deprecated */
> > > >  #define DRM_NOUVEAU_CHANNEL_ALLOC      0x02
> > > > @@ -462,6 +476,7 @@ struct drm_nouveau_get_zcull_info {
> > > >  #define DRM_NOUVEAU_VM_BIND            0x11
> > > >  #define DRM_NOUVEAU_EXEC               0x12
> > > >  #define DRM_NOUVEAU_GET_ZCULL_INFO     0x13
> > > > +#define DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER 0x14
> > > >  #define DRM_NOUVEAU_GEM_NEW            0x40
> > > >  #define DRM_NOUVEAU_GEM_PUSHBUF        0x41
> > > >  #define DRM_NOUVEAU_GEM_CPU_PREP       0x42
> > > > @@ -532,6 +547,7 @@ struct drm_nouveau_svm_bind {
> > > >  #define DRM_IOCTL_NOUVEAU_EXEC               DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_EXEC, struct drm_nouveau_exec)
> > > >
> > > >  #define DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO     DRM_IOR (DRM_COMMAND_BASE + DRM_NOUVEAU_GET_ZCULL_INFO, struct drm_nouveau_get_zcull_info)
> > > > +#define DRM_IOCTL_NOUVEAU_SET_ZCULL_CTXSW_BUFFER  DRM_IOW (DRM_COMMAND_BASE + DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER, struct drm_nouveau_set_zcull_ctxsw_buffer)
> > > >  #if defined(__cplusplus)
> > > >  }
> > > >  #endif
> > > > --
> > > > 2.48.1
> > > >


* Re: [PATCH 1/2] drm/nouveau: Add DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO
  2025-03-27 18:26             ` M Henning
@ 2025-03-28 11:04               ` Danilo Krummrich
  0 siblings, 0 replies; 21+ messages in thread
From: Danilo Krummrich @ 2025-03-28 11:04 UTC (permalink / raw)
  To: M Henning
  Cc: Ben Skeggs, Karol Herbst, Lyude Paul, Faith Ekstrand, dri-devel,
	nouveau

On Thu, Mar 27, 2025 at 02:26:09PM -0400, M Henning wrote:
> On Thu, Mar 27, 2025 at 8:56 AM Danilo Krummrich <dakr@kernel.org> wrote:
> >
> > On Tue, Mar 25, 2025 at 07:40:56PM -0400, M Henning wrote:
> > > Okay, that sounds reasonable since I don't expect this to change very quickly.
> > >
> > > Since I don't fully understand, is the suggestion here to:
> > > 1) add the interface as a function on nvkm_gr using the nvkm_gr_func
> > > vtable and store the actual data on r535_gr
> > > or
> > > 2) add the interface to NVIF (which IF?) and store the actual data on nvkm_gr
> > > ?
> >
> > I think we want both.
> >
> > 1) I think the suggestion was to store the data directly in nvkm_gr, however the
> >    structure is indeed specific to r535, so I think, unfortunately, we need the
> >    vtable and store that data in r535_gr.
> 
> Well, NV2080_CTRL_GR_GET_ZCULL_INFO_PARAMS is r535-specific, but we
> need to convert it into a common structure and combine it with info
> from NV0080_CTRL_FIFO_GET_ENGINE_CONTEXT_PROPERTIES at some point, so
> I think it makes sense to do that conversion+combination before
> storing it on any structure. In that case, maybe we store the
> structure on nvkm_gr directly during r535_gr_oneinit and then the call
> to get the info only goes through NVIF?

Sounds good to me! It means you need an intermediate structure, though; we should
avoid using uAPI structures in NVKM code.


* Re: [PATCH 1/2] drm/nouveau: Add DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO
  2025-03-27 18:03         ` M Henning
@ 2025-03-28 11:09           ` Danilo Krummrich
  0 siblings, 0 replies; 21+ messages in thread
From: Danilo Krummrich @ 2025-03-28 11:09 UTC (permalink / raw)
  To: M Henning
  Cc: Karol Herbst, Lyude Paul, bskeggs, jhubbard, Faith Ekstrand,
	dri-devel, nouveau

(CC: Ben, John)

On Thu, Mar 27, 2025 at 02:03:21PM -0400, M Henning wrote:
> On Thu, Mar 27, 2025 at 9:51 AM Danilo Krummrich <dakr@kernel.org> wrote:
> >
> > On Fri, Mar 21, 2025 at 06:06:34PM -0400, M Henning wrote:
> > > On Thu, Mar 20, 2025 at 2:18 PM Danilo Krummrich <dakr@kernel.org> wrote:
> > > > On Wed, Mar 12, 2025 at 05:36:14PM -0400, Mel Henning wrote:
> > > > > +     __u32 width_align_pixels;
> > > > > +     __u32 height_align_pixels;
> > > > > +     __u32 pixel_squares_by_aliquots;
> > > > > +     __u32 aliquot_total;
> > > > > +     __u32 zcull_region_byte_multiplier;
> > > > > +     __u32 zcull_region_header_size;
> > > > > +     __u32 zcull_subregion_header_size;
> > > > > +     __u32 subregion_count;
> > > > > +     __u32 subregion_width_align_pixels;
> > > > > +     __u32 subregion_height_align_pixels;
> > > > > +
> > > > > +     __u32 ctxsw_size;
> > > > > +     __u32 ctxsw_align;
> > > > > +};
> > > >
> > > > What if this ever changes between hardware revisions or firmware versions?
> > >
> > > There was some previous discussion of that here:
> > > https://gitlab.freedesktop.org/mesa/mesa/-/issues/12596#note_2796853
> > >
> > > From what I can tell, this structure hasn't really changed since
> > > FERMI_C (circa 2011), so I'm not too worried about it changing on us
> > > too quickly. When it does change, we have the option of appending more
> > > members to this struct in the usual way, or if the change is more
> > > fundamental we can return an error from this ioctl and add a new
> > > interface. Userspace needs to handle an error from this ioctl
> > > gracefully anyway since whether it works or not depends on the gpu
> > > generation and what firmware is loaded right now.
> >
> > We could also define it as
> >
> >         struct drm_nouveau_get_zcull_info {
> >                 __u32 version;
> >                 __u32 _pad;
> >
> >                 union {
> >                         struct drm_nouveau_get_zcull_info_v1 info;
> >                 }
> >         }
> >
> > just to be safe.
> 
> We can do that, although I don't see any other drm drivers using a
> similar pattern anywhere.

I think it's a bit cleaner than adding new members and leaving existing ones
unset, or adding a new IOCTL in the worst case.

Maybe the NVIDIA folks can give us some hint on whether this is expected to
change at some point?


* Re: [PATCH 2/2] drm/nouveau: DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER
  2025-03-27 19:01         ` M Henning
@ 2025-03-28 11:48           ` Danilo Krummrich
  2025-08-01  2:15             ` M Henning
  0 siblings, 1 reply; 21+ messages in thread
From: Danilo Krummrich @ 2025-03-28 11:48 UTC (permalink / raw)
  To: M Henning; +Cc: Karol Herbst, Lyude Paul, Faith Ekstrand, dri-devel, nouveau

On Thu, Mar 27, 2025 at 03:01:54PM -0400, M Henning wrote:
> On Thu, Mar 27, 2025 at 9:58 AM Danilo Krummrich <dakr@kernel.org> wrote:
> >
> > On Fri, Mar 21, 2025 at 07:00:57PM -0400, M Henning wrote:
> > > This is a pointer in the gpu's virtual address space. It must be
> > > aligned according to ctxsw_align and be at least ctxsw_size bytes
> > > (where those values come from the nouveau_abi16_ioctl_get_zcull_info
> > > structure). I'll change the description to say that much.
> > >
> > > Yes, this is GEM-backed. I'm actually not entirely sure what the
> > > requirements are here, since this part is reverse-engineered. I think
> > > NOUVEAU_GEM_DOMAIN_VRAM and NOUVEAU_GEM_DOMAIN_GART are both okay. The
> > > proprietary driver allocates this buffer using
> > > NV_ESC_RM_VID_HEAP_CONTROL and sets attr = NVOS32_ATTR_LOCATION_ANY |
> > > NVOS32_ATTR_PAGE_SIZE_BIG | NVOS32_ATTR_PHYSICALITY_CONTIGUOUS, attr2
> > > = NVOS32_ATTR2_GPU_CACHEABLE_YES | NVOS32_ATTR2_ZBC_PREFER_NO_ZBC.
> >
> > (Please do not top post.)
> >
> > What I mean is how do you map the backing GEM into the GPU's virtual address
> > space? Since it's bound to a channel, I assume that it must be ensured it's
> > properly mapped when work is pushed to the channel. Is it mapped through
> > VM_BIND?
> 
> Yes. The userspace code for this is here:
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33861/diffs?commit_id=0c4baab863730f9fc8b417834ffcbb400f11d617
> It calls into the usual function for driver internal allocations
> (nvkmd_dev_alloc_mem) which calls VM_BIND internally.

BOs mapped through VM_BIND are prone to eviction, is this a problem here, or is
it fine if it is only ensured that this mapping is valid for the duration of
subsequent EXEC jobs?

Does the mapping need to be valid when DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER is
called? If so, how is this ensured?

Can DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER be called in between multiple
DRM_NOUVEAU_EXEC calls?

Does it maybe need an async mode, such as EXEC and VM_BIND have? (To me it
doesn't seem to be the case, but those questions still need an answer.)

I also think we should document those things.

> I don't understand: why is this line of questioning important?

By sending those patches you ask me as the maintainer of the project to take
responsibility for your changes. In this case it even goes further. In fact, you
ask me to take responsibility for a new interface, which, since it is a uAPI, can
*never* be removed in the future after being released.

It is part of my job to act responsibly, which includes understanding what the
interface does, how it is intended to be used, whether it is sufficient for its
purpose or if it has any flaws.

> 
> > >
> > > On Thu, Mar 20, 2025 at 2:34 PM Danilo Krummrich <dakr@kernel.org> wrote:
> > > >
> > > > On Wed, Mar 12, 2025 at 05:36:15PM -0400, Mel Henning wrote:
> > > > > diff --git a/include/uapi/drm/nouveau_drm.h b/include/uapi/drm/nouveau_drm.h
> > > >
> > > > Same here, please split the uAPI change in a separate commit.
> > > >
> > > > > index 33361784eb4e..e9638f4dd7e6 100644
> > > > > --- a/include/uapi/drm/nouveau_drm.h
> > > > > +++ b/include/uapi/drm/nouveau_drm.h
> > > > > @@ -448,6 +448,20 @@ struct drm_nouveau_get_zcull_info {
> > > > >       __u32 ctxsw_align;
> > > > >  };
> > > > >
> > > > > +struct drm_nouveau_set_zcull_ctxsw_buffer {
> > > > > +     /**
> > > > > +      * @addr: The virtual address for the buffer, or null to bind nothing
> > > > > +      */
> > > > > +     __u64 addr;
> > > >
> > > > What is this buffer? Is this a GEM object backed buffer? How is it mapped?


* Re: [PATCH 2/2] drm/nouveau: DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER
  2025-03-28 11:48           ` Danilo Krummrich
@ 2025-08-01  2:15             ` M Henning
  0 siblings, 0 replies; 21+ messages in thread
From: M Henning @ 2025-08-01  2:15 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Karol Herbst, Lyude Paul, Faith Ekstrand, dri-devel, nouveau

On Fri, Mar 28, 2025 at 7:48 AM Danilo Krummrich <dakr@kernel.org> wrote:
>
> On Thu, Mar 27, 2025 at 03:01:54PM -0400, M Henning wrote:
> > On Thu, Mar 27, 2025 at 9:58 AM Danilo Krummrich <dakr@kernel.org> wrote:
> > >
> > > On Fri, Mar 21, 2025 at 07:00:57PM -0400, M Henning wrote:
> > > > This is a pointer in the gpu's virtual address space. It must be
> > > > aligned according to ctxsw_align and be at least ctxsw_size bytes
> > > > (where those values come from the nouveau_abi16_ioctl_get_zcull_info
> > > > structure). I'll change the description to say that much.
> > > >
> > > > Yes, this is GEM-backed. I'm actually not entirely sure what the
> > > > requirements are here, since this part is reverse-engineered. I think
> > > > NOUVEAU_GEM_DOMAIN_VRAM and NOUVEAU_GEM_DOMAIN_GART are both okay. The
> > > > proprietary driver allocates this buffer using
> > > > NV_ESC_RM_VID_HEAP_CONTROL and sets attr = NVOS32_ATTR_LOCATION_ANY |
> > > > NVOS32_ATTR_PAGE_SIZE_BIG | NVOS32_ATTR_PHYSICALITY_CONTIGUOUS, attr2
> > > > = NVOS32_ATTR2_GPU_CACHEABLE_YES | NVOS32_ATTR2_ZBC_PREFER_NO_ZBC.
> > >
> > > (Please do not top post.)
> > >
> > > What I mean is how do you map the backing GEM into the GPU's virtual address
> > > space? Since it's bound to a channel, I assume that it must be ensured it's
> > > properly mapped when work is pushed to the channel. Is it mapped through
> > > VM_BIND?
> >
> > Yes. The userspace code for this is here:
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33861/diffs?commit_id=0c4baab863730f9fc8b417834ffcbb400f11d617
> > It calls into the usual function for driver internal allocations
> > (nvkmd_dev_alloc_mem) which calls VM_BIND internally.
>
> BOs mapped through VM_BIND are prone to eviction, is this a problem here, or is
> it fine if it is only ensured that this mapping is valid for the duration of
> subsequent EXEC jobs?

I don't see the proprietary driver doing anything special related to
eviction in traces, which is to say that I think it's fine for it to
just be valid for subsequent EXEC jobs. That being said, I don't have
the deepest understanding of how memory mapping works in
open-gpu-kernel-modules so I might have missed something.

Is there a good way to test this code path? Can I, e.g., force the kernel
to evict everything in order to test that the context switching still
works?

> Does the mapping need to be valid when DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER is
> called? If so, how is this ensured?

I don't think so. My understanding is that this call is just setting a pointer.

> Can DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER be called in between multiple
> DRM_NOUVEAU_EXEC calls?

Yes, there's nothing that requires a specific ordering - we can use
the context normally before and after the
DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER call.

> Does it maybe need an async mode, such as EXEC and VM_BIND? (To me it doesn't
> seem to be the case, but those questions still need an answer.)

I don't think so. Userspace calls it twice per context right now (once
for init, once for teardown), so I don't expect it to be especially
perf critical.

> I also think we should document those things.
>
> > I don't understand: why is this line of questioning important?
>
> By sending those patches you ask me as the maintainer of the project to take
> responsibility for your changes. In this case it even goes further. In fact, you
> ask me to take responsibility for a new interface, which, since it is a uAPI, can
> *never* be removed in the future after being released.
>
> It is part of my job to act responsibly, which includes understanding what the
> interface does, how it is intended to be used, whether it is sufficient for its
> purpose or if it has any flaws.

Right, sorry about this - I didn't mean to question the purpose of you
asking questions at all. You of course have every right to ask
questions about a patch during code review. I was more just confused
about why you were asking me specifically about VM_BIND, although I
think that's a bit clearer to me now that you've asked questions about
eviction.


end of thread, other threads:[~2025-08-01  2:15 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz, follow: Atom feed)
-- links below jump to the message on this page --
2025-03-12 21:36 [PATCH 0/2] drm/nouveau: ZCULL support Mel Henning
2025-03-12 21:36 ` [PATCH 1/2] drm/nouveau: Add DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO Mel Henning
2025-03-20 18:18   ` Danilo Krummrich
2025-03-20 18:37     ` Danilo Krummrich
2025-03-20 19:57     ` Ben Skeggs
2025-03-20 20:01       ` Danilo Krummrich
2025-03-25 23:40         ` M Henning
2025-03-27 12:56           ` Danilo Krummrich
2025-03-27 18:26             ` M Henning
2025-03-28 11:04               ` Danilo Krummrich
2025-03-21 22:06     ` M Henning
2025-03-27 13:51       ` Danilo Krummrich
2025-03-27 18:03         ` M Henning
2025-03-28 11:09           ` Danilo Krummrich
2025-03-12 21:36 ` [PATCH 2/2] drm/nouveau: DRM_NOUVEAU_SET_ZCULL_CTXSW_BUFFER Mel Henning
2025-03-20 18:34   ` Danilo Krummrich
2025-03-21 23:00     ` M Henning
2025-03-27 13:58       ` Danilo Krummrich
2025-03-27 19:01         ` M Henning
2025-03-28 11:48           ` Danilo Krummrich
2025-08-01  2:15             ` M Henning

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).