[RFC 0/4] Add Format Modifiers for NVIDIA Blackwell chipsets

dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed

* [RFC  0/4] Add Format Modifiers for NVIDIA Blackwell chipsets
@ 2025-07-03 22:36 James Jones
  2025-07-03 22:36 ` [RFC 1/4] drm: macros defining fields of NVIDIA modifiers James Jones
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: James Jones @ 2025-07-03 22:36 UTC (permalink / raw)
  To: David Airlie, Lyude Paul, Danilo Krummrich
  Cc: nouveau, dri-devel, Faith Ekstrand, Alexandre Courbot, Ben Skeggs,
	James Jones

The layout of bits within the individual tiles (referred to as
sectors in the DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro)
changed for some formats starting in Blackwell 2 GPUs. New format
modifiers are needed to denote the difference and prevent direct
sharing of these incompatible buffers with older GPUs.

This patch series proposes first adding some helper macros and
inline functions to drm_fourcc.h to make the NVIDIA block-linear
format modifiers easier to work with given the proposed solution
makes them harder to parse, then extending the existing sector type
field in the parametric format modifier macro
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() by another bit to
accommodate the new layout type.

There are a few ways the parameteric format modifier definition
could have been altered to handle the new layouts:

-The GOB Height and Page Kind field has a reserved value that could
 have been used. However, the GOB height and page kind enums did
 not change relative to prior chips, so this is sort of a lie.
 However, this is the least-invasive change.

-An entirely new field could have been added. This seems
 inappropriate given the presence of an existing appropriate field.
 The advantage here is it avoids splitting the sector layout field
 across two bitfields.

The proposed approach is the logically consistent one, but has the
downside of being the most complex, and that it causes the
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro to evaluate its
's' parameter twice. However, I believe the added helper functions
and macros address the complexity, and I have audited the relevant
code and do not believe the double evaluation should cause any
problems in practice.

James Jones (4):
  drm: macros defining fields of NVIDIA modifiers
  drm: add inline helper funcs for NVIDIA modifiers
  drm/nouveau: use format modifier helper funcs
  drm: define NVIDIA DRM format modifiers for GB20x

 drivers/gpu/drm/nouveau/nouveau_display.c |  12 ++-
 include/uapi/drm/drm_fourcc.h             | 100 ++++++++++++++++++----
 2 files changed, 92 insertions(+), 20 deletions(-)

-- 
2.49.0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC  1/4] drm: macros defining fields of NVIDIA modifiers
  2025-07-03 22:36 [RFC 0/4] Add Format Modifiers for NVIDIA Blackwell chipsets James Jones
@ 2025-07-03 22:36 ` James Jones
  2025-07-03 22:36 ` [RFC 2/4] drm: add inline helper funcs for " James Jones
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: James Jones @ 2025-07-03 22:36 UTC (permalink / raw)
  To: David Airlie, Lyude Paul, Danilo Krummrich
  Cc: nouveau, dri-devel, Faith Ekstrand, Alexandre Courbot, Ben Skeggs,
	James Jones

Code parsing the fields of the NVIDIA block-linear
format modifiers frequently needs to use magic
values defined within the
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro and
document in the comment above it. Such code is
error prone and hard to read. See for example,
nouveau_decode_mod() in the nouveau driver.

This change adds macros defining the mask and
bit shift associated with each parameteric field
in these modifiers.

Signed-off-by: James Jones <jajones@nvidia.com>
---
 include/uapi/drm/drm_fourcc.h | 38 ++++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 6483f76a2165..4240a4a146b6 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -902,6 +902,18 @@ extern "C" {
  */
 #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)
 
+/* Fields in the parameteric NVIDIA block linear format modifiers */
+#define __fourcc_mod_nvidia_l2gpbh_mask 0xfULL
+#define __fourcc_mod_nvidia_l2gpbh_shift 0
+#define __fourcc_mod_nvidia_pkind_mask 0xffULL
+#define __fourcc_mod_nvidia_pkind_shift 12
+#define __fourcc_mod_nvidia_kgen_mask 0x3ULL
+#define __fourcc_mod_nvidia_kgen_shift 20
+#define __fourcc_mod_nvidia_slayout_mask 0x1ULL
+#define __fourcc_mod_nvidia_slayout_shift 22
+#define __fourcc_mod_nvidia_comp_mask 0x7ULL
+#define __fourcc_mod_nvidia_comp_shift 23
+
 /*
  * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80,
  * and Tegra GPUs starting with Tegra K1.
@@ -983,15 +995,21 @@ extern "C" {
  *               6 = Reserved for future use
  *               7 = Reserved for future use
  *
- * 55:25 -     Reserved for future use.  Must be zero.
+ * 55:26 -     Reserved for future use.  Must be zero.
  */
 #define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \
-	fourcc_mod_code(NVIDIA, (0x10 | \
-				 ((h) & 0xf) | \
-				 (((k) & 0xff) << 12) | \
-				 (((g) & 0x3) << 20) | \
-				 (((s) & 0x1) << 22) | \
-				 (((c) & 0x7) << 23)))
+	fourcc_mod_code(NVIDIA, \
+			(0x10 | \
+			 (((h) & __fourcc_mod_nvidia_l2gpbh_mask) << \
+			  __fourcc_mod_nvidia_l2gpbh_shift) | \
+			 (((k) & __fourcc_mod_nvidia_pkind_mask) << \
+			  __fourcc_mod_nvidia_pkind_shift) | \
+			 (((g) & __fourcc_mod_nvidia_kgen_mask) << \
+			  __fourcc_mod_nvidia_kgen_shift) | \
+			 (((s) & __fourcc_mod_nvidia_slayout_mask) << \
+			  __fourcc_mod_nvidia_slayout_shift) | \
+			 (((c) & __fourcc_mod_nvidia_comp_mask) << \
+			  __fourcc_mod_nvidia_comp_shift)))
 
 /* To grandfather in prior block linear format modifiers to the above layout,
  * the page kind "0", which corresponds to "pitch/linear" and hence is unusable
@@ -1002,10 +1020,12 @@ extern "C" {
 static inline __u64
 drm_fourcc_canonicalize_nvidia_format_mod(__u64 modifier)
 {
-	if (!(modifier & 0x10) || (modifier & (0xff << 12)))
+	if (!(modifier & 0x10) ||
+	    (modifier & (__fourcc_mod_nvidia_pkind_mask <<
+			 __fourcc_mod_nvidia_pkind_shift)))
 		return modifier;
 	else
-		return modifier | (0xfe << 12);
+		return modifier | (0xfeULL << __fourcc_mod_nvidia_pkind_shift);
 }
 
 /*
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC  2/4] drm: add inline helper funcs for NVIDIA modifiers
  2025-07-03 22:36 [RFC 0/4] Add Format Modifiers for NVIDIA Blackwell chipsets James Jones
  2025-07-03 22:36 ` [RFC 1/4] drm: macros defining fields of NVIDIA modifiers James Jones
@ 2025-07-03 22:36 ` James Jones
  2025-07-03 22:36 ` [RFC 3/4] drm/nouveau: use format modifier helper funcs James Jones
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: James Jones @ 2025-07-03 22:36 UTC (permalink / raw)
  To: David Airlie, Lyude Paul, Danilo Krummrich
  Cc: nouveau, dri-devel, Faith Ekstrand, Alexandre Courbot, Ben Skeggs,
	James Jones

Code parsing the fields of the NVIDIA block-linear
format modifiers frequently needs to use magic
values defined within the
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro and
document in the comment above it. Such code is
error prone and hard to read. See for example,
nouveau_decode_mod() in the nouveau driver.

This change adds macros using the previously added
field definition macros to extract values from
individual fields in a valid NVIDIA block-linear
format modifier.

Signed-off-by: James Jones <jajones@nvidia.com>
---
 include/uapi/drm/drm_fourcc.h | 44 ++++++++++++++++++++++++++++++++---
 1 file changed, 41 insertions(+), 3 deletions(-)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 4240a4a146b6..052e5fdd1d3b 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -1011,6 +1011,46 @@ extern "C" {
 			 (((c) & __fourcc_mod_nvidia_comp_mask) << \
 			  __fourcc_mod_nvidia_comp_shift)))
 
+#define __DRM_FOURCC_MKNVHELPER_FUNC(f__) \
+	static inline __u64 \
+	drm_fourcc_nvidia_format_mod_##f__(__u64 mod) \
+{ \
+	return (mod >> __fourcc_mod_nvidia_##f__##_shift) & \
+		__fourcc_mod_nvidia_##f__##_mask; \
+}
+
+/*
+ * Get log2 of the block height in gobs specified by mod:
+ * static inline __u64 drm_fourcc_nvidia_format_mod_l2gpbh(__u64 mod)
+ */
+__DRM_FOURCC_MKNVHELPER_FUNC(l2gpbh)
+
+/*
+ * Get the page kind specified by mod:
+ * static inline __u64 drm_fourcc_nvidia_format_mod_pkind(__u64 mod)
+ */
+__DRM_FOURCC_MKNVHELPER_FUNC(pkind)
+
+/*
+ * Get the page kind generation specified by mod:
+ * static inline __u64 drm_fourcc_nvidia_format_mod_kgen(__u64 mod)
+ */
+__DRM_FOURCC_MKNVHELPER_FUNC(kgen)
+
+/*
+ * Get the sector layout specified by mod:
+ * static inline __u64 drm_fourcc_nvidia_format_mod_slayout(__u64 mod)
+ */
+__DRM_FOURCC_MKNVHELPER_FUNC(slayout)
+
+/*
+ * Get the lossless framebuffer compression specified by mod:
+ * static inline __u64 drm_fourcc_nvidia_format_mod_kgen(__u64 mod)
+ */
+__DRM_FOURCC_MKNVHELPER_FUNC(comp)
+
+#undef __DRM_FOURCC_MKNVHELPER_FUNC
+
 /* To grandfather in prior block linear format modifiers to the above layout,
  * the page kind "0", which corresponds to "pitch/linear" and hence is unusable
  * with block-linear layouts, is remapped within drivers to the value 0xfe,
@@ -1020,9 +1060,7 @@ extern "C" {
 static inline __u64
 drm_fourcc_canonicalize_nvidia_format_mod(__u64 modifier)
 {
-	if (!(modifier & 0x10) ||
-	    (modifier & (__fourcc_mod_nvidia_pkind_mask <<
-			 __fourcc_mod_nvidia_pkind_shift)))
+	if (!(modifier & 0x10) || drm_fourcc_nvidia_format_mod_pkind(modifier))
 		return modifier;
 	else
 		return modifier | (0xfeULL << __fourcc_mod_nvidia_pkind_shift);
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC  3/4] drm/nouveau: use format modifier helper funcs
  2025-07-03 22:36 [RFC 0/4] Add Format Modifiers for NVIDIA Blackwell chipsets James Jones
  2025-07-03 22:36 ` [RFC 1/4] drm: macros defining fields of NVIDIA modifiers James Jones
  2025-07-03 22:36 ` [RFC 2/4] drm: add inline helper funcs for " James Jones
@ 2025-07-03 22:36 ` James Jones
  2025-07-03 22:36 ` [RFC 4/4] drm: define NVIDIA DRM format modifiers for GB20x James Jones
  2025-07-04 14:45 ` [RFC 0/4] Add Format Modifiers for NVIDIA Blackwell chipsets Faith Ekstrand
  4 siblings, 0 replies; 11+ messages in thread
From: James Jones @ 2025-07-03 22:36 UTC (permalink / raw)
  To: David Airlie, Lyude Paul, Danilo Krummrich
  Cc: nouveau, dri-devel, Faith Ekstrand, Alexandre Courbot, Ben Skeggs,
	James Jones

When parsing the parameteric NVIDIA block-linear
format modifiers to determine surface tiling
attributes, use the new helper functions to
extract values from various fields. This avoids
using magic values to extract the bitfields from
the modifier, which makes the code more readable.

Signed-off-by: James Jones <jajones@nvidia.com>
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index add006fc8d81..1bec664a2b67 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -146,14 +146,18 @@ nouveau_decode_mod(struct nouveau_drm *drm,
 		 * Extract the block height and kind from the corresponding
 		 * modifier fields.  See drm_fourcc.h for details.
 		 */
+		uint64_t pkind = drm_fourcc_nvidia_format_mod_pkind(modifier);
 
-		if ((modifier & (0xffull << 12)) == 0ull) {
+		if (pkind == 0ull) {
 			/* Legacy modifier.  Translate to this dev's 'kind.' */
-			modifier |= disp->format_modifiers[0] & (0xffull << 12);
+			const uint64_t any_dev_mod = disp->format_modifiers[0];
+
+			pkind = drm_fourcc_nvidia_format_mod_pkind(any_dev_mod);
 		}
 
-		*tile_mode = (uint32_t)(modifier & 0xF);
-		*kind = (uint8_t)((modifier >> 12) & 0xFF);
+		*tile_mode =
+			(uint32_t)drm_fourcc_nvidia_format_mod_l2gpbh(modifier);
+		*kind = (uint8_t)pkind;
 
 		if (drm->client.device.info.chipset >= 0xc0)
 			*tile_mode <<= 4;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC  4/4] drm: define NVIDIA DRM format modifiers for GB20x
  2025-07-03 22:36 [RFC 0/4] Add Format Modifiers for NVIDIA Blackwell chipsets James Jones
                   ` (2 preceding siblings ...)
  2025-07-03 22:36 ` [RFC 3/4] drm/nouveau: use format modifier helper funcs James Jones
@ 2025-07-03 22:36 ` James Jones
  2025-07-03 23:22   ` Faith Ekstrand
  2025-07-04 14:45 ` [RFC 0/4] Add Format Modifiers for NVIDIA Blackwell chipsets Faith Ekstrand
  4 siblings, 1 reply; 11+ messages in thread
From: James Jones @ 2025-07-03 22:36 UTC (permalink / raw)
  To: David Airlie, Lyude Paul, Danilo Krummrich
  Cc: nouveau, dri-devel, Faith Ekstrand, Alexandre Courbot, Ben Skeggs,
	James Jones

The layout of bits within the individual tiles
(referred to as sectors in the
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro)
changed for some formats starting in Blackwell 2
GPUs. To denote the difference, extend the sector
field in the parametric format modifier definition
used to generate modifier values for NVIDIA
hardware.

Without this change, it would be impossible to
differentiate the two layouts based on modifiers,
and as a result software could attempt to share
surfaces directly between pre-GB20x and GB20x
cards, resulting in corruption when the surface
was accessed on one of the GPUs after being
populated with content by the other.

Of note: This change causes the
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro to
evaluate its "s" parameter twice, with the side
effects that entails. I surveyed all usage of the
modifier in the kernel and Mesa code, and that
does not appear to be problematic in any current
usage, but I thought it was worth calling out.

Signed-off-by: James Jones <jajones@nvidia.com>
---
 include/uapi/drm/drm_fourcc.h | 46 +++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 052e5fdd1d3b..348b2f1c1cb7 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -909,8 +909,10 @@ extern "C" {
 #define __fourcc_mod_nvidia_pkind_shift 12
 #define __fourcc_mod_nvidia_kgen_mask 0x3ULL
 #define __fourcc_mod_nvidia_kgen_shift 20
-#define __fourcc_mod_nvidia_slayout_mask 0x1ULL
-#define __fourcc_mod_nvidia_slayout_shift 22
+#define __fourcc_mod_nvidia_slayout_low_mask 0x1ULL
+#define __fourcc_mod_nvidia_slayout_low_shift 22
+#define __fourcc_mod_nvidia_slayout_high_mask 0x2ULL
+#define __fourcc_mod_nvidia_slayout_high_shift 25
 #define __fourcc_mod_nvidia_comp_mask 0x7ULL
 #define __fourcc_mod_nvidia_comp_shift 23
 
@@ -973,14 +975,16 @@ extern "C" {
  *               2 = Gob Height 8, Turing+ Page Kind mapping
  *               3 = Reserved for future use.
  *
- * 22:22 s     Sector layout.  On Tegra GPUs prior to Xavier, there is a further
- *             bit remapping step that occurs at an even lower level than the
- *             page kind and block linear swizzles.  This causes the layout of
- *             surfaces mapped in those SOC's GPUs to be incompatible with the
- *             equivalent mapping on other GPUs in the same system.
+ * 22:22 s     Sector layout.  There is a further bit remapping step that occurs
+ * 26:26       at an even lower level than the page kind and block linear
+ *             swizzles.  This causes the bit arrangement of surfaces in memory
+ *             to differ subtly, and prevents direct sharing of surfaces between
+ *             GPUs with different layouts.
  *
- *               0 = Tegra K1 - Tegra Parker/TX2 Layout.
- *               1 = Desktop GPU and Tegra Xavier+ Layout
+ *               0 = Tegra K1 - Tegra Parker/TX2 Layout
+ *               1 = Pre-GB20x, Tegra Xavier-Orin, GB10 Layout
+ *               2 = GB20x(Blackwell 2)+ Layout for some pixel/texel sizes
+ *               3 = reserved for future use.
  *
  * 25:23 c     Lossless Framebuffer Compression type.
  *
@@ -995,7 +999,7 @@ extern "C" {
  *               6 = Reserved for future use
  *               7 = Reserved for future use
  *
- * 55:26 -     Reserved for future use.  Must be zero.
+ * 55:27 -     Reserved for future use.  Must be zero.
  */
 #define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \
 	fourcc_mod_code(NVIDIA, \
@@ -1006,8 +1010,10 @@ extern "C" {
 			  __fourcc_mod_nvidia_pkind_shift) | \
 			 (((g) & __fourcc_mod_nvidia_kgen_mask) << \
 			  __fourcc_mod_nvidia_kgen_shift) | \
-			 (((s) & __fourcc_mod_nvidia_slayout_mask) << \
-			  __fourcc_mod_nvidia_slayout_shift) | \
+			 (((s) & __fourcc_mod_nvidia_slayout_low_mask) << \
+			  __fourcc_mod_nvidia_slayout_low_shift) | \
+			 (((s) & __fourcc_mod_nvidia_slayout_high_mask) << \
+			  __fourcc_mod_nvidia_slayout_high_shift) | \
 			 (((c) & __fourcc_mod_nvidia_comp_mask) << \
 			  __fourcc_mod_nvidia_comp_shift)))
 
@@ -1037,12 +1043,6 @@ __DRM_FOURCC_MKNVHELPER_FUNC(pkind)
  */
 __DRM_FOURCC_MKNVHELPER_FUNC(kgen)
 
-/*
- * Get the sector layout specified by mod:
- * static inline __u64 drm_fourcc_nvidia_format_mod_slayout(__u64 mod)
- */
-__DRM_FOURCC_MKNVHELPER_FUNC(slayout)
-
 /*
  * Get the lossless framebuffer compression specified by mod:
  * static inline __u64 drm_fourcc_nvidia_format_mod_kgen(__u64 mod)
@@ -1051,6 +1051,16 @@ __DRM_FOURCC_MKNVHELPER_FUNC(comp)
 
 #undef __DRM_FOURCC_MKNVHELPER_FUNC
 
+/* Get the sector layout specified by mod: */
+static inline __u64
+drm_fourcc_nvidia_format_mod_slayout(__u64 mod)
+{
+	return ((mod >> __fourcc_mod_nvidia_slayout_low_shift) &
+		__fourcc_mod_nvidia_slayout_low_mask) |
+		((mod >> __fourcc_mod_nvidia_slayout_high_shift) &
+		 __fourcc_mod_nvidia_slayout_high_mask);
+}
+
 /* To grandfather in prior block linear format modifiers to the above layout,
  * the page kind "0", which corresponds to "pitch/linear" and hence is unusable
  * with block-linear layouts, is remapped within drivers to the value 0xfe,
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC 4/4] drm: define NVIDIA DRM format modifiers for GB20x
  2025-07-03 22:36 ` [RFC 4/4] drm: define NVIDIA DRM format modifiers for GB20x James Jones
@ 2025-07-03 23:22   ` Faith Ekstrand
  2025-07-04  4:54     ` James Jones
  0 siblings, 1 reply; 11+ messages in thread
From: Faith Ekstrand @ 2025-07-03 23:22 UTC (permalink / raw)
  To: James Jones
  Cc: David Airlie, Lyude Paul, Danilo Krummrich, nouveau, dri-devel,
	Faith Ekstrand, Alexandre Courbot, Ben Skeggs

[-- Attachment #1: Type: text/plain, Size: 6559 bytes --]

On Thu, Jul 3, 2025 at 6:34 PM James Jones <jajones@nvidia.com> wrote:

> The layout of bits within the individual tiles
> (referred to as sectors in the
> DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro)
> changed for some formats starting in Blackwell 2
> GPUs. To denote the difference, extend the sector
> field in the parametric format modifier definition
> used to generate modifier values for NVIDIA
> hardware.
>
> Without this change, it would be impossible to
> differentiate the two layouts based on modifiers,
> and as a result software could attempt to share
> surfaces directly between pre-GB20x and GB20x
> cards, resulting in corruption when the surface
> was accessed on one of the GPUs after being
> populated with content by the other.
>
> Of note: This change causes the
> DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro to
> evaluate its "s" parameter twice, with the side
> effects that entails. I surveyed all usage of the
> modifier in the kernel and Mesa code, and that
> does not appear to be problematic in any current
> usage, but I thought it was worth calling out.
>
> Signed-off-by: James Jones <jajones@nvidia.com>
> ---
>  include/uapi/drm/drm_fourcc.h | 46 +++++++++++++++++++++--------------
>  1 file changed, 28 insertions(+), 18 deletions(-)
>
> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> index 052e5fdd1d3b..348b2f1c1cb7 100644
> --- a/include/uapi/drm/drm_fourcc.h
> +++ b/include/uapi/drm/drm_fourcc.h
> @@ -909,8 +909,10 @@ extern "C" {
>  #define __fourcc_mod_nvidia_pkind_shift 12
>  #define __fourcc_mod_nvidia_kgen_mask 0x3ULL
>  #define __fourcc_mod_nvidia_kgen_shift 20
> -#define __fourcc_mod_nvidia_slayout_mask 0x1ULL
> -#define __fourcc_mod_nvidia_slayout_shift 22
> +#define __fourcc_mod_nvidia_slayout_low_mask 0x1ULL
> +#define __fourcc_mod_nvidia_slayout_low_shift 22
> +#define __fourcc_mod_nvidia_slayout_high_mask 0x2ULL
> +#define __fourcc_mod_nvidia_slayout_high_shift 25
>  #define __fourcc_mod_nvidia_comp_mask 0x7ULL
>  #define __fourcc_mod_nvidia_comp_shift 23
>
> @@ -973,14 +975,16 @@ extern "C" {
>   *               2 = Gob Height 8, Turing+ Page Kind mapping
>   *               3 = Reserved for future use.
>   *
> - * 22:22 s     Sector layout.  On Tegra GPUs prior to Xavier, there is a
> further
> - *             bit remapping step that occurs at an even lower level than
> the
> - *             page kind and block linear swizzles.  This causes the
> layout of
> - *             surfaces mapped in those SOC's GPUs to be incompatible
> with the
> - *             equivalent mapping on other GPUs in the same system.
> + * 22:22 s     Sector layout.  There is a further bit remapping step that
> occurs
> + * 26:26       at an even lower level than the page kind and block linear
> + *             swizzles.  This causes the bit arrangement of surfaces in
> memory
> + *             to differ subtly, and prevents direct sharing of surfaces
> between
> + *             GPUs with different layouts.
>   *
> - *               0 = Tegra K1 - Tegra Parker/TX2 Layout.
> - *               1 = Desktop GPU and Tegra Xavier+ Layout
> + *               0 = Tegra K1 - Tegra Parker/TX2 Layout
> + *               1 = Pre-GB20x, Tegra Xavier-Orin, GB10 Layout
> + *               2 = GB20x(Blackwell 2)+ Layout for some pixel/texel sizes
>

I'm not sure I like just lumping all of blackwell together. Blackwell is
the same as Turing for 32, 64, and 128-bit formats. It's only different on
8 and 16 and those aren't the same. The way we modeled this for NVK is to
have Turing, Blackwell8Bit, and Blackwell16Bit GOBTypes. I think I'd prefer
the modifiers take a similar form.

Technically, this isn't strictly necessary as there is always a format and
formats with different element sizes aren't compatible so a driver can
always look at format+modifier.  However, it is a better model of the
hardware.

~Faith Ekstrand



> + *               3 = reserved for future use.
>   *
>   * 25:23 c     Lossless Framebuffer Compression type.
>   *
> @@ -995,7 +999,7 @@ extern "C" {
>   *               6 = Reserved for future use
>   *               7 = Reserved for future use
>   *
> - * 55:26 -     Reserved for future use.  Must be zero.
> + * 55:27 -     Reserved for future use.  Must be zero.
>   */
>  #define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \
>         fourcc_mod_code(NVIDIA, \
> @@ -1006,8 +1010,10 @@ extern "C" {
>                           __fourcc_mod_nvidia_pkind_shift) | \
>                          (((g) & __fourcc_mod_nvidia_kgen_mask) << \
>                           __fourcc_mod_nvidia_kgen_shift) | \
> -                        (((s) & __fourcc_mod_nvidia_slayout_mask) << \
> -                         __fourcc_mod_nvidia_slayout_shift) | \
> +                        (((s) & __fourcc_mod_nvidia_slayout_low_mask) << \
> +                         __fourcc_mod_nvidia_slayout_low_shift) | \
> +                        (((s) & __fourcc_mod_nvidia_slayout_high_mask) <<
> \
> +                         __fourcc_mod_nvidia_slayout_high_shift) | \
>                          (((c) & __fourcc_mod_nvidia_comp_mask) << \
>                           __fourcc_mod_nvidia_comp_shift)))
>
> @@ -1037,12 +1043,6 @@ __DRM_FOURCC_MKNVHELPER_FUNC(pkind)
>   */
>  __DRM_FOURCC_MKNVHELPER_FUNC(kgen)
>
> -/*
> - * Get the sector layout specified by mod:
> - * static inline __u64 drm_fourcc_nvidia_format_mod_slayout(__u64 mod)
> - */
> -__DRM_FOURCC_MKNVHELPER_FUNC(slayout)
> -
>  /*
>   * Get the lossless framebuffer compression specified by mod:
>   * static inline __u64 drm_fourcc_nvidia_format_mod_kgen(__u64 mod)
> @@ -1051,6 +1051,16 @@ __DRM_FOURCC_MKNVHELPER_FUNC(comp)
>
>  #undef __DRM_FOURCC_MKNVHELPER_FUNC
>
> +/* Get the sector layout specified by mod: */
> +static inline __u64
> +drm_fourcc_nvidia_format_mod_slayout(__u64 mod)
> +{
> +       return ((mod >> __fourcc_mod_nvidia_slayout_low_shift) &
> +               __fourcc_mod_nvidia_slayout_low_mask) |
> +               ((mod >> __fourcc_mod_nvidia_slayout_high_shift) &
> +                __fourcc_mod_nvidia_slayout_high_mask);
> +}
> +
>  /* To grandfather in prior block linear format modifiers to the above
> layout,
>   * the page kind "0", which corresponds to "pitch/linear" and hence is
> unusable
>   * with block-linear layouts, is remapped within drivers to the value
> 0xfe,
> --
> 2.49.0
>
>

[-- Attachment #2: Type: text/html, Size: 7880 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 4/4] drm: define NVIDIA DRM format modifiers for GB20x
  2025-07-03 23:22   ` Faith Ekstrand
@ 2025-07-04  4:54     ` James Jones
  2025-07-04 14:43       ` Faith Ekstrand
  0 siblings, 1 reply; 11+ messages in thread
From: James Jones @ 2025-07-04  4:54 UTC (permalink / raw)
  To: Faith Ekstrand
  Cc: David Airlie, Lyude Paul, Danilo Krummrich, nouveau, dri-devel,
	Faith Ekstrand, Alexandre Courbot, Ben Skeggs

On 7/3/25 16:22, Faith Ekstrand wrote:
> On Thu, Jul 3, 2025 at 6:34 PM James Jones <jajones@nvidia.com 
> <mailto:jajones@nvidia.com>> wrote:
> 
>     The layout of bits within the individual tiles
>     (referred to as sectors in the
>     DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro)
>     changed for some formats starting in Blackwell 2
>     GPUs. To denote the difference, extend the sector
>     field in the parametric format modifier definition
>     used to generate modifier values for NVIDIA
>     hardware.
> 
>     Without this change, it would be impossible to
>     differentiate the two layouts based on modifiers,
>     and as a result software could attempt to share
>     surfaces directly between pre-GB20x and GB20x
>     cards, resulting in corruption when the surface
>     was accessed on one of the GPUs after being
>     populated with content by the other.
> 
>     Of note: This change causes the
>     DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro to
>     evaluate its "s" parameter twice, with the side
>     effects that entails. I surveyed all usage of the
>     modifier in the kernel and Mesa code, and that
>     does not appear to be problematic in any current
>     usage, but I thought it was worth calling out.
> 
>     Signed-off-by: James Jones <jajones@nvidia.com
>     <mailto:jajones@nvidia.com>>
>     ---
>       include/uapi/drm/drm_fourcc.h | 46 +++++++++++++++++++++--------------
>       1 file changed, 28 insertions(+), 18 deletions(-)
> 
>     diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/
>     drm_fourcc.h
>     index 052e5fdd1d3b..348b2f1c1cb7 100644
>     --- a/include/uapi/drm/drm_fourcc.h
>     +++ b/include/uapi/drm/drm_fourcc.h
>     @@ -909,8 +909,10 @@ extern "C" {
>       #define __fourcc_mod_nvidia_pkind_shift 12
>       #define __fourcc_mod_nvidia_kgen_mask 0x3ULL
>       #define __fourcc_mod_nvidia_kgen_shift 20
>     -#define __fourcc_mod_nvidia_slayout_mask 0x1ULL
>     -#define __fourcc_mod_nvidia_slayout_shift 22
>     +#define __fourcc_mod_nvidia_slayout_low_mask 0x1ULL
>     +#define __fourcc_mod_nvidia_slayout_low_shift 22
>     +#define __fourcc_mod_nvidia_slayout_high_mask 0x2ULL
>     +#define __fourcc_mod_nvidia_slayout_high_shift 25
>       #define __fourcc_mod_nvidia_comp_mask 0x7ULL
>       #define __fourcc_mod_nvidia_comp_shift 23
> 
>     @@ -973,14 +975,16 @@ extern "C" {
>        *               2 = Gob Height 8, Turing+ Page Kind mapping
>        *               3 = Reserved for future use.
>        *
>     - * 22:22 s     Sector layout.  On Tegra GPUs prior to Xavier, there
>     is a further
>     - *             bit remapping step that occurs at an even lower
>     level than the
>     - *             page kind and block linear swizzles.  This causes
>     the layout of
>     - *             surfaces mapped in those SOC's GPUs to be
>     incompatible with the
>     - *             equivalent mapping on other GPUs in the same system.
>     + * 22:22 s     Sector layout.  There is a further bit remapping
>     step that occurs
>     + * 26:26       at an even lower level than the page kind and block
>     linear
>     + *             swizzles.  This causes the bit arrangement of
>     surfaces in memory
>     + *             to differ subtly, and prevents direct sharing of
>     surfaces between
>     + *             GPUs with different layouts.
>        *
>     - *               0 = Tegra K1 - Tegra Parker/TX2 Layout.
>     - *               1 = Desktop GPU and Tegra Xavier+ Layout
>     + *               0 = Tegra K1 - Tegra Parker/TX2 Layout
>     + *               1 = Pre-GB20x, Tegra Xavier-Orin, GB10 Layout
>     + *               2 = GB20x(Blackwell 2)+ Layout for some pixel/
>     texel sizes
> 
> 
> I'm not sure I like just lumping all of blackwell together. Blackwell is 
> the same as Turing for 32, 64, and 128-bit formats. It's only different 
> on 8 and 16 and those aren't the same. The way we modeled this for NVK 
> is to have Turing, Blackwell8Bit, and Blackwell16Bit GOBTypes. I think 
> I'd prefer the modifiers take a similar form.
> 
> Technically, this isn't strictly necessary as there is always a format 
> and formats with different element sizes aren't compatible so a driver 
> can always look at format+modifier.  However, it is a better model of 
> the hardware.

Yeah, my thinking was drivers would only use sector layout 2 for those 8 
and 16-bit formats, and still return sector layout 1 modifiers for other 
formats, so I think we're in agreement there. I could update the comment 
to make that clearer.

You also want one sector layout for 8-bit and one for 16-bit (E.g., 2 == 
GB20x 8-bit and 3 == GB20x 16-bit)? I guess there are some cases where 
that would be useful. I just hate to burn extra values, but I don't feel 
strongly. I'll add that in the next iteration if no one objects.

Whatever design we settle on, I think it should be a goal to allow 
pre-GB20x cards to continue sharing e.g., 32-bit surfaces directly with 
GB20x cards. Some users are going to want to mix cards like that at some 
point.

Thanks,
-James

> ~Faith Ekstrand
> 
>     + *               3 = reserved for future use.
>        *
>        * 25:23 c     Lossless Framebuffer Compression type.
>        *
>     @@ -995,7 +999,7 @@ extern "C" {
>        *               6 = Reserved for future use
>        *               7 = Reserved for future use
>        *
>     - * 55:26 -     Reserved for future use.  Must be zero.
>     + * 55:27 -     Reserved for future use.  Must be zero.
>        */
>       #define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \
>              fourcc_mod_code(NVIDIA, \
>     @@ -1006,8 +1010,10 @@ extern "C" {
>                                __fourcc_mod_nvidia_pkind_shift) | \
>                               (((g) & __fourcc_mod_nvidia_kgen_mask) << \
>                                __fourcc_mod_nvidia_kgen_shift) | \
>     -                        (((s) & __fourcc_mod_nvidia_slayout_mask) << \
>     -                         __fourcc_mod_nvidia_slayout_shift) | \
>     +                        (((s) &
>     __fourcc_mod_nvidia_slayout_low_mask) << \
>     +                         __fourcc_mod_nvidia_slayout_low_shift) | \
>     +                        (((s) &
>     __fourcc_mod_nvidia_slayout_high_mask) << \
>     +                         __fourcc_mod_nvidia_slayout_high_shift) | \
>                               (((c) & __fourcc_mod_nvidia_comp_mask) << \
>                                __fourcc_mod_nvidia_comp_shift)))
> 
>     @@ -1037,12 +1043,6 @@ __DRM_FOURCC_MKNVHELPER_FUNC(pkind)
>        */
>       __DRM_FOURCC_MKNVHELPER_FUNC(kgen)
> 
>     -/*
>     - * Get the sector layout specified by mod:
>     - * static inline __u64 drm_fourcc_nvidia_format_mod_slayout(__u64 mod)
>     - */
>     -__DRM_FOURCC_MKNVHELPER_FUNC(slayout)
>     -
>       /*
>        * Get the lossless framebuffer compression specified by mod:
>        * static inline __u64 drm_fourcc_nvidia_format_mod_kgen(__u64 mod)
>     @@ -1051,6 +1051,16 @@ __DRM_FOURCC_MKNVHELPER_FUNC(comp)
> 
>       #undef __DRM_FOURCC_MKNVHELPER_FUNC
> 
>     +/* Get the sector layout specified by mod: */
>     +static inline __u64
>     +drm_fourcc_nvidia_format_mod_slayout(__u64 mod)
>     +{
>     +       return ((mod >> __fourcc_mod_nvidia_slayout_low_shift) &
>     +               __fourcc_mod_nvidia_slayout_low_mask) |
>     +               ((mod >> __fourcc_mod_nvidia_slayout_high_shift) &
>     +                __fourcc_mod_nvidia_slayout_high_mask);
>     +}
>     +
>       /* To grandfather in prior block linear format modifiers to the
>     above layout,
>        * the page kind "0", which corresponds to "pitch/linear" and
>     hence is unusable
>        * with block-linear layouts, is remapped within drivers to the
>     value 0xfe,
>     -- 
>     2.49.0
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 4/4] drm: define NVIDIA DRM format modifiers for GB20x
  2025-07-04  4:54     ` James Jones
@ 2025-07-04 14:43       ` Faith Ekstrand
  0 siblings, 0 replies; 11+ messages in thread
From: Faith Ekstrand @ 2025-07-04 14:43 UTC (permalink / raw)
  To: James Jones
  Cc: David Airlie, Lyude Paul, Danilo Krummrich, nouveau, dri-devel,
	Faith Ekstrand, Alexandre Courbot, Ben Skeggs

[-- Attachment #1: Type: text/plain, Size: 8883 bytes --]

On Fri, Jul 4, 2025 at 12:54 AM James Jones <jajones@nvidia.com> wrote:

> On 7/3/25 16:22, Faith Ekstrand wrote:
> > On Thu, Jul 3, 2025 at 6:34 PM James Jones <jajones@nvidia.com
> > <mailto:jajones@nvidia.com>> wrote:
> >
> >     The layout of bits within the individual tiles
> >     (referred to as sectors in the
> >     DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro)
> >     changed for some formats starting in Blackwell 2
> >     GPUs. To denote the difference, extend the sector
> >     field in the parametric format modifier definition
> >     used to generate modifier values for NVIDIA
> >     hardware.
> >
> >     Without this change, it would be impossible to
> >     differentiate the two layouts based on modifiers,
> >     and as a result software could attempt to share
> >     surfaces directly between pre-GB20x and GB20x
> >     cards, resulting in corruption when the surface
> >     was accessed on one of the GPUs after being
> >     populated with content by the other.
> >
> >     Of note: This change causes the
> >     DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro to
> >     evaluate its "s" parameter twice, with the side
> >     effects that entails. I surveyed all usage of the
> >     modifier in the kernel and Mesa code, and that
> >     does not appear to be problematic in any current
> >     usage, but I thought it was worth calling out.
> >
> >     Signed-off-by: James Jones <jajones@nvidia.com
> >     <mailto:jajones@nvidia.com>>
> >     ---
> >       include/uapi/drm/drm_fourcc.h | 46
> +++++++++++++++++++++--------------
> >       1 file changed, 28 insertions(+), 18 deletions(-)
> >
> >     diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/
> >     drm_fourcc.h
> >     index 052e5fdd1d3b..348b2f1c1cb7 100644
> >     --- a/include/uapi/drm/drm_fourcc.h
> >     +++ b/include/uapi/drm/drm_fourcc.h
> >     @@ -909,8 +909,10 @@ extern "C" {
> >       #define __fourcc_mod_nvidia_pkind_shift 12
> >       #define __fourcc_mod_nvidia_kgen_mask 0x3ULL
> >       #define __fourcc_mod_nvidia_kgen_shift 20
> >     -#define __fourcc_mod_nvidia_slayout_mask 0x1ULL
> >     -#define __fourcc_mod_nvidia_slayout_shift 22
> >     +#define __fourcc_mod_nvidia_slayout_low_mask 0x1ULL
> >     +#define __fourcc_mod_nvidia_slayout_low_shift 22
> >     +#define __fourcc_mod_nvidia_slayout_high_mask 0x2ULL
> >     +#define __fourcc_mod_nvidia_slayout_high_shift 25
> >       #define __fourcc_mod_nvidia_comp_mask 0x7ULL
> >       #define __fourcc_mod_nvidia_comp_shift 23
> >
> >     @@ -973,14 +975,16 @@ extern "C" {
> >        *               2 = Gob Height 8, Turing+ Page Kind mapping
> >        *               3 = Reserved for future use.
> >        *
> >     - * 22:22 s     Sector layout.  On Tegra GPUs prior to Xavier, there
> >     is a further
> >     - *             bit remapping step that occurs at an even lower
> >     level than the
> >     - *             page kind and block linear swizzles.  This causes
> >     the layout of
> >     - *             surfaces mapped in those SOC's GPUs to be
> >     incompatible with the
> >     - *             equivalent mapping on other GPUs in the same system.
> >     + * 22:22 s     Sector layout.  There is a further bit remapping
> >     step that occurs
> >     + * 26:26       at an even lower level than the page kind and block
> >     linear
> >     + *             swizzles.  This causes the bit arrangement of
> >     surfaces in memory
> >     + *             to differ subtly, and prevents direct sharing of
> >     surfaces between
> >     + *             GPUs with different layouts.
> >        *
> >     - *               0 = Tegra K1 - Tegra Parker/TX2 Layout.
> >     - *               1 = Desktop GPU and Tegra Xavier+ Layout
> >     + *               0 = Tegra K1 - Tegra Parker/TX2 Layout
> >     + *               1 = Pre-GB20x, Tegra Xavier-Orin, GB10 Layout
> >     + *               2 = GB20x(Blackwell 2)+ Layout for some pixel/
> >     texel sizes
> >
> >
> > I'm not sure I like just lumping all of blackwell together. Blackwell is
> > the same as Turing for 32, 64, and 128-bit formats. It's only different
> > on 8 and 16 and those aren't the same. The way we modeled this for NVK
> > is to have Turing, Blackwell8Bit, and Blackwell16Bit GOBTypes. I think
> > I'd prefer the modifiers take a similar form.
> >
> > Technically, this isn't strictly necessary as there is always a format
> > and formats with different element sizes aren't compatible so a driver
> > can always look at format+modifier.  However, it is a better model of
> > the hardware.
>
> Yeah, my thinking was drivers would only use sector layout 2 for those 8
> and 16-bit formats, and still return sector layout 1 modifiers for other
> formats, so I think we're in agreement there. I could update the comment
> to make that clearer.
>
> You also want one sector layout for 8-bit and one for 16-bit (E.g., 2 ==
> GB20x 8-bit and 3 == GB20x 16-bit)? I guess there are some cases where
> that would be useful. I just hate to burn extra values, but I don't feel
> strongly. I'll add that in the next iteration if no one objects.
>

That was my thinking, yeah. They're definitely different sector layouts and
it's more clear if we make them explicitly GB20x 8-bit and GB20x 16-bit.
That way no one tries to use it for 32-bit or higher. I'm not too worried
about burning an extra couple bits. We can reserve an extra one or two for
sector layout now easily enough while we're shuffling things anyway.


> Whatever design we settle on, I think it should be a goal to allow
> pre-GB20x cards to continue sharing e.g., 32-bit surfaces directly with
> GB20x cards. Some users are going to want to mix cards like that at some
> point.
>

Agreed.

~Faith


> Thanks,
> -James
>
> > ~Faith Ekstrand
> >
> >     + *               3 = reserved for future use.
> >        *
> >        * 25:23 c     Lossless Framebuffer Compression type.
> >        *
> >     @@ -995,7 +999,7 @@ extern "C" {
> >        *               6 = Reserved for future use
> >        *               7 = Reserved for future use
> >        *
> >     - * 55:26 -     Reserved for future use.  Must be zero.
> >     + * 55:27 -     Reserved for future use.  Must be zero.
> >        */
> >       #define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \
> >              fourcc_mod_code(NVIDIA, \
> >     @@ -1006,8 +1010,10 @@ extern "C" {
> >                                __fourcc_mod_nvidia_pkind_shift) | \
> >                               (((g) & __fourcc_mod_nvidia_kgen_mask) << \
> >                                __fourcc_mod_nvidia_kgen_shift) | \
> >     -                        (((s) & __fourcc_mod_nvidia_slayout_mask)
> << \
> >     -                         __fourcc_mod_nvidia_slayout_shift) | \
> >     +                        (((s) &
> >     __fourcc_mod_nvidia_slayout_low_mask) << \
> >     +                         __fourcc_mod_nvidia_slayout_low_shift) | \
> >     +                        (((s) &
> >     __fourcc_mod_nvidia_slayout_high_mask) << \
> >     +                         __fourcc_mod_nvidia_slayout_high_shift) | \
> >                               (((c) & __fourcc_mod_nvidia_comp_mask) << \
> >                                __fourcc_mod_nvidia_comp_shift)))
> >
> >     @@ -1037,12 +1043,6 @@ __DRM_FOURCC_MKNVHELPER_FUNC(pkind)
> >        */
> >       __DRM_FOURCC_MKNVHELPER_FUNC(kgen)
> >
> >     -/*
> >     - * Get the sector layout specified by mod:
> >     - * static inline __u64 drm_fourcc_nvidia_format_mod_slayout(__u64
> mod)
> >     - */
> >     -__DRM_FOURCC_MKNVHELPER_FUNC(slayout)
> >     -
> >       /*
> >        * Get the lossless framebuffer compression specified by mod:
> >        * static inline __u64 drm_fourcc_nvidia_format_mod_kgen(__u64 mod)
> >     @@ -1051,6 +1051,16 @@ __DRM_FOURCC_MKNVHELPER_FUNC(comp)
> >
> >       #undef __DRM_FOURCC_MKNVHELPER_FUNC
> >
> >     +/* Get the sector layout specified by mod: */
> >     +static inline __u64
> >     +drm_fourcc_nvidia_format_mod_slayout(__u64 mod)
> >     +{
> >     +       return ((mod >> __fourcc_mod_nvidia_slayout_low_shift) &
> >     +               __fourcc_mod_nvidia_slayout_low_mask) |
> >     +               ((mod >> __fourcc_mod_nvidia_slayout_high_shift) &
> >     +                __fourcc_mod_nvidia_slayout_high_mask);
> >     +}
> >     +
> >       /* To grandfather in prior block linear format modifiers to the
> >     above layout,
> >        * the page kind "0", which corresponds to "pitch/linear" and
> >     hence is unusable
> >        * with block-linear layouts, is remapped within drivers to the
> >     value 0xfe,
> >     --
> >     2.49.0
> >
>
>

[-- Attachment #2: Type: text/html, Size: 11593 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 0/4] Add Format Modifiers for NVIDIA Blackwell chipsets
  2025-07-03 22:36 [RFC 0/4] Add Format Modifiers for NVIDIA Blackwell chipsets James Jones
                   ` (3 preceding siblings ...)
  2025-07-03 22:36 ` [RFC 4/4] drm: define NVIDIA DRM format modifiers for GB20x James Jones
@ 2025-07-04 14:45 ` Faith Ekstrand
  2025-07-16 22:36   ` James Jones
  4 siblings, 1 reply; 11+ messages in thread
From: Faith Ekstrand @ 2025-07-04 14:45 UTC (permalink / raw)
  To: James Jones
  Cc: David Airlie, Lyude Paul, Danilo Krummrich, nouveau, dri-devel,
	Faith Ekstrand, Alexandre Courbot, Ben Skeggs

[-- Attachment #1: Type: text/plain, Size: 2836 bytes --]

On Thu, Jul 3, 2025 at 6:34 PM James Jones <jajones@nvidia.com> wrote:

> The layout of bits within the individual tiles (referred to as
> sectors in the DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro)
> changed for some formats starting in Blackwell 2 GPUs. New format
> modifiers are needed to denote the difference and prevent direct
> sharing of these incompatible buffers with older GPUs.
>
> This patch series proposes first adding some helper macros and
> inline functions to drm_fourcc.h to make the NVIDIA block-linear
> format modifiers easier to work with given the proposed solution
> makes them harder to parse, then extending the existing sector type
> field in the parametric format modifier macro
> DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() by another bit to
> accommodate the new layout type.
>
> There are a few ways the parameteric format modifier definition
> could have been altered to handle the new layouts:
>
> -The GOB Height and Page Kind field has a reserved value that could
>  have been used. However, the GOB height and page kind enums did
>  not change relative to prior chips, so this is sort of a lie.
>  However, this is the least-invasive change.
>
> -An entirely new field could have been added. This seems
>  inappropriate given the presence of an existing appropriate field.
>  The advantage here is it avoids splitting the sector layout field
>  across two bitfields.
>
> The proposed approach is the logically consistent one, but has the
> downside of being the most complex, and that it causes the
> DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro to evaluate its
> 's' parameter twice. However, I believe the added helper functions
> and macros address the complexity, and I have audited the relevant
> code and do not believe the double evaluation should cause any
> problems in practice.
>

I think we'll converge pretty quickly on the last patch. I'm less sure
about the first 3. While I like the idea of having static inlines for
modifiers that are shared by everybody, we can't actually use them from NVK
because our image layout code is in rust and bindgen can't generate
bindings for inlines so we're going to end up re-typing that all anyway.

Also, I'm not seeing a patch to fix KMS to advertise the correct modifiers.
Were you planning to type that or should I ask Lyude or Ben?

~Faith


> James Jones (4):
>   drm: macros defining fields of NVIDIA modifiers
>   drm: add inline helper funcs for NVIDIA modifiers
>   drm/nouveau: use format modifier helper funcs
>   drm: define NVIDIA DRM format modifiers for GB20x
>
>  drivers/gpu/drm/nouveau/nouveau_display.c |  12 ++-
>  include/uapi/drm/drm_fourcc.h             | 100 ++++++++++++++++++----
>  2 files changed, 92 insertions(+), 20 deletions(-)
>
> --
> 2.49.0
>
>

[-- Attachment #2: Type: text/html, Size: 3486 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 0/4] Add Format Modifiers for NVIDIA Blackwell chipsets
  2025-07-04 14:45 ` [RFC 0/4] Add Format Modifiers for NVIDIA Blackwell chipsets Faith Ekstrand
@ 2025-07-16 22:36   ` James Jones
  2025-07-17 14:49     ` Faith Ekstrand
  0 siblings, 1 reply; 11+ messages in thread
From: James Jones @ 2025-07-16 22:36 UTC (permalink / raw)
  To: Faith Ekstrand
  Cc: David Airlie, Lyude Paul, Danilo Krummrich, nouveau, dri-devel,
	Faith Ekstrand, Alexandre Courbot, Ben Skeggs

On 7/4/25 07:45, Faith Ekstrand wrote:
> On Thu, Jul 3, 2025 at 6:34 PM James Jones <jajones@nvidia.com 
> <mailto:jajones@nvidia.com>> wrote:
> 
>     The layout of bits within the individual tiles (referred to as
>     sectors in the DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro)
>     changed for some formats starting in Blackwell 2 GPUs. New format
>     modifiers are needed to denote the difference and prevent direct
>     sharing of these incompatible buffers with older GPUs.
> 
>     This patch series proposes first adding some helper macros and
>     inline functions to drm_fourcc.h to make the NVIDIA block-linear
>     format modifiers easier to work with given the proposed solution
>     makes them harder to parse, then extending the existing sector type
>     field in the parametric format modifier macro
>     DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() by another bit to
>     accommodate the new layout type.
> 
>     There are a few ways the parameteric format modifier definition
>     could have been altered to handle the new layouts:
> 
>     -The GOB Height and Page Kind field has a reserved value that could
>       have been used. However, the GOB height and page kind enums did
>       not change relative to prior chips, so this is sort of a lie.
>       However, this is the least-invasive change.
> 
>     -An entirely new field could have been added. This seems
>       inappropriate given the presence of an existing appropriate field.
>       The advantage here is it avoids splitting the sector layout field
>       across two bitfields.
> 
>     The proposed approach is the logically consistent one, but has the
>     downside of being the most complex, and that it causes the
>     DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro to evaluate its
>     's' parameter twice. However, I believe the added helper functions
>     and macros address the complexity, and I have audited the relevant
>     code and do not believe the double evaluation should cause any
>     problems in practice.
> 
> 
> I think we'll converge pretty quickly on the last patch. I'm less sure 
> about the first 3. While I like the idea of having static inlines for 
> modifiers that are shared by everybody, we can't actually use them from 
> NVK because our image layout code is in rust and bindgen can't generate 
> bindings for inlines so we're going to end up re-typing that all anyway.

(Sorry for the long delay. Back from a series of vacations now)

Oh, that's too bad. Is there some better way to express this that can be 
consumed by Rust or a Rust generator script? Maybe just some defines for 
the bitfield offsets and sizes? I'm not sure how to clearly encapsulate 
the split sector field that way though. It might be useful in the 
Nouveau kernel code below, but would probably have the same problem 
moving to Nova. Might just have to type that part N times in each client 
codebase. I'll give it some more thought.

The main purpose of including all the inlines before the actual format 
modifier update was to illustrate that although a split bitfield can 
sound nasty (Or did to me initially anyway), it can be encapsulated well 
enough to make it a non-issue. I'm not wedded to the actual 
implementation of the helper code. If we're in general agreement on the 
modifier layout, I'll send that out stand-alone and we can iterate on 
the helpers as needed given they're much less urgent.

> Also, I'm not seeing a patch to fix KMS to advertise the correct 
> modifiers. Were you planning to type that or should I ask Lyude or Ben?

This was just an RFC, so I didn't want to type everything out given it'd 
take a lot more time to test it (I lightly tested the RFC patches with 
some hacky one-off test code). I'll take a look at what's needed in 
Nouveau to advertise the right display modifiers and see whether it's 
something I can realistically sign up to do.

Thanks,
-James

> ~Faith
> 
>     James Jones (4):
>        drm: macros defining fields of NVIDIA modifiers
>        drm: add inline helper funcs for NVIDIA modifiers
>        drm/nouveau: use format modifier helper funcs
>        drm: define NVIDIA DRM format modifiers for GB20x
> 
>       drivers/gpu/drm/nouveau/nouveau_display.c |  12 ++-
>       include/uapi/drm/drm_fourcc.h             | 100 ++++++++++++++++++----
>       2 files changed, 92 insertions(+), 20 deletions(-)
> 
>     -- 
>     2.49.0
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 0/4] Add Format Modifiers for NVIDIA Blackwell chipsets
  2025-07-16 22:36   ` James Jones
@ 2025-07-17 14:49     ` Faith Ekstrand
  0 siblings, 0 replies; 11+ messages in thread
From: Faith Ekstrand @ 2025-07-17 14:49 UTC (permalink / raw)
  To: James Jones
  Cc: David Airlie, Lyude Paul, Danilo Krummrich, nouveau, dri-devel,
	Faith Ekstrand, Alexandre Courbot, Ben Skeggs

[-- Attachment #1: Type: text/plain, Size: 5325 bytes --]

On Wed, Jul 16, 2025 at 6:36 PM James Jones <jajones@nvidia.com> wrote:

> On 7/4/25 07:45, Faith Ekstrand wrote:
> > On Thu, Jul 3, 2025 at 6:34 PM James Jones <jajones@nvidia.com
> > <mailto:jajones@nvidia.com>> wrote:
> >
> >     The layout of bits within the individual tiles (referred to as
> >     sectors in the DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro)
> >     changed for some formats starting in Blackwell 2 GPUs. New format
> >     modifiers are needed to denote the difference and prevent direct
> >     sharing of these incompatible buffers with older GPUs.
> >
> >     This patch series proposes first adding some helper macros and
> >     inline functions to drm_fourcc.h to make the NVIDIA block-linear
> >     format modifiers easier to work with given the proposed solution
> >     makes them harder to parse, then extending the existing sector type
> >     field in the parametric format modifier macro
> >     DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() by another bit to
> >     accommodate the new layout type.
> >
> >     There are a few ways the parameteric format modifier definition
> >     could have been altered to handle the new layouts:
> >
> >     -The GOB Height and Page Kind field has a reserved value that could
> >       have been used. However, the GOB height and page kind enums did
> >       not change relative to prior chips, so this is sort of a lie.
> >       However, this is the least-invasive change.
> >
> >     -An entirely new field could have been added. This seems
> >       inappropriate given the presence of an existing appropriate field.
> >       The advantage here is it avoids splitting the sector layout field
> >       across two bitfields.
> >
> >     The proposed approach is the logically consistent one, but has the
> >     downside of being the most complex, and that it causes the
> >     DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro to evaluate its
> >     's' parameter twice. However, I believe the added helper functions
> >     and macros address the complexity, and I have audited the relevant
> >     code and do not believe the double evaluation should cause any
> >     problems in practice.
> >
> >
> > I think we'll converge pretty quickly on the last patch. I'm less sure
> > about the first 3. While I like the idea of having static inlines for
> > modifiers that are shared by everybody, we can't actually use them from
> > NVK because our image layout code is in rust and bindgen can't generate
> > bindings for inlines so we're going to end up re-typing that all anyway.
>
> (Sorry for the long delay. Back from a series of vacations now)
>
> Oh, that's too bad. Is there some better way to express this that can be
> consumed by Rust or a Rust generator script? Maybe just some defines for
> the bitfield offsets and sizes? I'm not sure how to clearly encapsulate
> the split sector field that way though. It might be useful in the
> Nouveau kernel code below, but would probably have the same problem
> moving to Nova. Might just have to type that part N times in each client
> codebase. I'll give it some more thought.
>

#defines usually work as long as clang can evaluate them to constants. It
just can't handle macros with parameters. And sometimes macros that are
cnstants but are defined in terms of macros with parameters fall over. An
enum will definitely work 100%.


> The main purpose of including all the inlines before the actual format
> modifier update was to illustrate that although a split bitfield can
> sound nasty (Or did to me initially anyway), it can be encapsulated well
> enough to make it a non-issue. I'm not wedded to the actual
> implementation of the helper code. If we're in general agreement on the
> modifier layout, I'll send that out stand-alone and we can iterate on
> the helpers as needed given they're much less urgent.
>

Yeah, I get that. I'm not scared of a split bitfield, though, so no need to
demonstrate anything to me. 😉

And, yeah, I'm a fan of getting the definition out there sooner rather than
later. Mesa 25.2is shipping in a few weeks with Blackwell support and I
would really like to have modifiers for 565 and 4444.

~Faith



> > Also, I'm not seeing a patch to fix KMS to advertise the correct
> > modifiers. Were you planning to type that or should I ask Lyude or Ben?
>
> This was just an RFC, so I didn't want to type everything out given it'd
> take a lot more time to test it (I lightly tested the RFC patches with
> some hacky one-off test code). I'll take a look at what's needed in
> Nouveau to advertise the right display modifiers and see whether it's
> something I can realistically sign up to do.
>
> Thanks,
> -James
>
> > ~Faith
> >
> >     James Jones (4):
> >        drm: macros defining fields of NVIDIA modifiers
> >        drm: add inline helper funcs for NVIDIA modifiers
> >        drm/nouveau: use format modifier helper funcs
> >        drm: define NVIDIA DRM format modifiers for GB20x
> >
> >       drivers/gpu/drm/nouveau/nouveau_display.c |  12 ++-
> >       include/uapi/drm/drm_fourcc.h             | 100
> ++++++++++++++++++----
> >       2 files changed, 92 insertions(+), 20 deletions(-)
> >
> >     --
> >     2.49.0
> >
>
>

[-- Attachment #2: Type: text/html, Size: 6750 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-07-17 14:49 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-03 22:36 [RFC 0/4] Add Format Modifiers for NVIDIA Blackwell chipsets James Jones
2025-07-03 22:36 ` [RFC 1/4] drm: macros defining fields of NVIDIA modifiers James Jones
2025-07-03 22:36 ` [RFC 2/4] drm: add inline helper funcs for " James Jones
2025-07-03 22:36 ` [RFC 3/4] drm/nouveau: use format modifier helper funcs James Jones
2025-07-03 22:36 ` [RFC 4/4] drm: define NVIDIA DRM format modifiers for GB20x James Jones
2025-07-03 23:22   ` Faith Ekstrand
2025-07-04  4:54     ` James Jones
2025-07-04 14:43       ` Faith Ekstrand
2025-07-04 14:45 ` [RFC 0/4] Add Format Modifiers for NVIDIA Blackwell chipsets Faith Ekstrand
2025-07-16 22:36   ` James Jones
2025-07-17 14:49     ` Faith Ekstrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).