* [PATCH v4 0/5] drm/nouveau: Enable variable page sizes and compression
@ 2025-10-31 10:49 Mohamed Ahmed
2025-10-31 10:49 ` [PATCH v4 1/5] drm/nouveau/uvmm: Prepare for larger pages Mohamed Ahmed
` (5 more replies)
0 siblings, 6 replies; 13+ messages in thread
From: Mohamed Ahmed @ 2025-10-31 10:49 UTC (permalink / raw)
To: linux-kernel
Cc: dri-devel, Mary Guillemard, Faith Ekstrand, Lyude Paul,
Danilo Krummrich, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, nouveau,
Mohamed Ahmed
The VM_BIND interface so far supported only 4K pages, which left
performance on the table: GPU TLB and page-walker hardware is not as
sophisticated as its CPU counterpart, so small pages are comparatively
expensive.

Additionally, the HW can only do compression on large (64K) and huge (2M)
pages, and compression is a major performance booster (>50% in some cases).

This patchset adds support for larger page sizes and enables compression,
setting the compression tags when userspace binds with a compressed PTE
kind and suitable alignment. It also bumps the nouveau driver version so
that userspace enables compression only when the kernel actually supports
both features, avoiding breakage when a newer Mesa version is paired with
an older kernel.
For the associated userspace MR, please see !36450:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36450
- v4: Fix missing parenthesis in the second patch of the series.
- v3: Add reviewed-by tags, revert page selection logic to v1 behavior.
- v2: Implement review comments, change page selection logic.
- v1: Initial implementation.
Ben Skeggs (2):
drm/nouveau/mmu/gp100: Remove unused/broken support for compression
drm/nouveau/mmu/tu102: Add support for compressed kinds
Mary Guillemard (2):
drm/nouveau/uvmm: Prepare for larger pages
drm/nouveau/uvmm: Allow larger pages
Mohamed Ahmed (1):
drm/nouveau/drm: Bump the driver version to 1.4.1 to report new
features
drivers/gpu/drm/nouveau/nouveau_drv.h | 4 +-
drivers/gpu/drm/nouveau/nouveau_uvmm.c | 102 +++++++++++++++---
drivers/gpu/drm/nouveau/nouveau_uvmm.h | 1 +
.../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c | 69 +++++++-----
.../drm/nouveau/nvkm/subdev/mmu/vmmgp10b.c | 4 +-
5 files changed, 131 insertions(+), 49 deletions(-)
--
2.51.1
* [PATCH v4 1/5] drm/nouveau/uvmm: Prepare for larger pages
2025-10-31 10:49 [PATCH v4 0/5] drm/nouveau: Enable variable page sizes and compression Mohamed Ahmed
@ 2025-10-31 10:49 ` Mohamed Ahmed
2025-10-31 10:49 ` [PATCH v4 2/5] drm/nouveau/uvmm: Allow " Mohamed Ahmed
` (4 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Mohamed Ahmed @ 2025-10-31 10:49 UTC (permalink / raw)
To: linux-kernel
Cc: dri-devel, Mary Guillemard, Faith Ekstrand, Lyude Paul,
Danilo Krummrich, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, nouveau,
Mohamed Ahmed
From: Mary Guillemard <mary@mary.zone>
Currently, memory allocated by the VM_BIND uAPI can only have a
granularity matching PAGE_SIZE (4KiB in the common case).

To allow better memory management and to enable big (64KiB) and huge
(2MiB) pages later in the series, pass the page shift through the
internals of UVMM instead of assuming PAGE_SHIFT.
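As a rough illustration (not part of the patch): the page shift simply
encodes the mapping granularity, and the addresses and ranges handed to
the nvif_vmm_raw_* helpers must be multiples of it. The helper below is a
hypothetical user-space sketch of that relationship, not driver code.

    #include <stdbool.h>
    #include <stdint.h>

    /* shift 12 -> 4KiB  (PAGE_SHIFT in the common case)
     * shift 16 -> 64KiB ("big" pages)
     * shift 21 -> 2MiB  ("huge" pages)
     */
    static bool aligned_to_shift(uint64_t addr, uint64_t range, unsigned shift)
    {
            uint64_t mask = (UINT64_C(1) << shift) - 1;

            /* Both the VA and its length must be granularity aligned. */
            return !(addr & mask) && !(range & mask);
    }

    /* aligned_to_shift(0x200000, 0x10000, 16) -> true
     * aligned_to_shift(0x201000, 0x10000, 16) -> false (addr only 4KiB aligned)
     */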
Signed-off-by: Mary Guillemard <mary@mary.zone>
Co-developed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Signed-off-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
---
drivers/gpu/drm/nouveau/nouveau_uvmm.c | 46 ++++++++++++++++----------
drivers/gpu/drm/nouveau/nouveau_uvmm.h | 1 +
2 files changed, 30 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
index 79eefdfd08a2..2cd0835b05e8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
@@ -107,34 +107,34 @@ nouveau_uvmm_vmm_sparse_unref(struct nouveau_uvmm *uvmm,
static int
nouveau_uvmm_vmm_get(struct nouveau_uvmm *uvmm,
- u64 addr, u64 range)
+ u64 addr, u64 range, u8 page_shift)
{
struct nvif_vmm *vmm = &uvmm->vmm.vmm;
- return nvif_vmm_raw_get(vmm, addr, range, PAGE_SHIFT);
+ return nvif_vmm_raw_get(vmm, addr, range, page_shift);
}
static int
nouveau_uvmm_vmm_put(struct nouveau_uvmm *uvmm,
- u64 addr, u64 range)
+ u64 addr, u64 range, u8 page_shift)
{
struct nvif_vmm *vmm = &uvmm->vmm.vmm;
- return nvif_vmm_raw_put(vmm, addr, range, PAGE_SHIFT);
+ return nvif_vmm_raw_put(vmm, addr, range, page_shift);
}
static int
nouveau_uvmm_vmm_unmap(struct nouveau_uvmm *uvmm,
- u64 addr, u64 range, bool sparse)
+ u64 addr, u64 range, u8 page_shift, bool sparse)
{
struct nvif_vmm *vmm = &uvmm->vmm.vmm;
- return nvif_vmm_raw_unmap(vmm, addr, range, PAGE_SHIFT, sparse);
+ return nvif_vmm_raw_unmap(vmm, addr, range, page_shift, sparse);
}
static int
nouveau_uvmm_vmm_map(struct nouveau_uvmm *uvmm,
- u64 addr, u64 range,
+ u64 addr, u64 range, u8 page_shift,
u64 bo_offset, u8 kind,
struct nouveau_mem *mem)
{
@@ -163,7 +163,7 @@ nouveau_uvmm_vmm_map(struct nouveau_uvmm *uvmm,
return -ENOSYS;
}
- return nvif_vmm_raw_map(vmm, addr, range, PAGE_SHIFT,
+ return nvif_vmm_raw_map(vmm, addr, range, page_shift,
&args, argc,
&mem->mem, bo_offset);
}
@@ -182,8 +182,9 @@ nouveau_uvma_vmm_put(struct nouveau_uvma *uvma)
{
u64 addr = uvma->va.va.addr;
u64 range = uvma->va.va.range;
+ u8 page_shift = uvma->page_shift;
- return nouveau_uvmm_vmm_put(to_uvmm(uvma), addr, range);
+ return nouveau_uvmm_vmm_put(to_uvmm(uvma), addr, range, page_shift);
}
static int
@@ -193,9 +194,11 @@ nouveau_uvma_map(struct nouveau_uvma *uvma,
u64 addr = uvma->va.va.addr;
u64 offset = uvma->va.gem.offset;
u64 range = uvma->va.va.range;
+ u8 page_shift = uvma->page_shift;
return nouveau_uvmm_vmm_map(to_uvmm(uvma), addr, range,
- offset, uvma->kind, mem);
+ page_shift, offset, uvma->kind,
+ mem);
}
static int
@@ -203,12 +206,13 @@ nouveau_uvma_unmap(struct nouveau_uvma *uvma)
{
u64 addr = uvma->va.va.addr;
u64 range = uvma->va.va.range;
+ u8 page_shift = uvma->page_shift;
bool sparse = !!uvma->region;
if (drm_gpuva_invalidated(&uvma->va))
return 0;
- return nouveau_uvmm_vmm_unmap(to_uvmm(uvma), addr, range, sparse);
+ return nouveau_uvmm_vmm_unmap(to_uvmm(uvma), addr, range, page_shift, sparse);
}
static int
@@ -501,7 +505,8 @@ nouveau_uvmm_sm_prepare_unwind(struct nouveau_uvmm *uvmm,
if (vmm_get_range)
nouveau_uvmm_vmm_put(uvmm, vmm_get_start,
- vmm_get_range);
+ vmm_get_range,
+ PAGE_SHIFT);
break;
}
case DRM_GPUVA_OP_REMAP: {
@@ -528,6 +533,7 @@ nouveau_uvmm_sm_prepare_unwind(struct nouveau_uvmm *uvmm,
u64 ustart = va->va.addr;
u64 urange = va->va.range;
u64 uend = ustart + urange;
+ u8 page_shift = uvma_from_va(va)->page_shift;
/* Nothing to do for mappings we merge with. */
if (uend == vmm_get_start ||
@@ -538,7 +544,8 @@ nouveau_uvmm_sm_prepare_unwind(struct nouveau_uvmm *uvmm,
u64 vmm_get_range = ustart - vmm_get_start;
nouveau_uvmm_vmm_put(uvmm, vmm_get_start,
- vmm_get_range);
+ vmm_get_range,
+ page_shift);
}
vmm_get_start = uend;
break;
@@ -592,6 +599,7 @@ op_map_prepare(struct nouveau_uvmm *uvmm,
uvma->region = args->region;
uvma->kind = args->kind;
+ uvma->page_shift = PAGE_SHIFT;
drm_gpuva_map(&uvmm->base, &uvma->va, op);
@@ -633,7 +641,8 @@ nouveau_uvmm_sm_prepare(struct nouveau_uvmm *uvmm,
if (vmm_get_range) {
ret = nouveau_uvmm_vmm_get(uvmm, vmm_get_start,
- vmm_get_range);
+ vmm_get_range,
+ new->map->page_shift);
if (ret) {
op_map_prepare_unwind(new->map);
goto unwind;
@@ -689,6 +698,7 @@ nouveau_uvmm_sm_prepare(struct nouveau_uvmm *uvmm,
u64 ustart = va->va.addr;
u64 urange = va->va.range;
u64 uend = ustart + urange;
+ u8 page_shift = uvma_from_va(va)->page_shift;
op_unmap_prepare(u);
@@ -704,7 +714,7 @@ nouveau_uvmm_sm_prepare(struct nouveau_uvmm *uvmm,
u64 vmm_get_range = ustart - vmm_get_start;
ret = nouveau_uvmm_vmm_get(uvmm, vmm_get_start,
- vmm_get_range);
+ vmm_get_range, page_shift);
if (ret) {
op_unmap_prepare_unwind(va);
goto unwind;
@@ -799,10 +809,11 @@ op_unmap_range(struct drm_gpuva_op_unmap *u,
u64 addr, u64 range)
{
struct nouveau_uvma *uvma = uvma_from_va(u->va);
+ u8 page_shift = uvma->page_shift;
bool sparse = !!uvma->region;
if (!drm_gpuva_invalidated(u->va))
- nouveau_uvmm_vmm_unmap(to_uvmm(uvma), addr, range, sparse);
+ nouveau_uvmm_vmm_unmap(to_uvmm(uvma), addr, range, page_shift, sparse);
}
static void
@@ -882,6 +893,7 @@ nouveau_uvmm_sm_cleanup(struct nouveau_uvmm *uvmm,
struct drm_gpuva_op_map *n = r->next;
struct drm_gpuva *va = r->unmap->va;
struct nouveau_uvma *uvma = uvma_from_va(va);
+ u8 page_shift = uvma->page_shift;
if (unmap) {
u64 addr = va->va.addr;
@@ -893,7 +905,7 @@ nouveau_uvmm_sm_cleanup(struct nouveau_uvmm *uvmm,
if (n)
end = n->va.addr;
- nouveau_uvmm_vmm_put(uvmm, addr, end - addr);
+ nouveau_uvmm_vmm_put(uvmm, addr, end - addr, page_shift);
}
nouveau_uvma_gem_put(uvma);
diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.h b/drivers/gpu/drm/nouveau/nouveau_uvmm.h
index 9d3c348581eb..51925711ae90 100644
--- a/drivers/gpu/drm/nouveau/nouveau_uvmm.h
+++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.h
@@ -33,6 +33,7 @@ struct nouveau_uvma {
struct nouveau_uvma_region *region;
u8 kind;
+ u8 page_shift;
};
#define uvmm_from_gpuvm(x) container_of((x), struct nouveau_uvmm, base)
--
2.51.1
* [PATCH v4 2/5] drm/nouveau/uvmm: Allow larger pages
2025-10-31 10:49 [PATCH v4 0/5] drm/nouveau: Enable variable page sizes and compression Mohamed Ahmed
2025-10-31 10:49 ` [PATCH v4 1/5] drm/nouveau/uvmm: Prepare for larger pages Mohamed Ahmed
@ 2025-10-31 10:49 ` Mohamed Ahmed
2025-10-31 17:01 ` James Jones
2025-11-05 22:51 ` Danilo Krummrich
2025-10-31 10:49 ` [PATCH v4 3/5] drm/nouveau/mmu/gp100: Remove unused/broken support for compression Mohamed Ahmed
` (3 subsequent siblings)
5 siblings, 2 replies; 13+ messages in thread
From: Mohamed Ahmed @ 2025-10-31 10:49 UTC (permalink / raw)
To: linux-kernel
Cc: dri-devel, Mary Guillemard, Faith Ekstrand, Lyude Paul,
Danilo Krummrich, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, nouveau,
Mohamed Ahmed
From: Mary Guillemard <mary@mary.zone>
Now that everything in UVMM knows about the variable page shift, we can
select larger values.
The proposed approach relies on nouveau_bo::page unless it would cause
alignment issues, in which case we fall back to searching for an
appropriate shift.
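To make the fallback concrete, here is a user-space style sketch of the
selection described above; the values and the table of supported shifts
are made up for illustration and are not taken from the driver.

    #include <stdint.h>
    #include <stdio.h>

    static int aligned(uint64_t v, unsigned shift)
    {
            return (v & ((UINT64_C(1) << shift) - 1)) == 0;
    }

    int main(void)
    {
            const unsigned bo_pref = 16;              /* BO prefers 64KiB pages */
            const unsigned shifts[] = { 21, 16, 12 }; /* supported shifts */
            uint64_t addr = 0x10000000, range = 0x10000, offset = 0x1000;
            unsigned chosen = 12, i;

            for (i = 0; i < 3; i++) {
                    if (shifts[i] > bo_pref)
                            continue; /* never exceed the BO preference */
                    if (aligned(addr, shifts[i]) && aligned(range, shifts[i]) &&
                        aligned(offset, shifts[i])) {
                            chosen = shifts[i];
                            break;
                    }
            }

            /* The 4KiB-aligned GEM offset rules out 64KiB, so this prints 12. */
            printf("chosen shift: %u\n", chosen);
            return 0;
    }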
Signed-off-by: Mary Guillemard <mary@mary.zone>
Co-developed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Signed-off-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
---
drivers/gpu/drm/nouveau/nouveau_uvmm.c | 60 +++++++++++++++++++++++++-
1 file changed, 58 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
index 2cd0835b05e8..ab8933b88337 100644
--- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
@@ -454,6 +454,62 @@ op_unmap_prepare_unwind(struct drm_gpuva *va)
drm_gpuva_insert(va->vm, va);
}
+static bool
+op_map_aligned_to_page_shift(const struct drm_gpuva_op_map *op, u8 page_shift)
+{
+ u64 non_page_bits = (1ULL << page_shift) - 1;
+
+ return (op->va.addr & non_page_bits) == 0 &&
+ (op->va.range & non_page_bits) == 0 &&
+ (op->gem.offset & non_page_bits) == 0;
+}
+
+static u8
+select_page_shift(struct nouveau_uvmm *uvmm, struct drm_gpuva_op_map *op)
+{
+ struct nouveau_bo *nvbo = nouveau_gem_object(op->gem.obj);
+
+ /* nouveau_bo_fixup_align() guarantees that the page size will be aligned
+ * for most cases, but it can't handle cases where userspace allocates with
+ * a size and then binds with a smaller granularity. So in order to avoid
+ * breaking old userspace, we need to ensure that the VA is actually
+ * aligned before using it, and if it isn't, then we downgrade to the first
+ * granularity that will fit, which is optimal from a correctness and
+ * performance perspective.
+ */
+ if (op_map_aligned_to_page_shift(op, nvbo->page))
+ return nvbo->page;
+
+ struct nouveau_mem *mem = nouveau_mem(nvbo->bo.resource);
+ struct nvif_vmm *vmm = &uvmm->vmm.vmm;
+ int i;
+
+ /* If the given granularity doesn't fit, let's find one that will fit. */
+ for (i = 0; i < vmm->page_nr; i++) {
+ /* Ignore anything that is bigger or identical to the BO preference. */
+ if (vmm->page[i].shift >= nvbo->page)
+ continue;
+
+ /* Skip incompatible domains. */
+ if ((mem->mem.type & NVIF_MEM_VRAM) && !vmm->page[i].vram)
+ continue;
+ if ((mem->mem.type & NVIF_MEM_HOST) &&
+ (!vmm->page[i].host || vmm->page[i].shift > PAGE_SHIFT))
+ continue;
+
+ /* If it fits, return the proposed shift. */
+ if (op_map_aligned_to_page_shift(op, vmm->page[i].shift))
+ return vmm->page[i].shift;
+ }
+
+ /* If we get here then nothing can reconcile the requirements. This should never
+ * happen.
+ */
+ WARN_ON(1);
+
+ return PAGE_SHIFT;
+}
+
static void
nouveau_uvmm_sm_prepare_unwind(struct nouveau_uvmm *uvmm,
struct nouveau_uvma_prealloc *new,
@@ -506,7 +562,7 @@ nouveau_uvmm_sm_prepare_unwind(struct nouveau_uvmm *uvmm,
if (vmm_get_range)
nouveau_uvmm_vmm_put(uvmm, vmm_get_start,
vmm_get_range,
- PAGE_SHIFT);
+ select_page_shift(uvmm, &op->map));
break;
}
case DRM_GPUVA_OP_REMAP: {
@@ -599,7 +655,7 @@ op_map_prepare(struct nouveau_uvmm *uvmm,
uvma->region = args->region;
uvma->kind = args->kind;
- uvma->page_shift = PAGE_SHIFT;
+ uvma->page_shift = select_page_shift(uvmm, op);
drm_gpuva_map(&uvmm->base, &uvma->va, op);
--
2.51.1
* [PATCH v4 3/5] drm/nouveau/mmu/gp100: Remove unused/broken support for compression
2025-10-31 10:49 [PATCH v4 0/5] drm/nouveau: Enable variable page sizes and compression Mohamed Ahmed
2025-10-31 10:49 ` [PATCH v4 1/5] drm/nouveau/uvmm: Prepare for larger pages Mohamed Ahmed
2025-10-31 10:49 ` [PATCH v4 2/5] drm/nouveau/uvmm: Allow " Mohamed Ahmed
@ 2025-10-31 10:49 ` Mohamed Ahmed
2025-10-31 10:49 ` [PATCH v4 4/5] drm/nouveau/mmu/tu102: Add support for compressed kinds Mohamed Ahmed
` (2 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Mohamed Ahmed @ 2025-10-31 10:49 UTC (permalink / raw)
To: linux-kernel
Cc: dri-devel, Mary Guillemard, Faith Ekstrand, Lyude Paul,
Danilo Krummrich, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, nouveau,
Ben Skeggs, Mohamed Ahmed
From: Ben Skeggs <bskeggs@nvidia.com>
From GP100 onwards it's not possible to initialise comptag RAM without
PMU firmware, which nouveau has no support for.
As such, this code is essentially a no-op and will always revert to the
equivalent non-compressed kind due to comptag allocation failure. It's
also broken for the needs of VM_BIND/Vulkan.
Remove the code entirely to make way for supporting compression on GPUs
that support GSP-RM.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
---
.../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c | 39 ++-----------------
.../drm/nouveau/nvkm/subdev/mmu/vmmgp10b.c | 4 +-
2 files changed, 6 insertions(+), 37 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
index 851fd847a2a9..ecff1096a1bb 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
@@ -21,9 +21,7 @@
*/
#include "vmm.h"
-#include <core/client.h>
#include <subdev/fb.h>
-#include <subdev/ltc.h>
#include <subdev/timer.h>
#include <engine/gr.h>
@@ -117,8 +115,6 @@ gp100_vmm_pgt_pte(struct nvkm_vmm *vmm, struct nvkm_mmu_pt *pt,
{
u64 data = (addr >> 4) | map->type;
- map->type += ptes * map->ctag;
-
while (ptes--) {
VMM_WO064(pt, vmm, ptei++ * 8, data);
data += map->next;
@@ -142,7 +138,6 @@ gp100_vmm_pgt_dma(struct nvkm_vmm *vmm, struct nvkm_mmu_pt *pt,
while (ptes--) {
const u64 data = (*map->dma++ >> 4) | map->type;
VMM_WO064(pt, vmm, ptei++ * 8, data);
- map->type += map->ctag;
}
nvkm_done(pt->memory);
return;
@@ -200,8 +195,6 @@ gp100_vmm_pd0_pte(struct nvkm_vmm *vmm, struct nvkm_mmu_pt *pt,
{
u64 data = (addr >> 4) | map->type;
- map->type += ptes * map->ctag;
-
while (ptes--) {
VMM_WO128(pt, vmm, ptei++ * 0x10, data, 0ULL);
data += map->next;
@@ -411,8 +404,6 @@ gp100_vmm_valid(struct nvkm_vmm *vmm, void *argv, u32 argc,
struct gp100_vmm_map_vn vn;
struct gp100_vmm_map_v0 v0;
} *args = argv;
- struct nvkm_device *device = vmm->mmu->subdev.device;
- struct nvkm_memory *memory = map->memory;
u8 kind, kind_inv, priv, ro, vol;
int kindn, aper, ret = -ENOSYS;
const u8 *kindm;
@@ -450,30 +441,8 @@ gp100_vmm_valid(struct nvkm_vmm *vmm, void *argv, u32 argc,
}
if (kindm[kind] != kind) {
- u64 tags = nvkm_memory_size(memory) >> 16;
- if (aper != 0 || !(page->type & NVKM_VMM_PAGE_COMP)) {
- VMM_DEBUG(vmm, "comp %d %02x", aper, page->type);
- return -EINVAL;
- }
-
- if (!map->no_comp) {
- ret = nvkm_memory_tags_get(memory, device, tags,
- nvkm_ltc_tags_clear,
- &map->tags);
- if (ret) {
- VMM_DEBUG(vmm, "comp %d", ret);
- return ret;
- }
- }
-
- if (!map->no_comp && map->tags->mn) {
- tags = map->tags->mn->offset + (map->offset >> 16);
- map->ctag |= ((1ULL << page->shift) >> 16) << 36;
- map->type |= tags << 36;
- map->next |= map->ctag;
- } else {
- kind = kindm[kind];
- }
+ /* Revert to non-compressed kind. */
+ kind = kindm[kind];
}
map->type |= BIT(0);
@@ -592,8 +561,8 @@ gp100_vmm = {
{ 47, &gp100_vmm_desc_16[4], NVKM_VMM_PAGE_Sxxx },
{ 38, &gp100_vmm_desc_16[3], NVKM_VMM_PAGE_Sxxx },
{ 29, &gp100_vmm_desc_16[2], NVKM_VMM_PAGE_Sxxx },
- { 21, &gp100_vmm_desc_16[1], NVKM_VMM_PAGE_SVxC },
- { 16, &gp100_vmm_desc_16[0], NVKM_VMM_PAGE_SVxC },
+ { 21, &gp100_vmm_desc_16[1], NVKM_VMM_PAGE_SVxx },
+ { 16, &gp100_vmm_desc_16[0], NVKM_VMM_PAGE_SVxx },
{ 12, &gp100_vmm_desc_12[0], NVKM_VMM_PAGE_SVHx },
{}
}
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp10b.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp10b.c
index e081239afe58..5791d134962b 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp10b.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp10b.c
@@ -34,8 +34,8 @@ gp10b_vmm = {
{ 47, &gp100_vmm_desc_16[4], NVKM_VMM_PAGE_Sxxx },
{ 38, &gp100_vmm_desc_16[3], NVKM_VMM_PAGE_Sxxx },
{ 29, &gp100_vmm_desc_16[2], NVKM_VMM_PAGE_Sxxx },
- { 21, &gp100_vmm_desc_16[1], NVKM_VMM_PAGE_SxHC },
- { 16, &gp100_vmm_desc_16[0], NVKM_VMM_PAGE_SxHC },
+ { 21, &gp100_vmm_desc_16[1], NVKM_VMM_PAGE_SxHx },
+ { 16, &gp100_vmm_desc_16[0], NVKM_VMM_PAGE_SxHx },
{ 12, &gp100_vmm_desc_12[0], NVKM_VMM_PAGE_SxHx },
{}
}
--
2.51.1
* [PATCH v4 4/5] drm/nouveau/mmu/tu102: Add support for compressed kinds
2025-10-31 10:49 [PATCH v4 0/5] drm/nouveau: Enable variable page sizes and compression Mohamed Ahmed
` (2 preceding siblings ...)
2025-10-31 10:49 ` [PATCH v4 3/5] drm/nouveau/mmu/gp100: Remove unused/broken support for compression Mohamed Ahmed
@ 2025-10-31 10:49 ` Mohamed Ahmed
2025-10-31 10:49 ` [PATCH v4 5/5] drm/nouveau/drm: Bump the driver version to 1.4.1 to report new features Mohamed Ahmed
2025-10-31 14:18 ` [PATCH v4 0/5] drm/nouveau: Enable variable page sizes and compression Mary Guillemard
5 siblings, 0 replies; 13+ messages in thread
From: Mohamed Ahmed @ 2025-10-31 10:49 UTC (permalink / raw)
To: linux-kernel
Cc: dri-devel, Mary Guillemard, Faith Ekstrand, Lyude Paul,
Danilo Krummrich, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, nouveau,
Ben Skeggs, Mohamed Ahmed
From: Ben Skeggs <bskeggs@nvidia.com>
Allow compressed PTE kinds to be written into PTEs when GSP-RM is
present, rather than reverting to their non-compressed versions.
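As a worked example of the arithmetic implemented by the
gp100_vmm_comptagline_* helpers added below (illustration only; the VRAM
offset is made up): one comptag covers 64KiB of VRAM, and the comptag
line is stored in the PTE starting at bit 36.

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    static uint64_t comptag_nr(uint64_t size)        { return size >> 16; }
    static uint64_t comptagline_base(uint64_t addr)  { return (1 + comptag_nr(addr)) << 36; }
    static uint64_t comptagline_incr(uint32_t psize) { return comptag_nr(psize) << 36; }

    int main(void)
    {
            uint64_t addr = 128ull << 20; /* compressed page at VRAM offset 128MiB */

            printf("base field: 0x%" PRIx64 "\n", comptagline_base(addr));           /* 2049 << 36 */
            printf("incr, 64KiB page: 0x%" PRIx64 "\n", comptagline_incr(1u << 16)); /*    1 << 36 */
            printf("incr, 2MiB page:  0x%" PRIx64 "\n", comptagline_incr(1u << 21)); /*   32 << 36 */
            return 0;
    }

As the diff notes, the comptag line only needs to be written by software
on Turing (pre-GA100) parts.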
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
---
.../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c | 46 ++++++++++++++++++-
1 file changed, 44 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
index ecff1096a1bb..ed15a4475181 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
@@ -109,12 +109,34 @@ gp100_vmm_pgt_pfn(struct nvkm_vmm *vmm, struct nvkm_mmu_pt *pt,
nvkm_done(pt->memory);
}
+static inline u64
+gp100_vmm_comptag_nr(u64 size)
+{
+ return size >> 16; /* One comptag per 64KiB VRAM. */
+}
+
+static inline u64
+gp100_vmm_pte_comptagline_base(u64 addr)
+{
+ /* RM allocates enough comptags for all of VRAM, so use a 1:1 mapping. */
+ return (1 + gp100_vmm_comptag_nr(addr)) << 36; /* NV_MMU_VER2_PTE_COMPTAGLINE */
+}
+
+static inline u64
+gp100_vmm_pte_comptagline_incr(u32 page_size)
+{
+ return gp100_vmm_comptag_nr(page_size) << 36; /* NV_MMU_VER2_PTE_COMPTAGLINE */
+}
+
static inline void
gp100_vmm_pgt_pte(struct nvkm_vmm *vmm, struct nvkm_mmu_pt *pt,
u32 ptei, u32 ptes, struct nvkm_vmm_map *map, u64 addr)
{
u64 data = (addr >> 4) | map->type;
+ if (map->ctag)
+ data |= gp100_vmm_pte_comptagline_base(addr);
+
while (ptes--) {
VMM_WO064(pt, vmm, ptei++ * 8, data);
data += map->next;
@@ -195,6 +217,9 @@ gp100_vmm_pd0_pte(struct nvkm_vmm *vmm, struct nvkm_mmu_pt *pt,
{
u64 data = (addr >> 4) | map->type;
+ if (map->ctag)
+ data |= gp100_vmm_pte_comptagline_base(addr);
+
while (ptes--) {
VMM_WO128(pt, vmm, ptei++ * 0x10, data, 0ULL);
data += map->next;
@@ -440,9 +465,26 @@ gp100_vmm_valid(struct nvkm_vmm *vmm, void *argv, u32 argc,
return -EINVAL;
}
+ /* Handle compression. */
if (kindm[kind] != kind) {
- /* Revert to non-compressed kind. */
- kind = kindm[kind];
+ struct nvkm_device *device = vmm->mmu->subdev.device;
+
+ /* Compression is only supported when using GSP-RM, as
+ * PMU firmware is required in order to initialise the
+ * compbit backing store.
+ */
+ if (nvkm_gsp_rm(device->gsp)) {
+ /* Turing GPUs require PTE_COMPTAGLINE to be filled,
+ * in addition to specifying a compressed kind.
+ */
+ if (device->card_type < GA100) {
+ map->ctag = gp100_vmm_pte_comptagline_incr(1 << map->page->shift);
+ map->next |= map->ctag;
+ }
+ } else {
+ /* Revert to non-compressed kind. */
+ kind = kindm[kind];
+ }
}
map->type |= BIT(0);
--
2.51.1
* [PATCH v4 5/5] drm/nouveau/drm: Bump the driver version to 1.4.1 to report new features
2025-10-31 10:49 [PATCH v4 0/5] drm/nouveau: Enable variable page sizes and compression Mohamed Ahmed
` (3 preceding siblings ...)
2025-10-31 10:49 ` [PATCH v4 4/5] drm/nouveau/mmu/tu102: Add support for compressed kinds Mohamed Ahmed
@ 2025-10-31 10:49 ` Mohamed Ahmed
2025-10-31 14:18 ` [PATCH v4 0/5] drm/nouveau: Enable variable page sizes and compression Mary Guillemard
5 siblings, 0 replies; 13+ messages in thread
From: Mohamed Ahmed @ 2025-10-31 10:49 UTC (permalink / raw)
To: linux-kernel
Cc: dri-devel, Mary Guillemard, Faith Ekstrand, Lyude Paul,
Danilo Krummrich, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, nouveau,
Mohamed Ahmed
The HW can only do compression on large and huge pages, and enabling it on
4K pages leads to an MMU fault. Compression also needs kernel support for
handling the compressed kinds and managing the compression tags.

Bump the nouveau driver version so that NVK can enable compression only
when the kernel actually supports both features, avoiding breakage when a
newer Mesa version is paired with an older kernel.
For the associated userspace MR, please see !36450:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36450
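As a sketch of the userspace side (illustrative only; the actual gating
logic lives in the Mesa MR linked above), a client can use libdrm's
drmGetVersion() to check for nouveau >= 1.4.1 before enabling
compression:

    /* Build with -ldrm. */
    #include <stdbool.h>
    #include <xf86drm.h>

    static bool kernel_has_compression(int drm_fd)
    {
            drmVersionPtr v = drmGetVersion(drm_fd);
            bool ok;

            if (!v)
                    return false;

            /* nouveau >= 1.4.1 reports variable page sizes + compression */
            ok = v->version_major > 1 ||
                 (v->version_major == 1 &&
                  (v->version_minor > 4 ||
                   (v->version_minor == 4 && v->version_patchlevel >= 1)));

            drmFreeVersion(v);
            return ok;
    }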
Signed-off-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
---
drivers/gpu/drm/nouveau/nouveau_drv.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index 55abc510067b..e5de4367e2cc 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -10,7 +10,7 @@
#define DRIVER_MAJOR 1
#define DRIVER_MINOR 4
-#define DRIVER_PATCHLEVEL 0
+#define DRIVER_PATCHLEVEL 1
/*
* 1.1.1:
@@ -35,6 +35,8 @@
* programs that get directly linked with NVKM.
* 1.3.1:
* - implemented limited ABI16/NVIF interop
+ * 1.4.1:
+ * - add variable page sizes and compression for Turing+
*/
#include <linux/notifier.h>
--
2.51.1
* Re: [PATCH v4 0/5] drm/nouveau: Enable variable page sizes and compression
2025-10-31 10:49 [PATCH v4 0/5] drm/nouveau: Enable variable page sizes and compression Mohamed Ahmed
` (4 preceding siblings ...)
2025-10-31 10:49 ` [PATCH v4 5/5] drm/nouveau/drm: Bump the driver version to 1.4.1 to report new features Mohamed Ahmed
@ 2025-10-31 14:18 ` Mary Guillemard
5 siblings, 0 replies; 13+ messages in thread
From: Mary Guillemard @ 2025-10-31 14:18 UTC (permalink / raw)
To: Mohamed Ahmed
Cc: linux-kernel, dri-devel, Faith Ekstrand, Lyude Paul,
Danilo Krummrich, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, nouveau
The series works fine with older versions of NVK and with the compression
patches on the Mesa side (tested on Ada and Ampere):
Tested-by: Mary Guillemard <mary@mary.zone>
Regards,
Mary
* Re: [PATCH v4 2/5] drm/nouveau/uvmm: Allow larger pages
2025-10-31 10:49 ` [PATCH v4 2/5] drm/nouveau/uvmm: Allow " Mohamed Ahmed
@ 2025-10-31 17:01 ` James Jones
2025-11-03 23:53 ` Mohamed Ahmed
2025-11-05 22:51 ` Danilo Krummrich
1 sibling, 1 reply; 13+ messages in thread
From: James Jones @ 2025-10-31 17:01 UTC (permalink / raw)
To: Mohamed Ahmed, linux-kernel
Cc: dri-devel, Mary Guillemard, Faith Ekstrand, Lyude Paul,
Danilo Krummrich, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, nouveau
On 10/31/25 03:49, Mohamed Ahmed wrote:
> From: Mary Guillemard <mary@mary.zone>
>
> Now that everything in UVMM knows about the variable page shift, we can
> select larger values.
>
> The proposed approach relies on nouveau_bo::page unless if it would cause
> alignment issues (in which case we fall back to searching for an
> appropriate shift)
>
> Signed-off-by: Mary Guillemard <mary@mary.zone>
> Co-developed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
> Signed-off-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
> ---
> drivers/gpu/drm/nouveau/nouveau_uvmm.c | 60 +++++++++++++++++++++++++-
> 1 file changed, 58 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> index 2cd0835b05e8..ab8933b88337 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> @@ -454,6 +454,62 @@ op_unmap_prepare_unwind(struct drm_gpuva *va)
> drm_gpuva_insert(va->vm, va);
> }
>
> +static bool
> +op_map_aligned_to_page_shift(const struct drm_gpuva_op_map *op, u8 page_shift)
> +{
> + u64 non_page_bits = (1ULL << page_shift) - 1;
> +
> + return (op->va.addr & non_page_bits) == 0 &&
> + (op->va.range & non_page_bits) == 0 &&
> + (op->gem.offset & non_page_bits) == 0;
> +}
> +
> +static u8
> +select_page_shift(struct nouveau_uvmm *uvmm, struct drm_gpuva_op_map *op)
> +{
> + struct nouveau_bo *nvbo = nouveau_gem_object(op->gem.obj);
> +
> + /* nouveau_bo_fixup_align() guarantees that the page size will be aligned
> + * for most cases, but it can't handle cases where userspace allocates with
> + * a size and then binds with a smaller granularity. So in order to avoid
> + * breaking old userspace, we need to ensure that the VA is actually
> + * aligned before using it, and if it isn't, then we downgrade to the first
> + * granularity that will fit, which is optimal from a correctness and
> + * performance perspective.
> + */
> + if (op_map_aligned_to_page_shift(op, nvbo->page))
> + return nvbo->page;
> +
> + struct nouveau_mem *mem = nouveau_mem(nvbo->bo.resource);
> + struct nvif_vmm *vmm = &uvmm->vmm.vmm;
> + int i;
> +
> + /* If the given granularity doesn't fit, let's find one that will fit. */
> + for (i = 0; i < vmm->page_nr; i++) {
> + /* Ignore anything that is bigger or identical to the BO preference. */
> + if (vmm->page[i].shift >= nvbo->page)
> + continue;
> +
> + /* Skip incompatible domains. */
> + if ((mem->mem.type & NVIF_MEM_VRAM) && !vmm->page[i].vram)
> + continue;
> + if ((mem->mem.type & NVIF_MEM_HOST) &&
> + (!vmm->page[i].host || vmm->page[i].shift > PAGE_SHIFT))
> + continue;
This logic doesn't seem correct. I'm not sure why there's a need to
limit the page size on the host memory type, but assuming there is due
to nouveau architecture or HW limitations I'm not aware of, it should be
applied universally, not just when falling back due to misaligned
addresses. You can get lucky and have aligned addresses regardless of
the target page size. Hence, this check would need to precede the above
early-out for the case where op_map_aligned_to_page_shift() succeeds.
Thanks,
-James
> + /* If it fits, return the proposed shift. */
> + if (op_map_aligned_to_page_shift(op, vmm->page[i].shift))
> + return vmm->page[i].shift;
> + }
> +
> + /* If we get here then nothing can reconcile the requirements. This should never
> + * happen.
> + */
> + WARN_ON(1);
> +
> + return PAGE_SHIFT;
> +}
> +
> static void
> nouveau_uvmm_sm_prepare_unwind(struct nouveau_uvmm *uvmm,
> struct nouveau_uvma_prealloc *new,
> @@ -506,7 +562,7 @@ nouveau_uvmm_sm_prepare_unwind(struct nouveau_uvmm *uvmm,
> if (vmm_get_range)
> nouveau_uvmm_vmm_put(uvmm, vmm_get_start,
> vmm_get_range,
> - PAGE_SHIFT);
> + select_page_shift(uvmm, &op->map));
> break;
> }
> case DRM_GPUVA_OP_REMAP: {
> @@ -599,7 +655,7 @@ op_map_prepare(struct nouveau_uvmm *uvmm,
>
> uvma->region = args->region;
> uvma->kind = args->kind;
> - uvma->page_shift = PAGE_SHIFT;
> + uvma->page_shift = select_page_shift(uvmm, op);
>
> drm_gpuva_map(&uvmm->base, &uvma->va, op);
>
* Re: [PATCH v4 2/5] drm/nouveau/uvmm: Allow larger pages
2025-10-31 17:01 ` James Jones
@ 2025-11-03 23:53 ` Mohamed Ahmed
2025-11-04 1:12 ` James Jones
2025-11-05 22:50 ` Danilo Krummrich
0 siblings, 2 replies; 13+ messages in thread
From: Mohamed Ahmed @ 2025-11-03 23:53 UTC (permalink / raw)
To: James Jones
Cc: linux-kernel, dri-devel, Mary Guillemard, Faith Ekstrand,
Lyude Paul, Danilo Krummrich, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, nouveau
Thanks a lot for pointing this out! Looking at it more closely, the logic
here is actually redundant. It was originally copied over directly from
the BO allocation code to stay on the safer side (the idea back then was
to make the BO and VMM sides match exactly). We aren't at risk of having
an aligned address in the wrong memory type, because the BO allocation
code (nouveau_bo.c:321) forces anything with the GART flag to a 4K page
size; anything that gets a larger page size is exclusively VRAM.
Additionally, VRAM-only objects currently don't get evicted to host
memory except under high memory pressure, and in that case the context is
paused until the objects in question are paged back in, so we don't have
to worry about memory placement there either.
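Roughly, the rule amounts to something like the following (illustrative
sketch only, not the actual nouveau_bo.c code; the domain flags and size
thresholds here are placeholders):

    #include <stdint.h>

    enum { DOMAIN_VRAM = 1 << 0, DOMAIN_GART = 1 << 1 }; /* placeholder flags */

    static unsigned pick_bo_page_shift(uint32_t domain, uint64_t size)
    {
            if (domain & DOMAIN_GART)
                    return 12;                /* GART-capable BOs stay on 4KiB */
            if (size >= (1ull << 21))
                    return 21;                /* 2MiB pages, VRAM-only BOs */
            if (size >= (1ull << 16))
                    return 16;                /* 64KiB pages, VRAM-only BOs */
            return 12;
    }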
The memory placement check in the VMM code could be removed, but I am
leaning towards leaving it as is, just to stay on the safer side. It is
also useful to keep for the future: one of the things we want to
investigate is the memory placement rules, because the "only 4K is
allowed for host memory" limit that nouveau imposes is a source of many
pains in userspace (originally thought to be a HW restriction, but it
seems it actually isn't), and having the checks on both the BO and VMM
paths would help with starting that work.
Thanks a lot again,
Mohamed
On Fri, Oct 31, 2025 at 7:01 PM James Jones <jajones@nvidia.com> wrote:
>
> On 10/31/25 03:49, Mohamed Ahmed wrote:
> > From: Mary Guillemard <mary@mary.zone>
> >
> > Now that everything in UVMM knows about the variable page shift, we can
> > select larger values.
> >
> > The proposed approach relies on nouveau_bo::page unless if it would cause
> > alignment issues (in which case we fall back to searching for an
> > appropriate shift)
> >
> > Signed-off-by: Mary Guillemard <mary@mary.zone>
> > Co-developed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
> > Signed-off-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
> > ---
> > drivers/gpu/drm/nouveau/nouveau_uvmm.c | 60 +++++++++++++++++++++++++-
> > 1 file changed, 58 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> > index 2cd0835b05e8..ab8933b88337 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> > @@ -454,6 +454,62 @@ op_unmap_prepare_unwind(struct drm_gpuva *va)
> > drm_gpuva_insert(va->vm, va);
> > }
> >
> > +static bool
> > +op_map_aligned_to_page_shift(const struct drm_gpuva_op_map *op, u8 page_shift)
> > +{
> > + u64 non_page_bits = (1ULL << page_shift) - 1;
> > +
> > + return (op->va.addr & non_page_bits) == 0 &&
> > + (op->va.range & non_page_bits) == 0 &&
> > + (op->gem.offset & non_page_bits) == 0;
> > +}
> > +
> > +static u8
> > +select_page_shift(struct nouveau_uvmm *uvmm, struct drm_gpuva_op_map *op)
> > +{
> > + struct nouveau_bo *nvbo = nouveau_gem_object(op->gem.obj);
> > +
> > + /* nouveau_bo_fixup_align() guarantees that the page size will be aligned
> > + * for most cases, but it can't handle cases where userspace allocates with
> > + * a size and then binds with a smaller granularity. So in order to avoid
> > + * breaking old userspace, we need to ensure that the VA is actually
> > + * aligned before using it, and if it isn't, then we downgrade to the first
> > + * granularity that will fit, which is optimal from a correctness and
> > + * performance perspective.
> > + */
> > + if (op_map_aligned_to_page_shift(op, nvbo->page))
> > + return nvbo->page;
> > +
> > + struct nouveau_mem *mem = nouveau_mem(nvbo->bo.resource);
> > + struct nvif_vmm *vmm = &uvmm->vmm.vmm;
> > + int i;
> > +
> > + /* If the given granularity doesn't fit, let's find one that will fit. */
> > + for (i = 0; i < vmm->page_nr; i++) {
> > + /* Ignore anything that is bigger or identical to the BO preference. */
> > + if (vmm->page[i].shift >= nvbo->page)
> > + continue;
> > +
> > + /* Skip incompatible domains. */
> > + if ((mem->mem.type & NVIF_MEM_VRAM) && !vmm->page[i].vram)
> > + continue;
> > + if ((mem->mem.type & NVIF_MEM_HOST) &&
> > + (!vmm->page[i].host || vmm->page[i].shift > PAGE_SHIFT))
> > + continue;
>
> This logic doesn't seem correct. I'm not sure why there's a need to
> limit the page size on the host memory type, but assuming there is due
> to nouveau architecture or HW limitations I'm not aware of, it should be
> applied universally, not just when falling back due to misaligned
> addresses. You can get lucky and have aligned addresses regardless of
> the target page size. Hence, this check would need to precede the above
> early-out for the case where op_map_aligned_to_page_shift() succeeds.
>
> Thanks,
> -James
>
> > + /* If it fits, return the proposed shift. */
> > + if (op_map_aligned_to_page_shift(op, vmm->page[i].shift))
> > + return vmm->page[i].shift;
> > + }
> > +
> > + /* If we get here then nothing can reconcile the requirements. This should never
> > + * happen.
> > + */
> > + WARN_ON(1);
> > +
> > + return PAGE_SHIFT;
> > +}
> > +
> > static void
> > nouveau_uvmm_sm_prepare_unwind(struct nouveau_uvmm *uvmm,
> > struct nouveau_uvma_prealloc *new,
> > @@ -506,7 +562,7 @@ nouveau_uvmm_sm_prepare_unwind(struct nouveau_uvmm *uvmm,
> > if (vmm_get_range)
> > nouveau_uvmm_vmm_put(uvmm, vmm_get_start,
> > vmm_get_range,
> > - PAGE_SHIFT);
> > + select_page_shift(uvmm, &op->map));
> > break;
> > }
> > case DRM_GPUVA_OP_REMAP: {
> > @@ -599,7 +655,7 @@ op_map_prepare(struct nouveau_uvmm *uvmm,
> >
> > uvma->region = args->region;
> > uvma->kind = args->kind;
> > - uvma->page_shift = PAGE_SHIFT;
> > + uvma->page_shift = select_page_shift(uvmm, op);
> >
> > drm_gpuva_map(&uvmm->base, &uvma->va, op);
> >
>
* Re: [PATCH v4 2/5] drm/nouveau/uvmm: Allow larger pages
2025-11-03 23:53 ` Mohamed Ahmed
@ 2025-11-04 1:12 ` James Jones
2025-11-05 22:50 ` Danilo Krummrich
1 sibling, 0 replies; 13+ messages in thread
From: James Jones @ 2025-11-04 1:12 UTC (permalink / raw)
To: Mohamed Ahmed
Cc: linux-kernel, dri-devel, Mary Guillemard, Faith Ekstrand,
Lyude Paul, Danilo Krummrich, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, nouveau
On 11/3/25 15:53, Mohamed Ahmed wrote:
> Thanks a lot for the shout out! Looking more at things, the logic here
> is actually redundant. It was originally copied over directly from the
> bo allocation code to stay on the safer side (basically the idea back
> then was to make both the bo and vmm sides match exactly). We aren't
> at risk of having an aligned address that is in the wrong memory type
> because the bo allocation code (nouveau_bo.c:321) forces anything that
> has the GART flag to have a page size of 4K. Anything getting a page
> size higher than that is exclusively VRAM only. Additionally,
> currently things marked VRAM only don't get evicted to host memory
> except under high memory pressure and in that case, the context is
> paused until the objects in question are paged back in, so we also
> don't have to worry about memory placement there.
>
> The memory placement check in the vmm code could be removed but I am
> leaning more towards leaving it as is just to stay on the safer side.
> At the same time, it would be more useful to keep it for the future as
> one of the future investigation targets that we want to look into is
> all the memory placement rules because the "only 4K is allowed for
> host memory" limit that nouveau imposes is a source of many pains in
> userspace (originally thought to be a HW thing but seems it's actually
> not), and having the checks on both bo and vmm paths would help
> starting out with that.
OK, thanks for the explanation. I'm fine with leaving the check as-is in
that case.
Given that, for the series:
Reviewed-by: James Jones <jajones@nvidia.com>
Thanks,
-James
> Thanks a lot again,
> Mohamed
>
> On Fri, Oct 31, 2025 at 7:01 PM James Jones <jajones@nvidia.com> wrote:
>>
>> On 10/31/25 03:49, Mohamed Ahmed wrote:
>>> From: Mary Guillemard <mary@mary.zone>
>>>
>>> Now that everything in UVMM knows about the variable page shift, we can
>>> select larger values.
>>>
>>> The proposed approach relies on nouveau_bo::page unless if it would cause
>>> alignment issues (in which case we fall back to searching for an
>>> appropriate shift)
>>>
>>> Signed-off-by: Mary Guillemard <mary@mary.zone>
>>> Co-developed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
>>> Signed-off-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
>>> ---
>>> drivers/gpu/drm/nouveau/nouveau_uvmm.c | 60 +++++++++++++++++++++++++-
>>> 1 file changed, 58 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
>>> index 2cd0835b05e8..ab8933b88337 100644
>>> --- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
>>> +++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
>>> @@ -454,6 +454,62 @@ op_unmap_prepare_unwind(struct drm_gpuva *va)
>>> drm_gpuva_insert(va->vm, va);
>>> }
>>>
>>> +static bool
>>> +op_map_aligned_to_page_shift(const struct drm_gpuva_op_map *op, u8 page_shift)
>>> +{
>>> + u64 non_page_bits = (1ULL << page_shift) - 1;
>>> +
>>> + return (op->va.addr & non_page_bits) == 0 &&
>>> + (op->va.range & non_page_bits) == 0 &&
>>> + (op->gem.offset & non_page_bits) == 0;
>>> +}
>>> +
>>> +static u8
>>> +select_page_shift(struct nouveau_uvmm *uvmm, struct drm_gpuva_op_map *op)
>>> +{
>>> + struct nouveau_bo *nvbo = nouveau_gem_object(op->gem.obj);
>>> +
>>> + /* nouveau_bo_fixup_align() guarantees that the page size will be aligned
>>> + * for most cases, but it can't handle cases where userspace allocates with
>>> + * a size and then binds with a smaller granularity. So in order to avoid
>>> + * breaking old userspace, we need to ensure that the VA is actually
>>> + * aligned before using it, and if it isn't, then we downgrade to the first
>>> + * granularity that will fit, which is optimal from a correctness and
>>> + * performance perspective.
>>> + */
>>> + if (op_map_aligned_to_page_shift(op, nvbo->page))
>>> + return nvbo->page;
>>> +
>>> + struct nouveau_mem *mem = nouveau_mem(nvbo->bo.resource);
>>> + struct nvif_vmm *vmm = &uvmm->vmm.vmm;
>>> + int i;
>>> +
>>> + /* If the given granularity doesn't fit, let's find one that will fit. */
>>> + for (i = 0; i < vmm->page_nr; i++) {
>>> + /* Ignore anything that is bigger or identical to the BO preference. */
>>> + if (vmm->page[i].shift >= nvbo->page)
>>> + continue;
>>> +
>>> + /* Skip incompatible domains. */
>>> + if ((mem->mem.type & NVIF_MEM_VRAM) && !vmm->page[i].vram)
>>> + continue;
>>> + if ((mem->mem.type & NVIF_MEM_HOST) &&
>>> + (!vmm->page[i].host || vmm->page[i].shift > PAGE_SHIFT))
>>> + continue;
>>
>> This logic doesn't seem correct. I'm not sure why there's a need to
>> limit the page size on the host memory type, but assuming there is due
>> to nouveau architecture or HW limitations I'm not aware of, it should be
>> applied universally, not just when falling back due to misaligned
>> addresses. You can get lucky and have aligned addresses regardless of
>> the target page size. Hence, this check would need to precede the above
>> early-out for the case where op_map_aligned_to_page_shift() succeeds.
>>
>> Thanks,
>> -James
>>
>>> + /* If it fits, return the proposed shift. */
>>> + if (op_map_aligned_to_page_shift(op, vmm->page[i].shift))
>>> + return vmm->page[i].shift;
>>> + }
>>> +
>>> + /* If we get here then nothing can reconcile the requirements. This should never
>>> + * happen.
>>> + */
>>> + WARN_ON(1);
>>> +
>>> + return PAGE_SHIFT;
>>> +}
>>> +
>>> static void
>>> nouveau_uvmm_sm_prepare_unwind(struct nouveau_uvmm *uvmm,
>>> struct nouveau_uvma_prealloc *new,
>>> @@ -506,7 +562,7 @@ nouveau_uvmm_sm_prepare_unwind(struct nouveau_uvmm *uvmm,
>>> if (vmm_get_range)
>>> nouveau_uvmm_vmm_put(uvmm, vmm_get_start,
>>> vmm_get_range,
>>> - PAGE_SHIFT);
>>> + select_page_shift(uvmm, &op->map));
>>> break;
>>> }
>>> case DRM_GPUVA_OP_REMAP: {
>>> @@ -599,7 +655,7 @@ op_map_prepare(struct nouveau_uvmm *uvmm,
>>>
>>> uvma->region = args->region;
>>> uvma->kind = args->kind;
>>> - uvma->page_shift = PAGE_SHIFT;
>>> + uvma->page_shift = select_page_shift(uvmm, op);
>>>
>>> drm_gpuva_map(&uvmm->base, &uvma->va, op);
>>>
>>
* Re: [PATCH v4 2/5] drm/nouveau/uvmm: Allow larger pages
2025-11-03 23:53 ` Mohamed Ahmed
2025-11-04 1:12 ` James Jones
@ 2025-11-05 22:50 ` Danilo Krummrich
2025-11-08 19:30 ` Mary Guillemard
1 sibling, 1 reply; 13+ messages in thread
From: Danilo Krummrich @ 2025-11-05 22:50 UTC (permalink / raw)
To: Mohamed Ahmed
Cc: James Jones, linux-kernel, dri-devel, Mary Guillemard,
Faith Ekstrand, Lyude Paul, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, nouveau
On 11/4/25 12:53 AM, Mohamed Ahmed wrote:
> Thanks a lot for the shout out! Looking more at things, the logic here
> is actually redundant. It was originally copied over directly from the
> bo allocation code to stay on the safer side (basically the idea back
> then was to make both the bo and vmm sides match exactly). We aren't
> at risk of having an aligned address that is in the wrong memory type
> because the bo allocation code (nouveau_bo.c:321) forces anything that
> has the GART flag to have a page size of 4K. Anything getting a page
> size higher than that is exclusively VRAM only. Additionally,
> currently things marked VRAM only don't get evicted to host memory
> except under high memory pressure and in that case, the context is
> paused until the objects in question are paged back in, so we also
> don't have to worry about memory placement there.
>
> The memory placement check in the vmm code could be removed but I am
> leaning more towards leaving it as is just to stay on the safer side.
If it is not necessary, please remove it. We should not carry dead code.
> At the same time, it would be more useful to keep it for the future as
> one of the future investigation targets that we want to look into is
> all the memory placement rules because the "only 4K is allowed for
> host memory" limit that nouveau imposes is a source of many pains in
> userspace (originally thought to be a HW thing but seems it's actually
> not), and having the checks on both bo and vmm paths would help
> starting out with that.
Please don't top-post, see also [1].
[1] https://subspace.kernel.org/etiquette.html#do-not-top-post-when-replying
* Re: [PATCH v4 2/5] drm/nouveau/uvmm: Allow larger pages
2025-10-31 10:49 ` [PATCH v4 2/5] drm/nouveau/uvmm: Allow " Mohamed Ahmed
2025-10-31 17:01 ` James Jones
@ 2025-11-05 22:51 ` Danilo Krummrich
1 sibling, 0 replies; 13+ messages in thread
From: Danilo Krummrich @ 2025-11-05 22:51 UTC (permalink / raw)
To: Mohamed Ahmed
Cc: linux-kernel, dri-devel, Mary Guillemard, Faith Ekstrand,
Lyude Paul, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, nouveau
On 10/31/25 11:49 AM, Mohamed Ahmed wrote:
> + /* If we get here then nothing can reconcile the requirements. This should never
> + * happen.
> + */
> + WARN_ON(1);
This is called from a userspace path, please use dev_warn_once() instead and
return an error code.
* Re: [PATCH v4 2/5] drm/nouveau/uvmm: Allow larger pages
2025-11-05 22:50 ` Danilo Krummrich
@ 2025-11-08 19:30 ` Mary Guillemard
0 siblings, 0 replies; 13+ messages in thread
From: Mary Guillemard @ 2025-11-08 19:30 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Mohamed Ahmed, James Jones, linux-kernel, dri-devel,
Faith Ekstrand, Lyude Paul, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, nouveau
Hi,
On Wed, Nov 5, 2025 at 11:50 PM Danilo Krummrich <dakr@kernel.org> wrote:
>
> On 11/4/25 12:53 AM, Mohamed Ahmed wrote:
> > Thanks a lot for the shout out! Looking more at things, the logic here
> > is actually redundant. It was originally copied over directly from the
> > bo allocation code to stay on the safer side (basically the idea back
> > then was to make both the bo and vmm sides match exactly). We aren't
> > at risk of having an aligned address that is in the wrong memory type
> > because the bo allocation code (nouveau_bo.c:321) forces anything that
> > has the GART flag to have a page size of 4K. Anything getting a page
> > size higher than that is exclusively VRAM only. Additionally,
> > currently things marked VRAM only don't get evicted to host memory
> > except under high memory pressure and in that case, the context is
> > paused until the objects in question are paged back in, so we also
> > don't have to worry about memory placement there.
> >
> > The memory placement check in the vmm code could be removed but I am
> > leaning more towards leaving it as is just to stay on the safer side.
>
> If it is not necessary, please remove it. We should not carry dead code.
>
For correctness, this code path needs to refuse incompatible domains in
order to decide the appropriate page size.
As such, those checks should remain.
Regards,
Mary