* Re: [PATCH 04/61] ext4: Prefer IS_ERR_OR_NULL over manual NULL check
From: Theodore Ts'o @ 2026-04-10 15:18 UTC
To: amd-gfx, apparmor, bpf, ceph-devel, cocci, dm-devel, dri-devel,
gfs2, intel-gfx, intel-wired-lan, iommu, kvm, linux-arm-kernel,
linux-block, linux-bluetooth, linux-btrfs, linux-cifs, linux-clk,
linux-erofs, linux-ext4, linux-fsdevel, linux-gpio, linux-hyperv,
linux-input, linux-kernel, linux-leds, linux-media, linux-mips,
linux-mm, linux-modules, linux-mtd, linux-nfs, linux-omap,
linux-phy, linux-pm, linux-rockchip, linux-s390, linux-scsi,
linux-sctp, linux-security-module, linux-sh, linux-sound,
linux-stm32, linux-trace-kernel, linux-usb, linux-wireless,
netdev, ntfs3, samba-technical, sched-ext, target-devel,
tipc-discussion, v9fs, Philipp Hahn
Cc: Theodore Ts'o, Andreas Dilger
In-Reply-To: <20260310-b4-is_err_or_null-v1-4-bd63b656022d@avm.de>
On Tue, 10 Mar 2026 12:48:30 +0100, Philipp Hahn wrote:
> Prefer using IS_ERR_OR_NULL() over using IS_ERR() and a manual NULL
> check.
>
> Change generated with coccinelle.
Applied, thanks!
[04/61] ext4: Prefer IS_ERR_OR_NULL over manual NULL check
commit: 1d749e110277ce4103f27bd60d6181e52c0cc1e3
Best regards,
--
Theodore Ts'o <tytso@mit.edu>
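For readers unfamiliar with the helper, here is a minimal userspace sketch of what the conversion amounts to. The functions are stand-ins modelled on the kernel's include/linux/err.h, rewritten as plain C so the example builds outside the kernel. demo_lookup() and its table are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins modelled on include/linux/err.h, not the kernel code. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* Error pointers occupy the top MAX_ERRNO values of the address space. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Hypothetical lookup that can fail in two different ways. */
static int demo_table[4] = { 10, 20, 30, 40 };

static inline int *demo_lookup(int idx)
{
	if (idx < 0)
		return ERR_PTR(-EINVAL);	/* error encoded in the pointer */
	if (idx >= 4)
		return NULL;			/* "not found" */
	return &demo_table[idx];
}

/* The manual check and the combined helper must agree for any pointer. */
static inline void demo_check_equivalence(const void *p)
{
	assert((p == NULL || IS_ERR(p)) == IS_ERR_OR_NULL(p));
}
```

Under these assumptions, a caller that used to write "if (!p || IS_ERR(p))" can write "if (IS_ERR_OR_NULL(p))" with no change in behaviour, which is the shape of rewrite the commit message describes.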
* [PATCH v12 16/16] KVM: selftests: Test guest execution from direct map removed gmem
From: Kalyazin, Nikita @ 2026-04-10 15:20 UTC
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Patrick Roy <patrick.roy@linux.dev>
Add a selftest that loads itself into guest_memfd (via
GUEST_MEMFD_FLAG_MMAP) and triggers an MMIO exit when executed. This
exercises the x86 MMIO emulation code inside KVM for guest_memfd-backed
memslots whose guest_memfd folios have been removed from the direct map.
In particular, it validates that the x86 MMIO emulation code (guest page
table walks + instruction fetch) correctly accesses gmem through the VMA
that has been reflected into the memslot's userspace_addr field (instead
of trying to do direct map accesses).
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
.../selftests/kvm/set_memory_region_test.c | 52 +++++++++++++++++--
1 file changed, 48 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c b/tools/testing/selftests/kvm/set_memory_region_test.c
index 7fe427ff9b38..cb445d420e8c 100644
--- a/tools/testing/selftests/kvm/set_memory_region_test.c
+++ b/tools/testing/selftests/kvm/set_memory_region_test.c
@@ -602,6 +602,41 @@ static void test_mmio_during_vectoring(void)
kvm_vm_free(vm);
}
+
+static void guest_code_trigger_mmio(void)
+{
+ /*
+ * Read some GPA that is not backed by a memslot. KVM considers this
+ * to be MMIO and tells userspace to emulate the read.
+ */
+ READ_ONCE(*((uint64_t *)MEM_REGION_GPA));
+
+ GUEST_DONE();
+}
+
+static void test_guest_memfd_mmio(void)
+{
+ struct kvm_vm *vm;
+ struct kvm_vcpu *vcpu;
+ struct vm_shape shape = {
+ .mode = VM_MODE_DEFAULT,
+ .src_type = VM_MEM_SRC_GUEST_MEMFD_NO_DIRECT_MAP,
+ };
+ pthread_t vcpu_thread;
+
+ pr_info("Testing MMIO emulation for instructions in gmem\n");
+
+ vm = __vm_create_shape_with_one_vcpu(shape, &vcpu, 0, guest_code_trigger_mmio);
+
+ virt_map(vm, MEM_REGION_GPA, MEM_REGION_GPA, 1);
+
+ pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);
+
+ /* If the MMIO read was successfully emulated, the vcpu thread will exit */
+ pthread_join(vcpu_thread, NULL);
+
+ kvm_vm_free(vm);
+}
#endif
int main(int argc, char *argv[])
@@ -625,10 +660,19 @@ int main(int argc, char *argv[])
test_add_max_memory_regions();
#ifdef __x86_64__
- if (kvm_has_cap(KVM_CAP_GUEST_MEMFD) &&
- (kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM))) {
- test_add_private_memory_region();
- test_add_overlapping_private_memory_regions();
+ if (kvm_has_cap(KVM_CAP_GUEST_MEMFD)) {
+ uint64_t valid_flags = kvm_check_cap(KVM_CAP_GUEST_MEMFD_FLAGS);
+
+ if (kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM)) {
+ test_add_private_memory_region();
+ test_add_overlapping_private_memory_regions();
+ }
+
+ if ((valid_flags & GUEST_MEMFD_FLAG_MMAP) &&
+ (valid_flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP))
+ test_guest_memfd_mmio();
+ else
+ pr_info("Skipping tests requiring GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_NO_DIRECT_MAP\n");
} else {
pr_info("Skipping tests for KVM_MEM_GUEST_MEMFD memory regions\n");
}
--
2.50.1
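The gate added to main() above runs the new test only when both flags are reported as supported. A minimal standalone sketch of that check follows. The DEMO_* bit values are placeholders chosen for illustration (the real values come from the KVM uapi headers and this series), and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration only, not the uapi definitions. */
#define DEMO_FLAG_MMAP		(UINT64_C(1) << 0)
#define DEMO_FLAG_INIT_SHARED	(UINT64_C(1) << 1)
#define DEMO_FLAG_NO_DIRECT_MAP	(UINT64_C(1) << 2)

/* True only if every bit in 'required' is also set in 'supported'. */
static inline bool demo_flags_all_supported(uint64_t supported, uint64_t required)
{
	/* An empty requirement is almost certainly a caller bug. */
	assert(required != 0);

	return (supported & required) == required;
}

/* Mirrors the gate in main(): MMAP and NO_DIRECT_MAP are needed together. */
static inline bool demo_can_run_gmem_mmio_test(uint64_t supported)
{
	return demo_flags_all_supported(supported,
					DEMO_FLAG_MMAP | DEMO_FLAG_NO_DIRECT_MAP);
}
```

Comparing the masked value against the full requirement is equivalent to the two separate "&" tests in the patch. It is just one way to spell the same condition if more flags are required later.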
* [PATCH v12 15/16] KVM: selftests: stuff vm_mem_backing_src_type into vm_shape
From: Kalyazin, Nikita @ 2026-04-10 15:20 UTC
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Patrick Roy <patrick.roy@linux.dev>
Use one of the padding fields in struct vm_shape to carry an enum
vm_mem_backing_src_type value, to give the option to override the
default of VM_MEM_SRC_ANONYMOUS in __vm_create().
Overriding this default will allow tests to create VMs where the test
code is backed by mmap'd guest_memfd instead of anonymous memory.
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
.../testing/selftests/kvm/include/kvm_util.h | 19 ++++++++++---------
tools/testing/selftests/kvm/lib/kvm_util.c | 2 +-
tools/testing/selftests/kvm/lib/x86/sev.c | 1 +
.../selftests/kvm/pre_fault_memory_test.c | 1 +
4 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 056a003a63c0..48b6ee8223aa 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -215,7 +215,7 @@ enum vm_guest_mode {
struct vm_shape {
uint32_t type;
uint8_t mode;
- uint8_t pad0;
+ uint8_t src_type;
uint16_t pad1;
};
@@ -223,14 +223,15 @@ kvm_static_assert(sizeof(struct vm_shape) == sizeof(uint64_t));
#define VM_TYPE_DEFAULT 0
-#define VM_SHAPE(__mode) \
-({ \
- struct vm_shape shape = { \
- .mode = (__mode), \
- .type = VM_TYPE_DEFAULT \
- }; \
- \
- shape; \
+#define VM_SHAPE(__mode) \
+({ \
+ struct vm_shape shape = { \
+ .mode = (__mode), \
+ .type = VM_TYPE_DEFAULT, \
+ .src_type = VM_MEM_SRC_ANONYMOUS \
+ }; \
+ \
+ shape; \
})
extern enum vm_guest_mode vm_mode_default;
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index fa4a2fc236fe..824c94c64864 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -500,7 +500,7 @@ struct kvm_vm *__vm_create(struct vm_shape shape, uint32_t nr_runnable_vcpus,
if (is_guest_memfd_required(shape))
flags |= KVM_MEM_GUEST_MEMFD;
- vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 0, 0, nr_pages, flags);
+ vm_userspace_mem_region_add(vm, shape.src_type, 0, 0, nr_pages, flags);
for (i = 0; i < NR_MEM_REGIONS; i++)
vm->memslots[i] = 0;
diff --git a/tools/testing/selftests/kvm/lib/x86/sev.c b/tools/testing/selftests/kvm/lib/x86/sev.c
index c3a9838f4806..d920880e4fc0 100644
--- a/tools/testing/selftests/kvm/lib/x86/sev.c
+++ b/tools/testing/selftests/kvm/lib/x86/sev.c
@@ -164,6 +164,7 @@ struct kvm_vm *vm_sev_create_with_one_vcpu(uint32_t type, void *guest_code,
struct vm_shape shape = {
.mode = VM_MODE_DEFAULT,
.type = type,
+ .src_type = VM_MEM_SRC_ANONYMOUS,
};
struct kvm_vm *vm;
struct kvm_vcpu *cpus[1];
diff --git a/tools/testing/selftests/kvm/pre_fault_memory_test.c b/tools/testing/selftests/kvm/pre_fault_memory_test.c
index 93e603d91311..8a4d5af53fab 100644
--- a/tools/testing/selftests/kvm/pre_fault_memory_test.c
+++ b/tools/testing/selftests/kvm/pre_fault_memory_test.c
@@ -165,6 +165,7 @@ static void __test_pre_fault_memory(unsigned long vm_type, bool private)
const struct vm_shape shape = {
.mode = VM_MODE_DEFAULT,
.type = vm_type,
+ .src_type = VM_MEM_SRC_ANONYMOUS,
};
struct kvm_vcpu *vcpu;
struct kvm_run *run;
--
2.50.1
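The layout change above can be sketched in isolation. The struct below mirrors the field order shown in the kvm_util.h hunk, but the demo_* names are hypothetical, and the sketch assumes the anonymous source type has the value 0.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the backing source enum (values are assumptions). */
enum demo_src_type {
	DEMO_SRC_ANONYMOUS,	/* assumed to be 0 in this sketch */
	DEMO_SRC_GUEST_MEMFD,
};

struct demo_vm_shape {
	uint32_t type;
	uint8_t mode;
	uint8_t src_type;	/* takes over the byte formerly named pad0 */
	uint16_t pad1;
};

/* Reusing a padding byte must not change the overall size. */
_Static_assert(sizeof(struct demo_vm_shape) == sizeof(uint64_t),
	       "shape must stay 8 bytes");

/* Explicit form, in the spirit of the updated VM_SHAPE() macro. */
static inline struct demo_vm_shape demo_shape(uint8_t mode)
{
	struct demo_vm_shape shape = {
		.mode = mode,
		.type = 0,
		.src_type = DEMO_SRC_ANONYMOUS,
	};

	return shape;
}

/* Fields omitted from a designated initializer are zero-initialized. */
static inline struct demo_vm_shape demo_shape_implicit(uint8_t mode)
{
	struct demo_vm_shape shape = { .mode = mode };

	assert(shape.src_type == DEMO_SRC_ANONYMOUS);
	return shape;
}
```

Under that assumption, both forms produce the same shape. Spelling out the field, as the patch does in sev.c and pre_fault_memory_test.c, keeps the default visible at the call site rather than relying on zero-initialization.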
* [PATCH v12 14/16] KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in existing selftests
From: Kalyazin, Nikita @ 2026-04-10 15:20 UTC
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Patrick Roy <patrick.roy@linux.dev>
Extend the mem conversion selftests to cover the scenario in which the
guest can fault in and write gmem-backed guest memory even if that memory
has been removed from the direct map. Also cover the new flag in the
guest_memfd_test.c tests.
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
tools/testing/selftests/kvm/guest_memfd_test.c | 17 ++++++++++++++++-
.../kvm/x86/private_mem_conversions_test.c | 7 ++++---
2 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index cc329b57ce2e..64c1200c182e 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -403,6 +403,17 @@ static void test_guest_memfd(unsigned long vm_type)
__test_guest_memfd(vm, GUEST_MEMFD_FLAG_MMAP |
GUEST_MEMFD_FLAG_INIT_SHARED);
+ if (flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP) {
+ __test_guest_memfd(vm, GUEST_MEMFD_FLAG_NO_DIRECT_MAP);
+ if (flags & GUEST_MEMFD_FLAG_MMAP)
+ __test_guest_memfd(vm, GUEST_MEMFD_FLAG_NO_DIRECT_MAP |
+ GUEST_MEMFD_FLAG_MMAP);
+ if (flags & GUEST_MEMFD_FLAG_INIT_SHARED)
+ __test_guest_memfd(vm, GUEST_MEMFD_FLAG_NO_DIRECT_MAP |
+ GUEST_MEMFD_FLAG_MMAP |
+ GUEST_MEMFD_FLAG_INIT_SHARED);
+ }
+
kvm_vm_free(vm);
}
@@ -445,10 +456,14 @@ static void test_guest_memfd_guest(void)
TEST_ASSERT(vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_FLAGS) & GUEST_MEMFD_FLAG_INIT_SHARED,
"Default VM type should support INIT_SHARED, supported flags = 0x%x",
vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_FLAGS));
+ TEST_ASSERT(vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_FLAGS) & GUEST_MEMFD_FLAG_NO_DIRECT_MAP,
+ "Default VM type should support NO_DIRECT_MAP, supported flags = 0x%x",
+ vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_FLAGS));
size = vm->page_size;
fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP |
- GUEST_MEMFD_FLAG_INIT_SHARED);
+ GUEST_MEMFD_FLAG_INIT_SHARED |
+ GUEST_MEMFD_FLAG_NO_DIRECT_MAP);
vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, size, NULL, fd, 0);
mem = kvm_mmap(size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
diff --git a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
index 1969f4ab9b28..8767cb4a037e 100644
--- a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
+++ b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
@@ -367,7 +367,7 @@ static void *__test_mem_conversions(void *__vcpu)
}
static void test_mem_conversions(enum vm_mem_backing_src_type src_type, uint32_t nr_vcpus,
- uint32_t nr_memslots)
+ uint32_t nr_memslots, uint64_t gmem_flags)
{
/*
* Allocate enough memory so that each vCPU's chunk of memory can be
@@ -394,7 +394,7 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type, uint32_t
vm_enable_cap(vm, KVM_CAP_EXIT_HYPERCALL, (1 << KVM_HC_MAP_GPA_RANGE));
- memfd = vm_create_guest_memfd(vm, memfd_size, 0);
+ memfd = vm_create_guest_memfd(vm, memfd_size, gmem_flags);
for (i = 0; i < nr_memslots; i++)
vm_mem_add(vm, src_type, BASE_DATA_GPA + slot_size * i,
@@ -474,7 +474,8 @@ int main(int argc, char *argv[])
}
}
- test_mem_conversions(src_type, nr_vcpus, nr_memslots);
+ test_mem_conversions(src_type, nr_vcpus, nr_memslots, 0);
+ test_mem_conversions(src_type, nr_vcpus, nr_memslots, GUEST_MEMFD_FLAG_NO_DIRECT_MAP);
return 0;
}
--
2.50.1
* [PATCH v12 13/16] KVM: selftests: Add guest_memfd based vm_mem_backing_src_types
From: Kalyazin, Nikita @ 2026-04-10 15:20 UTC
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Patrick Roy <patrick.roy@linux.dev>
Allow selftests to configure their memslots such that userspace_addr is
set to a MAP_SHARED mapping of the guest_memfd that's associated with
the memslot. This setup is the configuration for non-CoCo VMs, where all
guest memory is backed by a guest_memfd whose folios are all marked
shared, but KVM is still able to access guest memory to provide
functionality such as MMIO emulation on x86.
Add backing types for normal guest_memfd, as well as direct map removed
guest_memfd.
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
.../testing/selftests/kvm/include/kvm_util.h | 18 ++++++
.../testing/selftests/kvm/include/test_util.h | 7 +++
tools/testing/selftests/kvm/lib/kvm_util.c | 61 ++++++++++---------
tools/testing/selftests/kvm/lib/test_util.c | 8 +++
4 files changed, 65 insertions(+), 29 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 8b39cb919f4f..056a003a63c0 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -664,6 +664,24 @@ static inline bool is_smt_on(void)
void vm_create_irqchip(struct kvm_vm *vm);
+static inline uint32_t backing_src_guest_memfd_flags(enum vm_mem_backing_src_type t)
+{
+ uint32_t flags = 0;
+
+ switch (t) {
+ case VM_MEM_SRC_GUEST_MEMFD_NO_DIRECT_MAP:
+ flags |= GUEST_MEMFD_FLAG_NO_DIRECT_MAP;
+ fallthrough;
+ case VM_MEM_SRC_GUEST_MEMFD:
+ flags |= GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED;
+ break;
+ default:
+ break;
+ }
+
+ return flags;
+}
+
static inline int __vm_create_guest_memfd(struct kvm_vm *vm, uint64_t size,
uint64_t flags)
{
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index 8140e59b59e5..ea6de20ce8ef 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -152,6 +152,8 @@ enum vm_mem_backing_src_type {
VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB,
VM_MEM_SRC_SHMEM,
VM_MEM_SRC_SHARED_HUGETLB,
+ VM_MEM_SRC_GUEST_MEMFD,
+ VM_MEM_SRC_GUEST_MEMFD_NO_DIRECT_MAP,
NUM_SRC_TYPES,
};
@@ -184,6 +186,11 @@ static inline bool backing_src_is_shared(enum vm_mem_backing_src_type t)
return vm_mem_backing_src_alias(t)->flag & MAP_SHARED;
}
+static inline bool backing_src_is_guest_memfd(enum vm_mem_backing_src_type t)
+{
+ return t == VM_MEM_SRC_GUEST_MEMFD || t == VM_MEM_SRC_GUEST_MEMFD_NO_DIRECT_MAP;
+}
+
static inline bool backing_src_can_be_huge(enum vm_mem_backing_src_type t)
{
return t != VM_MEM_SRC_ANONYMOUS && t != VM_MEM_SRC_SHMEM;
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 5b0865683047..fa4a2fc236fe 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1046,6 +1046,33 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
alignment = 1;
#endif
+ if (guest_memfd < 0) {
+ if ((flags & KVM_MEM_GUEST_MEMFD) || backing_src_is_guest_memfd(src_type)) {
+ uint32_t guest_memfd_flags = backing_src_guest_memfd_flags(src_type);
+
+ TEST_ASSERT(!guest_memfd_offset,
+ "Offset must be zero when creating new guest_memfd");
+ guest_memfd = vm_create_guest_memfd(vm, mem_size, guest_memfd_flags);
+ }
+ } else {
+ /*
+ * Install a unique fd for each memslot so that the fd
+ * can be closed when the region is deleted without
+ * needing to track if the fd is owned by the framework
+ * or by the caller.
+ */
+ guest_memfd = kvm_dup(guest_memfd);
+ }
+
+ if (guest_memfd >= 0) {
+ flags |= KVM_MEM_GUEST_MEMFD;
+
+ region->region.guest_memfd = guest_memfd;
+ region->region.guest_memfd_offset = guest_memfd_offset;
+ } else {
+ region->region.guest_memfd = -1;
+ }
+
/*
* When using THP mmap is not guaranteed to returned a hugepage aligned
* address so we have to pad the mmap. Padding is not needed for HugeTLB
@@ -1061,10 +1088,13 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
if (alignment > 1)
region->mmap_size += alignment;
- region->fd = -1;
- if (backing_src_is_shared(src_type))
+ if (backing_src_is_guest_memfd(src_type))
+ region->fd = guest_memfd;
+ else if (backing_src_is_shared(src_type))
region->fd = kvm_memfd_alloc(region->mmap_size,
src_type == VM_MEM_SRC_SHARED_HUGETLB);
+ else
+ region->fd = -1;
region->mmap_start = kvm_mmap(region->mmap_size, PROT_READ | PROT_WRITE,
vm_mem_backing_src_alias(src_type)->flag,
@@ -1089,33 +1119,6 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
}
region->backing_src_type = src_type;
-
- if (guest_memfd < 0) {
- if (flags & KVM_MEM_GUEST_MEMFD) {
- uint32_t guest_memfd_flags = 0;
- TEST_ASSERT(!guest_memfd_offset,
- "Offset must be zero when creating new guest_memfd");
- guest_memfd = vm_create_guest_memfd(vm, mem_size, guest_memfd_flags);
- }
- } else {
- /*
- * Install a unique fd for each memslot so that the fd
- * can be closed when the region is deleted without
- * needing to track if the fd is owned by the framework
- * or by the caller.
- */
- guest_memfd = kvm_dup(guest_memfd);
- }
-
- if (guest_memfd >= 0) {
- flags |= KVM_MEM_GUEST_MEMFD;
-
- region->region.guest_memfd = guest_memfd;
- region->region.guest_memfd_offset = guest_memfd_offset;
- } else {
- region->region.guest_memfd = -1;
- }
-
region->unused_phy_pages = sparsebit_alloc();
if (vm_arch_has_protected_memory(vm))
region->protected_phy_pages = sparsebit_alloc();
diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c
index 8a1848586a85..ce9fe0271515 100644
--- a/tools/testing/selftests/kvm/lib/test_util.c
+++ b/tools/testing/selftests/kvm/lib/test_util.c
@@ -306,6 +306,14 @@ const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i)
*/
.flag = MAP_SHARED,
},
+ [VM_MEM_SRC_GUEST_MEMFD] = {
+ .name = "guest_memfd",
+ .flag = MAP_SHARED,
+ },
+ [VM_MEM_SRC_GUEST_MEMFD_NO_DIRECT_MAP] = {
+ .name = "guest_memfd_no_direct_map",
+ .flag = MAP_SHARED,
+ }
};
_Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES,
"Missing new backing src types?");
--
2.50.1
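The mapping from backing source type to guest_memfd flags in the kvm_util.h hunk relies on a switch fallthrough. A standalone sketch of that pattern is shown below. The demo_* names and the bit values are placeholders for illustration and are not the definitions used by the selftests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration only. */
#define DEMO_FLAG_MMAP		(UINT32_C(1) << 0)
#define DEMO_FLAG_INIT_SHARED	(UINT32_C(1) << 1)
#define DEMO_FLAG_NO_DIRECT_MAP	(UINT32_C(1) << 2)

/* Trimmed-down stand-in for enum vm_mem_backing_src_type. */
enum demo_src_type {
	DEMO_SRC_ANONYMOUS,
	DEMO_SRC_SHMEM,
	DEMO_SRC_GUEST_MEMFD,
	DEMO_SRC_GUEST_MEMFD_NO_DIRECT_MAP,
	DEMO_NUM_SRC_TYPES,
};

static inline bool demo_src_is_guest_memfd(enum demo_src_type t)
{
	return t == DEMO_SRC_GUEST_MEMFD ||
	       t == DEMO_SRC_GUEST_MEMFD_NO_DIRECT_MAP;
}

static inline uint32_t demo_src_guest_memfd_flags(enum demo_src_type t)
{
	uint32_t flags = 0;

	assert(t < DEMO_NUM_SRC_TYPES);

	switch (t) {
	case DEMO_SRC_GUEST_MEMFD_NO_DIRECT_MAP:
		flags |= DEMO_FLAG_NO_DIRECT_MAP;
		/* fall through */
	case DEMO_SRC_GUEST_MEMFD:
		/* Both guest_memfd types are mmap-able and start out shared. */
		flags |= DEMO_FLAG_MMAP | DEMO_FLAG_INIT_SHARED;
		break;
	default:
		/* Non-guest_memfd sources contribute no guest_memfd flags. */
		break;
	}

	return flags;
}
```

The fallthrough makes the no-direct-map variant a strict superset of the plain guest_memfd variant, so a flag added to the common case is picked up by both.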
* [PATCH v12 12/16] KVM: selftests: set KVM_MEM_GUEST_MEMFD in vm_mem_add() if guest_memfd != -1
From: Kalyazin, Nikita @ 2026-04-10 15:19 UTC
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Patrick Roy <patrick.roy@linux.dev>
Have vm_mem_add() always set KVM_MEM_GUEST_MEMFD in the memslot flags if
a guest_memfd is passed in as an argument. This eliminates the
possibility where a guest_memfd instance is passed to vm_mem_add(), but
it ends up being ignored because the flags argument does not specify
KVM_MEM_GUEST_MEMFD at the same time.
This makes it easy to support more scenarios in which vm_mem_add() is
not passed a guest_memfd instance, but is expected to allocate one.
Currently, this only happens if guest_memfd == -1 while (flags &
KVM_MEM_GUEST_MEMFD) != 0. Later, vm_mem_add() will gain support for
loading the test code itself into guest_memfd (via
GUEST_MEMFD_FLAG_MMAP) if requested via a special
vm_mem_backing_src_type, at which point having to make sure the src_type
and flags are in sync becomes cumbersome.
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
tools/testing/selftests/kvm/lib/kvm_util.c | 24 +++++++++++++---------
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 1959bf556e88..5b0865683047 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1090,21 +1090,25 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
region->backing_src_type = src_type;
- if (flags & KVM_MEM_GUEST_MEMFD) {
- if (guest_memfd < 0) {
+ if (guest_memfd < 0) {
+ if (flags & KVM_MEM_GUEST_MEMFD) {
uint32_t guest_memfd_flags = 0;
TEST_ASSERT(!guest_memfd_offset,
"Offset must be zero when creating new guest_memfd");
guest_memfd = vm_create_guest_memfd(vm, mem_size, guest_memfd_flags);
- } else {
- /*
- * Install a unique fd for each memslot so that the fd
- * can be closed when the region is deleted without
- * needing to track if the fd is owned by the framework
- * or by the caller.
- */
- guest_memfd = kvm_dup(guest_memfd);
}
+ } else {
+ /*
+ * Install a unique fd for each memslot so that the fd
+ * can be closed when the region is deleted without
+ * needing to track if the fd is owned by the framework
+ * or by the caller.
+ */
+ guest_memfd = kvm_dup(guest_memfd);
+ }
+
+ if (guest_memfd >= 0) {
+ flags |= KVM_MEM_GUEST_MEMFD;
region->region.guest_memfd = guest_memfd;
region->region.guest_memfd_offset = guest_memfd_offset;
--
2.50.1
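The reordered logic in vm_mem_add() can be summarised as a small decision function. The sketch below models only the decision and leaves out the actual fd handling. The demo_* names are hypothetical, and DEMO_MEM_GUEST_MEMFD is a placeholder bit rather than the uapi definition.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder for the memslot flag, for illustration only. */
#define DEMO_MEM_GUEST_MEMFD	(UINT32_C(1) << 2)

enum demo_fd_action {
	DEMO_FD_NONE,	/* memslot has no guest_memfd */
	DEMO_FD_CREATE,	/* framework allocates a new guest_memfd */
	DEMO_FD_DUP,	/* caller's fd is duplicated for this memslot */
};

/* What happens to the fd, given the caller's fd and requested flags. */
static inline enum demo_fd_action demo_pick_fd_action(int guest_memfd, uint32_t flags)
{
	if (guest_memfd < 0)
		return (flags & DEMO_MEM_GUEST_MEMFD) ? DEMO_FD_CREATE
						       : DEMO_FD_NONE;

	/* A caller-supplied fd is always honoured, whatever the flags say. */
	return DEMO_FD_DUP;
}

/* Flags the memslot ends up with after the fd decision. */
static inline uint32_t demo_final_flags(int guest_memfd, uint32_t flags)
{
	enum demo_fd_action action = demo_pick_fd_action(guest_memfd, flags);

	if (action != DEMO_FD_NONE)
		flags |= DEMO_MEM_GUEST_MEMFD;

	/* Invariant: a memslot that uses an fd always carries the flag. */
	assert(action == DEMO_FD_NONE || (flags & DEMO_MEM_GUEST_MEMFD));
	return flags;
}
```

Before the patch, a valid fd passed without the flag fell through to the "no guest_memfd" case. In this model that combination now yields DEMO_FD_DUP and the flag is forced on.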
* [PATCH v12 11/16] KVM: selftests: load elf via bounce buffer
From: Kalyazin, Nikita @ 2026-04-10 15:19 UTC (permalink / raw)
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Patrick Roy <patrick.roy@linux.dev>
If guest memory is backed using a VMA that does not allow GUP (e.g. a
userspace mapping of guest_memfd when the fd was allocated using
GUEST_MEMFD_FLAG_NO_DIRECT_MAP), then directly loading the test ELF
binary into it via read(2) potentially does not work. To nevertheless
support loading binaries in these cases, do the read(2) syscall using a
bounce buffer, and then memcpy from the bounce buffer into guest memory.
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
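The bounce-buffer idea can be tried in plain userspace. The following
is a minimal sketch under stated assumptions, not the selftest helper:
read_via_bounce() is a hypothetical name, and unlike test_read_bounce()
it reports failure through its return value instead of asserting.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Read exactly @count bytes from @fd into @dst via a heap bounce buffer,
 * so that read(2) never targets @dst directly. Returns @count on success,
 * or -1 on allocation failure, read error, or EOF before @count bytes.
 */
static ssize_t read_via_bounce(int fd, void *dst, size_t count)
{
	size_t done = 0;
	char *bounce;

	assert(dst && count > 0);

	bounce = malloc(count);
	if (!bounce)
		return -1;

	while (done < count) {
		ssize_t n = read(fd, bounce + done, count - done);

		if (n < 0 && errno == EINTR)
			continue;	/* interrupted, retry */
		if (n <= 0) {
			free(bounce);
			return -1;	/* error or premature EOF */
		}
		done += (size_t)n;
	}

	/* Only this CPU copy touches the destination mapping. */
	memcpy(dst, bounce, count);
	free(bounce);
	return (ssize_t)count;
}
```

The cost is one extra allocation and copy per read, which seems
acceptable for loading a test ELF once per VM.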
.../testing/selftests/kvm/include/test_util.h | 1 +
tools/testing/selftests/kvm/lib/elf.c | 8 +++----
tools/testing/selftests/kvm/lib/io.c | 23 +++++++++++++++++++
3 files changed, 28 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index b4872ba8ed12..8140e59b59e5 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -48,6 +48,7 @@ do { \
ssize_t test_write(int fd, const void *buf, size_t count);
ssize_t test_read(int fd, void *buf, size_t count);
+ssize_t test_read_bounce(int fd, void *buf, size_t count);
int test_seq_read(const char *path, char **bufp, size_t *sizep);
void __printf(5, 6) test_assert(bool exp, const char *exp_str,
diff --git a/tools/testing/selftests/kvm/lib/elf.c b/tools/testing/selftests/kvm/lib/elf.c
index f34d926d9735..e829fbe0a11e 100644
--- a/tools/testing/selftests/kvm/lib/elf.c
+++ b/tools/testing/selftests/kvm/lib/elf.c
@@ -31,7 +31,7 @@ static void elfhdr_get(const char *filename, Elf64_Ehdr *hdrp)
* the real size of the ELF header.
*/
unsigned char ident[EI_NIDENT];
- test_read(fd, ident, sizeof(ident));
+ test_read_bounce(fd, ident, sizeof(ident));
TEST_ASSERT((ident[EI_MAG0] == ELFMAG0) && (ident[EI_MAG1] == ELFMAG1)
&& (ident[EI_MAG2] == ELFMAG2) && (ident[EI_MAG3] == ELFMAG3),
"ELF MAGIC Mismatch,\n"
@@ -79,7 +79,7 @@ static void elfhdr_get(const char *filename, Elf64_Ehdr *hdrp)
offset_rv = lseek(fd, 0, SEEK_SET);
TEST_ASSERT(offset_rv == 0, "Seek to ELF header failed,\n"
" rv: %zi expected: %i", offset_rv, 0);
- test_read(fd, hdrp, sizeof(*hdrp));
+ test_read_bounce(fd, hdrp, sizeof(*hdrp));
TEST_ASSERT(hdrp->e_phentsize == sizeof(Elf64_Phdr),
"Unexpected physical header size,\n"
" hdrp->e_phentsize: %x\n"
@@ -146,7 +146,7 @@ void kvm_vm_elf_load(struct kvm_vm *vm, const char *filename)
/* Read in the program header. */
Elf64_Phdr phdr;
- test_read(fd, &phdr, sizeof(phdr));
+ test_read_bounce(fd, &phdr, sizeof(phdr));
/* Skip if this header doesn't describe a loadable segment. */
if (phdr.p_type != PT_LOAD)
@@ -187,7 +187,7 @@ void kvm_vm_elf_load(struct kvm_vm *vm, const char *filename)
" expected: 0x%jx",
n1, errno, (intmax_t) offset_rv,
(intmax_t) phdr.p_offset);
- test_read(fd, addr_gva2hva(vm, phdr.p_vaddr),
+ test_read_bounce(fd, addr_gva2hva(vm, phdr.p_vaddr),
phdr.p_filesz);
}
}
diff --git a/tools/testing/selftests/kvm/lib/io.c b/tools/testing/selftests/kvm/lib/io.c
index fedb2a741f0b..60613dce6cfd 100644
--- a/tools/testing/selftests/kvm/lib/io.c
+++ b/tools/testing/selftests/kvm/lib/io.c
@@ -155,3 +155,26 @@ ssize_t test_read(int fd, void *buf, size_t count)
return num_read;
}
+
+/* Test read via intermediary buffer
+ *
+ * Same as test_read, except read(2)s happen into a bounce buffer that is memcpy'd
+ * to buf. For use with buffers that cannot be GUP'd (e.g. guest_memfd VMAs if
+ * guest_memfd was created with GUEST_MEMFD_FLAG_NO_DIRECT_MAP).
+ */
+ssize_t test_read_bounce(int fd, void *buf, size_t count)
+{
+ void *bounce_buffer;
+ ssize_t num_read;
+
+ TEST_ASSERT(count > 0, "Unexpected count, count: %zu", count);
+
+ bounce_buffer = malloc(count);
+ TEST_ASSERT(bounce_buffer != NULL, "Failed to allocate bounce buffer");
+
+ num_read = test_read(fd, bounce_buffer, count);
+ memcpy(buf, bounce_buffer, num_read);
+ free(bounce_buffer);
+
+ return num_read;
+}
--
2.50.1
* [PATCH v12 10/16] KVM: guest_memfd: Add flag to remove from direct map
From: Kalyazin, Nikita @ 2026-04-10 15:19 UTC (permalink / raw)
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita, Nikita Kalyazin
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Patrick Roy <patrick.roy@linux.dev>
Add GUEST_MEMFD_FLAG_NO_DIRECT_MAP flag for KVM_CREATE_GUEST_MEMFD()
ioctl. When set, guest_memfd folios will be removed from the direct map
after preparation, with direct map entries only restored when the folios
are freed.
To ensure these folios do not end up in places where the kernel cannot
deal with them, set AS_NO_DIRECT_MAP on the guest_memfd's struct
address_space if GUEST_MEMFD_FLAG_NO_DIRECT_MAP is requested.
Note that this flag causes removal of direct map entries for all
guest_memfd folios independent of whether they are "shared" or "private"
(although current guest_memfd only supports either all folios in the
"shared" state, or all folios in the "private" state if
GUEST_MEMFD_FLAG_MMAP is not set). The use case for also removing the
direct map entries of the shared parts of guest_memfd is a special type
of non-CoCo VM where host userspace is trusted to have access to all of
guest memory, but where Spectre-style transient execution attacks
through the host kernel's direct map should still be mitigated. In this
setup, KVM retains access to guest memory via userspace mappings of
guest_memfd, which are reflected back into KVM's memslots via
userspace_addr. This is needed for things like MMIO emulation on x86_64
to work.
Direct map entries are zapped right before guest or userspace mappings
of gmem folios are set up, e.g. in kvm_gmem_fault_user_mapping() or
kvm_gmem_get_pfn() [called from the KVM MMU code]. At present, direct
map removal is not supported on platforms that support
kvm_gmem_populate(). In case such support is added in the future, the
following ordering is maintained: zap then prepare, invalidate then
restore, to avoid guest-owned pages being temporarily mapped by the
host. This assumes that preparation or invalidation code does not
access the page content.
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Co-developed-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
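The per-folio bookkeeping can be illustrated in userspace. This is a
minimal sketch, assuming a hypothetical struct sketch_folio in place of
struct folio and counters in place of the real direct map operations. It
only models the state bit kept in the private word: the zap is
idempotent, and the restore happens at free time only if a zap took
place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NO_DIRECT_MAP ((uintptr_t)1) /* bit 0 of the private word */

struct sketch_folio {
	void *private;		/* models folio->private */
	int zap_calls;		/* simulated direct map removals */
	int restore_calls;	/* simulated direct map restorations */
};

static bool sketch_is_zapped(const struct sketch_folio *f)
{
	return (uintptr_t)f->private & SKETCH_NO_DIRECT_MAP;
}

/* Called before the folio is handed to userspace or the guest. */
static int sketch_zap(struct sketch_folio *f)
{
	assert(f);

	if (sketch_is_zapped(f))
		return 0;	/* already removed, nothing to do */

	f->zap_calls++;
	f->private = (void *)((uintptr_t)f->private | SKETCH_NO_DIRECT_MAP);
	return 0;
}

/* Called when the folio is freed. */
static void sketch_free(struct sketch_folio *f)
{
	assert(f);

	if (!sketch_is_zapped(f))
		return;		/* never zapped, direct map is intact */

	f->restore_calls++;
	f->private = (void *)((uintptr_t)f->private & ~SKETCH_NO_DIRECT_MAP);
}
```

Faulting the same folio twice in this model performs a single removal,
and freeing a folio that was never faulted leaves the direct map alone.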
Documentation/virt/kvm/api.rst | 21 +++++-----
include/linux/kvm_host.h | 3 ++
include/uapi/linux/kvm.h | 1 +
virt/kvm/guest_memfd.c | 71 ++++++++++++++++++++++++++++++++--
4 files changed, 83 insertions(+), 13 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 032516783e96..8feec77b03fe 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6439,15 +6439,18 @@ a single guest_memfd file, but the bound ranges must not overlap).
The capability KVM_CAP_GUEST_MEMFD_FLAGS enumerates the `flags` that can be
specified via KVM_CREATE_GUEST_MEMFD. Currently defined flags:
- ============================ ================================================
- GUEST_MEMFD_FLAG_MMAP Enable using mmap() on the guest_memfd file
- descriptor.
- GUEST_MEMFD_FLAG_INIT_SHARED Make all memory in the file shared during
- KVM_CREATE_GUEST_MEMFD (memory files created
- without INIT_SHARED will be marked private).
- Shared memory can be faulted into host userspace
- page tables. Private memory cannot.
- ============================ ================================================
+ ============================== ================================================
+ GUEST_MEMFD_FLAG_MMAP Enable using mmap() on the guest_memfd file
+ descriptor.
+ GUEST_MEMFD_FLAG_INIT_SHARED Make all memory in the file shared during
+ KVM_CREATE_GUEST_MEMFD (memory files created
+ without INIT_SHARED will be marked private).
+ Shared memory can be faulted into host userspace
+ page tables. Private memory cannot.
+ GUEST_MEMFD_FLAG_NO_DIRECT_MAP The guest_memfd instance will unmap the memory
+ backing it from the kernel's address space
+ before passing it off to userspace or the guest.
+ ============================== ================================================
When the KVM MMU performs a PFN lookup to service a guest fault and the backing
guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ce8c5fdf2752..c95747e2278c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -738,6 +738,9 @@ static inline u64 kvm_gmem_get_supported_flags(struct kvm *kvm)
if (!kvm || kvm_arch_supports_gmem_init_shared(kvm))
flags |= GUEST_MEMFD_FLAG_INIT_SHARED;
+ if (!kvm || kvm_arch_gmem_supports_no_direct_map(kvm))
+ flags |= GUEST_MEMFD_FLAG_NO_DIRECT_MAP;
+
return flags;
}
#endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 80364d4dbebb..d864f67efdb7 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1642,6 +1642,7 @@ struct kvm_memory_attributes {
#define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
#define GUEST_MEMFD_FLAG_MMAP (1ULL << 0)
#define GUEST_MEMFD_FLAG_INIT_SHARED (1ULL << 1)
+#define GUEST_MEMFD_FLAG_NO_DIRECT_MAP (1ULL << 2)
struct kvm_create_guest_memfd {
__u64 size;
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 651649623448..80d4a6aca128 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -7,6 +7,7 @@
#include <linux/mempolicy.h>
#include <linux/pseudo_fs.h>
#include <linux/pagemap.h>
+#include <linux/set_memory.h>
#include "kvm_mm.h"
@@ -76,6 +77,39 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
return 0;
}
+#define KVM_GMEM_FOLIO_NO_DIRECT_MAP BIT(0)
+
+static bool kvm_gmem_folio_no_direct_map(struct folio *folio)
+{
+ return ((u64)folio->private) & KVM_GMEM_FOLIO_NO_DIRECT_MAP;
+}
+
+static int kvm_gmem_folio_zap_direct_map(struct folio *folio)
+{
+ int r = 0;
+
+ VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
+
+ if (WARN_ON_ONCE(!(GMEM_I(folio_inode(folio))->flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP)))
+ return -EINVAL;
+
+ if (kvm_gmem_folio_no_direct_map(folio))
+ goto out;
+
+ r = folio_zap_direct_map(folio);
+ if (!r)
+ folio->private = (void *)((u64)folio->private | KVM_GMEM_FOLIO_NO_DIRECT_MAP);
+
+out:
+ return r;
+}
+
+static void kvm_gmem_folio_restore_direct_map(struct folio *folio)
+{
+ folio_restore_direct_map(folio);
+ folio->private = (void *)((u64)folio->private & ~KVM_GMEM_FOLIO_NO_DIRECT_MAP);
+}
+
/*
* Process @folio, which contains @gfn, so that the guest can use it.
* The folio must be locked and the gfn must be contained in @slot.
@@ -388,11 +422,17 @@ static bool kvm_gmem_supports_mmap(struct inode *inode)
return GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_MMAP;
}
+static bool kvm_gmem_no_direct_map(struct inode *inode)
+{
+ return GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP;
+}
+
static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
{
struct inode *inode = file_inode(vmf->vma->vm_file);
struct folio *folio;
vm_fault_t ret = VM_FAULT_LOCKED;
+ int err;
if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
return VM_FAULT_SIGBUS;
@@ -418,6 +458,14 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
folio_mark_uptodate(folio);
}
+ if (kvm_gmem_no_direct_map(folio_inode(folio))) {
+ err = kvm_gmem_folio_zap_direct_map(folio);
+ if (err) {
+ ret = vmf_error(err);
+ goto out_folio;
+ }
+ }
+
vmf->page = folio_file_page(folio, vmf->pgoff);
out_folio:
@@ -529,6 +577,9 @@ static void kvm_gmem_free_folio(struct folio *folio)
int order = folio_order(folio);
kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
+
+ if (kvm_gmem_folio_no_direct_map(folio))
+ kvm_gmem_folio_restore_direct_map(folio);
}
static const struct address_space_operations kvm_gmem_aops = {
@@ -591,6 +642,9 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
/* Unmovable mappings are supposed to be marked unevictable as well. */
WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
+ if (flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP)
+ mapping_set_no_direct_map(inode->i_mapping);
+
GMEM_I(inode)->flags = flags;
file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR, &kvm_gmem_fops);
@@ -802,14 +856,23 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
folio_mark_uptodate(folio);
}
+ if (kvm_gmem_no_direct_map(folio_inode(folio))) {
+ r = kvm_gmem_folio_zap_direct_map(folio);
+ if (r)
+ goto out_unlock;
+ }
+
r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
+ if (r)
+ goto out_unlock;
+ *page = folio_file_page(folio, index);
folio_unlock(folio);
+ return 0;
- if (!r)
- *page = folio_file_page(folio, index);
- else
- folio_put(folio);
+out_unlock:
+ folio_unlock(folio);
+ folio_put(folio);
return r;
}
--
2.50.1
* [PATCH v12 09/16] KVM: arm64: define kvm_arch_gmem_supports_no_direct_map()
From: Kalyazin, Nikita @ 2026-04-10 15:19 UTC (permalink / raw)
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Patrick Roy <patrick.roy@linux.dev>
Support for GUEST_MEMFD_FLAG_NO_DIRECT_MAP on arm64 depends on 1) direct
map manipulations at 4k granularity being possible, and 2) FEAT_S2FWB.
1) is met whenever the direct map is set up at 4k granularity (e.g. not
with huge/gigantic pages) at boot time. Because of ARM's
break-before-make semantics, breaking huge mappings into 4k mappings in
the direct map is not possible (BBM would require temporary invalidation
of the entire huge mapping, even if only a 4k subrange should be zapped,
which will probably crash the kernel). However, the current default for
rodata_full is true, which forces a 4k direct map.
2) is required to allow KVM to elide cache coherency operations when
installing stage 2 page tables, which require the direct map entry
for the newly mapped memory to be present (which it will not be,
as guest_memfd would have removed direct map entries in
kvm_gmem_get_pfn()).
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
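The two prerequisites combine as a plain conjunction. A minimal sketch,
assuming a hypothetical struct sketch_arm64_host whose booleans model
the results of can_set_direct_map() and the FEAT_S2FWB capability check
(neither is queried for real here):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a host; field names are illustrative only. */
struct sketch_arm64_host {
	bool direct_map_4k;	/* direct map built from 4k mappings at boot */
	bool has_s2fwb;		/* CPUs implement FEAT_S2FWB */
};

static bool sketch_arm64_supports_no_direct_map(const struct sketch_arm64_host *h)
{
	assert(h);

	/* Both prerequisites from the commit message must hold. */
	return h->direct_map_4k && h->has_s2fwb;
}
```

A host with a block-mapped direct map, or one without FWB, would
therefore not advertise the flag in this model.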
arch/arm64/include/asm/kvm_host.h | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 70cb9cfd760a..fbdd43e7e94e 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -19,6 +19,7 @@
#include <linux/maple_tree.h>
#include <linux/percpu.h>
#include <linux/psci.h>
+#include <linux/set_memory.h>
#include <asm/arch_gicv3.h>
#include <asm/barrier.h>
#include <asm/cpufeature.h>
@@ -1682,6 +1683,18 @@ static __always_inline enum fgt_group_id __fgt_reg_to_group_id(enum vcpu_sysreg
\
p; \
})
+#ifdef CONFIG_KVM_GUEST_MEMFD
+static inline bool kvm_arch_gmem_supports_no_direct_map(struct kvm *kvm)
+{
+ /*
+ * Without FWB, direct map access is needed in kvm_pgtable_stage2_map(),
+ * as it calls dcache_clean_inval_poc().
+ */
+ return can_set_direct_map() && cpus_have_final_cap(ARM64_HAS_STAGE2_FWB);
+}
+#define kvm_arch_gmem_supports_no_direct_map kvm_arch_gmem_supports_no_direct_map
+#endif /* CONFIG_KVM_GUEST_MEMFD */
+
long kvm_get_cap_for_kvm_ioctl(unsigned int ioctl, long *ext);
--
2.50.1
* [PATCH v12 08/16] KVM: x86: define kvm_arch_gmem_supports_no_direct_map()
From: Kalyazin, Nikita @ 2026-04-10 15:19 UTC (permalink / raw)
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita, Nikita Kalyazin
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Patrick Roy <patrick.roy@linux.dev>
x86 supports GUEST_MEMFD_FLAG_NO_DIRECT_MAP whenever direct map
modifications are possible. Exclude TDX and SEV-SNP as they access
pages via direct map in certain operations, such as population.
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Co-developed-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
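The resulting support matrix can be written down as a small truth
function. This is a minimal sketch: enum sketch_vm_type and its values
are hypothetical and are not the uapi KVM_X86_*_VM constants, and the
can_set_direct_map argument models the result of can_set_direct_map()
instead of calling it.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative VM kinds; these are not the uapi KVM_X86_*_VM values. */
enum sketch_vm_type {
	SKETCH_VM_DEFAULT,
	SKETCH_VM_SNP,
	SKETCH_VM_TDX,
};

static bool sketch_x86_supports_no_direct_map(bool can_set_direct_map,
					       enum sketch_vm_type type)
{
	/* TDX and SEV-SNP still reach pages through the direct map. */
	return can_set_direct_map &&
	       type != SKETCH_VM_TDX &&
	       type != SKETCH_VM_SNP;
}

/* Usage example: walk the combinations described in the commit message. */
static void sketch_x86_selfcheck(void)
{
	assert(sketch_x86_supports_no_direct_map(true, SKETCH_VM_DEFAULT));
	assert(!sketch_x86_supports_no_direct_map(false, SKETCH_VM_DEFAULT));
	assert(!sketch_x86_supports_no_direct_map(true, SKETCH_VM_TDX));
	assert(!sketch_x86_supports_no_direct_map(true, SKETCH_VM_SNP));
}
```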
arch/x86/include/asm/kvm_host.h | 6 ++++++
arch/x86/kvm/x86.c | 7 +++++++
include/linux/kvm_host.h | 9 +++++++++
3 files changed, 22 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6e4e3ef9b8c7..171ce8b84137 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -28,6 +28,7 @@
#include <linux/sched/vhost_task.h>
#include <linux/call_once.h>
#include <linux/atomic.h>
+#include <linux/set_memory.h>
#include <asm/apic.h>
#include <asm/pvclock-abi.h>
@@ -2504,4 +2505,9 @@ static inline bool kvm_arch_has_irq_bypass(void)
return enable_device_posted_irqs;
}
+#ifdef CONFIG_KVM_GUEST_MEMFD
+bool kvm_arch_gmem_supports_no_direct_map(struct kvm *kvm);
+#define kvm_arch_gmem_supports_no_direct_map kvm_arch_gmem_supports_no_direct_map
+#endif /* CONFIG_KVM_GUEST_MEMFD */
+
#endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fd1c4a36b593..32da7820823c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -14079,6 +14079,13 @@ void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
kvm_x86_call(gmem_invalidate)(start, end);
}
#endif
+
+bool kvm_arch_gmem_supports_no_direct_map(struct kvm *kvm)
+{
+ return can_set_direct_map() &&
+ kvm->arch.vm_type != KVM_X86_TDX_VM &&
+ kvm->arch.vm_type != KVM_X86_SNP_VM;
+}
#endif
int kvm_spec_ctrl_test_value(u64 value)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e8aa3d676c31..ce8c5fdf2752 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -742,6 +742,15 @@ static inline u64 kvm_gmem_get_supported_flags(struct kvm *kvm)
}
#endif
+#ifdef CONFIG_KVM_GUEST_MEMFD
+#ifndef kvm_arch_gmem_supports_no_direct_map
+static inline bool kvm_arch_gmem_supports_no_direct_map(struct kvm *kvm)
+{
+ return false;
+}
+#endif
+#endif /* CONFIG_KVM_GUEST_MEMFD */
+
#ifndef kvm_arch_has_readonly_mem
static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
{
--
2.50.1
* [PATCH v12 07/16] KVM: guest_memfd: Add stub for kvm_arch_gmem_invalidate
From: Kalyazin, Nikita @ 2026-04-10 15:19 UTC (permalink / raw)
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita, Vlastimil Babka
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Patrick Roy <patrick.roy@linux.dev>
Add a no-op stub for kvm_arch_gmem_invalidate if
CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE=n. This allows defining
kvm_gmem_free_folio without ifdef-ery, which allows more cleanly using
guest_memfd's free_folio callback for non-arch-invalidation related
code.
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
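The stub pattern itself can be shown in isolation. A minimal sketch,
assuming a hypothetical SKETCH_HAVE_ARCH_INVALIDATE macro in place of
the Kconfig symbol and hypothetical sketch_* names in place of the
kernel functions:

```c
#include <assert.h>

/* Define SKETCH_HAVE_ARCH_INVALIDATE to model an arch with a real hook. */
#ifdef SKETCH_HAVE_ARCH_INVALIDATE
void sketch_arch_invalidate(unsigned long start, unsigned long end);
#else
static inline void sketch_arch_invalidate(unsigned long start, unsigned long end)
{
	/* No-op stub: nothing to invalidate on this configuration. */
	(void)start;
	(void)end;
}
#endif

static int sketch_free_calls;

/* Defined unconditionally, so other cleanup can be added here later. */
static void sketch_free_folio(unsigned long pfn, unsigned int order)
{
	assert(order < 8 * sizeof(unsigned long));

	sketch_arch_invalidate(pfn, pfn + (1ul << order));
	sketch_free_calls++;
}
```

With the macro undefined the call compiles away, and the callback body
no longer needs to sit inside an #ifdef.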
include/linux/kvm_host.h | 2 ++
virt/kvm/guest_memfd.c | 4 ----
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6b76e7a6f4c2..e8aa3d676c31 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2587,6 +2587,8 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t gfn, void __user *src, long npages
#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
+#else
+static inline void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) { }
#endif
#ifdef CONFIG_KVM_GENERIC_PRE_FAULT_MEMORY
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 017d84a7adf3..651649623448 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -522,7 +522,6 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
return MF_DELAYED;
}
-#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
static void kvm_gmem_free_folio(struct folio *folio)
{
struct page *page = folio_page(folio, 0);
@@ -531,15 +530,12 @@ static void kvm_gmem_free_folio(struct folio *folio)
kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
}
-#endif
static const struct address_space_operations kvm_gmem_aops = {
.dirty_folio = noop_dirty_folio,
.migrate_folio = kvm_gmem_migrate_folio,
.error_remove_folio = kvm_gmem_error_folio,
-#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
.free_folio = kvm_gmem_free_folio,
-#endif
};
static int kvm_gmem_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
--
2.50.1
* [PATCH v12 06/16] mm: introduce AS_NO_DIRECT_MAP
From: Kalyazin, Nikita @ 2026-04-10 15:18 UTC (permalink / raw)
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita, Vlastimil Babka
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Patrick Roy <patrick.roy@linux.dev>
Add AS_NO_DIRECT_MAP for mappings where direct map entries of folios are
set to not present. Currently, mappings that match this description are
secretmem mappings (memfd_secret()). Later, some guest_memfd
configurations will also fall into this category.
Reject this new type of mappings in all locations that currently reject
secretmem mappings, on the assumption that if secretmem mappings are
rejected somewhere, it is precisely because of an inability to deal with
folios without direct map entries, and then make memfd_secret() use
AS_NO_DIRECT_MAP on its address_space to drop its special
vma_is_secretmem()/secretmem_mapping() checks.
Use a new flag instead of overloading AS_INACCESSIBLE (which is already
set by guest_memfd) because not all guest_memfd mappings will end up
being direct map removed (e.g. in pKVM setups, parts of guest_memfd that
can be mapped to userspace should also be GUP-able, and generally not
have restrictions on who can access it).
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
include/linux/pagemap.h | 16 ++++++++++++++++
include/linux/secretmem.h | 18 ------------------
lib/buildid.c | 8 ++++++--
mm/gup.c | 9 ++++-----
mm/mlock.c | 2 +-
mm/secretmem.c | 8 ++------
6 files changed, 29 insertions(+), 32 deletions(-)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index ec442af3f886..68c075502d91 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -211,6 +211,7 @@ enum mapping_flags {
AS_KERNEL_FILE = 10, /* mapping for a fake kernel file that shouldn't
account usage to user cgroups */
AS_NO_DATA_INTEGRITY = 11, /* no data integrity guarantees */
+ AS_NO_DIRECT_MAP = 12, /* Folios in the mapping are not in the direct map */
/* Bits 16-25 are used for FOLIO_ORDER */
AS_FOLIO_ORDER_BITS = 5,
AS_FOLIO_ORDER_MIN = 16,
@@ -356,6 +357,21 @@ static inline bool mapping_no_data_integrity(const struct address_space *mapping
return test_bit(AS_NO_DATA_INTEGRITY, &mapping->flags);
}
+static inline void mapping_set_no_direct_map(struct address_space *mapping)
+{
+ set_bit(AS_NO_DIRECT_MAP, &mapping->flags);
+}
+
+static inline bool mapping_no_direct_map(const struct address_space *mapping)
+{
+ return test_bit(AS_NO_DIRECT_MAP, &mapping->flags);
+}
+
+static inline bool vma_has_no_direct_map(const struct vm_area_struct *vma)
+{
+ return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);
+}
+
static inline gfp_t mapping_gfp_mask(const struct address_space *mapping)
{
return mapping->gfp_mask;
diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
index e918f96881f5..0ae1fb057b3d 100644
--- a/include/linux/secretmem.h
+++ b/include/linux/secretmem.h
@@ -4,28 +4,10 @@
#ifdef CONFIG_SECRETMEM
-extern const struct address_space_operations secretmem_aops;
-
-static inline bool secretmem_mapping(struct address_space *mapping)
-{
- return mapping->a_ops == &secretmem_aops;
-}
-
-bool vma_is_secretmem(struct vm_area_struct *vma);
bool secretmem_active(void);
#else
-static inline bool vma_is_secretmem(struct vm_area_struct *vma)
-{
- return false;
-}
-
-static inline bool secretmem_mapping(struct address_space *mapping)
-{
- return false;
-}
-
static inline bool secretmem_active(void)
{
return false;
diff --git a/lib/buildid.c b/lib/buildid.c
index c4b737640621..ba79bf28f7e6 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -47,6 +47,10 @@ static int freader_get_folio(struct freader *r, loff_t file_off)
freader_put_folio(r);
+ /* reject folios without direct map entries (e.g. from memfd_secret() or guest_memfd()) */
+ if (mapping_no_direct_map(r->file->f_mapping))
+ return -EFAULT;
+
/* only use page cache lookup - fail if not already cached */
r->folio = filemap_get_folio(r->file->f_mapping, file_off >> PAGE_SHIFT);
@@ -87,8 +91,8 @@ const void *freader_fetch(struct freader *r, loff_t file_off, size_t sz)
return r->data + file_off;
}
- /* reject secretmem folios created with memfd_secret() */
- if (secretmem_mapping(r->file->f_mapping)) {
+ /* reject folios without direct map entries (e.g. from memfd_secret() or guest_memfd()) */
+ if (mapping_no_direct_map(r->file->f_mapping)) {
r->err = -EFAULT;
return NULL;
}
diff --git a/mm/gup.c b/mm/gup.c
index 41eb64783e03..c1b4fb1eaee7 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -11,7 +11,6 @@
#include <linux/rmap.h>
#include <linux/swap.h>
#include <linux/swapops.h>
-#include <linux/secretmem.h>
#include <linux/sched/signal.h>
#include <linux/rwsem.h>
@@ -1216,7 +1215,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
if ((gup_flags & FOLL_SPLIT_PMD) && is_vm_hugetlb_page(vma))
return -EOPNOTSUPP;
- if (vma_is_secretmem(vma))
+ if (vma_has_no_direct_map(vma))
return -EFAULT;
if (write) {
@@ -2724,7 +2723,7 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
* This call assumes the caller has pinned the folio, that the lowest page table
* level still points to this folio, and that interrupts have been disabled.
*
- * GUP-fast must reject all secretmem folios.
+ * GUP-fast must reject all folios without direct map entries (such as secretmem).
*
* Writing to pinned file-backed dirty tracked folios is inherently problematic
* (see comment describing the writable_file_mapping_allowed() function). We
@@ -2744,7 +2743,7 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
if (WARN_ON_ONCE(folio_test_slab(folio)))
return false;
- /* hugetlb neither requires dirty-tracking nor can be secretmem. */
+ /* hugetlb neither requires dirty-tracking nor can be without direct map. */
if (folio_test_hugetlb(folio))
return true;
@@ -2786,7 +2785,7 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
* At this point, we know the mapping is non-null and points to an
* address_space object.
*/
- if (secretmem_mapping(mapping))
+ if (mapping_no_direct_map(mapping))
return false;
/*
diff --git a/mm/mlock.c b/mm/mlock.c
index 2f699c3497a5..a6f4b3df4f3f 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -474,7 +474,7 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (newflags == oldflags || (oldflags & VM_SPECIAL) ||
is_vm_hugetlb_page(vma) || vma == get_gate_vma(current->mm) ||
- vma_is_dax(vma) || vma_is_secretmem(vma) || (oldflags & VM_DROPPABLE))
+ vma_is_dax(vma) || vma_has_no_direct_map(vma) || (oldflags & VM_DROPPABLE))
/* don't set VM_LOCKED or VM_LOCKONFAULT and don't count */
goto out;
diff --git a/mm/secretmem.c b/mm/secretmem.c
index 27b176af8fc4..d32e1be1eb35 100644
--- a/mm/secretmem.c
+++ b/mm/secretmem.c
@@ -129,11 +129,6 @@ static int secretmem_mmap_prepare(struct vm_area_desc *desc)
return 0;
}
-bool vma_is_secretmem(struct vm_area_struct *vma)
-{
- return vma->vm_ops == &secretmem_vm_ops;
-}
-
static const struct file_operations secretmem_fops = {
.release = secretmem_release,
.mmap_prepare = secretmem_mmap_prepare,
@@ -151,7 +146,7 @@ static void secretmem_free_folio(struct folio *folio)
folio_zero_segment(folio, 0, folio_size(folio));
}
-const struct address_space_operations secretmem_aops = {
+static const struct address_space_operations secretmem_aops = {
.dirty_folio = noop_dirty_folio,
.free_folio = secretmem_free_folio,
.migrate_folio = secretmem_migrate_folio,
@@ -200,6 +195,7 @@ static struct file *secretmem_file_create(unsigned long flags)
mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
mapping_set_unevictable(inode->i_mapping);
+ mapping_set_no_direct_map(inode->i_mapping);
inode->i_op = &secretmem_iops;
inode->i_mapping->a_ops = &secretmem_aops;
--
2.50.1
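
A minimal userspace sketch of the three pagemap.h helpers added by this patch, assuming simplified stand-in structs that keep only the fields the helpers touch. The plain bit operations here are an assumption of the model; the patch itself uses set_bit()/test_bit().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { AS_NO_DIRECT_MAP = 12 };	/* bit number taken from the patch */

/* Stand-in types: only the fields used by the helpers are modelled. */
struct address_space { unsigned long flags; };
struct file { struct address_space *f_mapping; };
struct vm_area_struct { struct file *vm_file; };

static void mapping_set_no_direct_map(struct address_space *mapping)
{
	assert(mapping != NULL);
	mapping->flags |= 1UL << AS_NO_DIRECT_MAP;	/* non-atomic in this model */
}

static bool mapping_no_direct_map(const struct address_space *mapping)
{
	return (mapping->flags >> AS_NO_DIRECT_MAP) & 1UL;
}

/* Anonymous VMAs have no vm_file and therefore never match. */
static bool vma_has_no_direct_map(const struct vm_area_struct *vma)
{
	return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);
}
```

The point of keying on the address_space rather than on vm_ops or a_ops identity is that a second user (some guest_memfd configurations, per the commit message) can opt in by setting the flag, with no new special case at the call sites.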
* [PATCH v12 05/16] mm/gup: drop local variable in gup_fast_folio_allowed
From: Kalyazin, Nikita @ 2026-04-10 15:18 UTC (permalink / raw)
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Nikita Kalyazin <nikita.kalyazin@linux.dev>
Move the check for pinning closer to where the result is used.
No functional changes.
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
mm/gup.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index e8367564d636..41eb64783e03 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2737,18 +2737,9 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
*/
static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
{
- bool reject_file_backed = false;
struct address_space *mapping;
unsigned long mapping_flags;
- /*
- * If we aren't pinning then no problematic write can occur. A long term
- * pin is the most egregious case so this is the one we disallow.
- */
- if ((flags & (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE)) ==
- (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE))
- reject_file_backed = true;
-
/* We hold a folio reference, so we can safely access folio fields. */
if (WARN_ON_ONCE(folio_test_slab(folio)))
return false;
@@ -2797,8 +2788,18 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
*/
if (secretmem_mapping(mapping))
return false;
- /* The only remaining allowed file system is shmem. */
- return !reject_file_backed || shmem_mapping(mapping);
+
+ /*
+ * If we aren't pinning then no problematic write can occur. A writable
+ * long term pin is the most egregious case, so this is the one we
+ * allow only for ...
+ */
+ if ((flags & (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE)) !=
+ (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE))
+ return true;
+
+ /* ... hugetlb (which we allowed above already) and shared memory. */
+ return shmem_mapping(mapping);
}
#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
--
2.50.1
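
The "no functional changes" claim can be checked mechanically. Below is a small userspace sketch that models only the tail of gup_fast_folio_allowed() before and after this patch and compares the two for every flag combination. The FOLL_* bit values are illustrative and may differ from the kernel's, and is_shmem stands in for shmem_mapping(mapping).

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel's FOLL_* values may differ. */
#define FOLL_WRITE		0x1u
#define FOLL_PIN		0x2u
#define FOLL_LONGTERM		0x4u
#define WRITE_LONGTERM_PIN	(FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE)

/* Before the patch: the flag is computed up front and used at the end. */
static bool tail_before(unsigned int flags, bool is_shmem)
{
	bool reject_file_backed =
		(flags & WRITE_LONGTERM_PIN) == WRITE_LONGTERM_PIN;

	return !reject_file_backed || is_shmem;
}

/* After the patch: the check sits next to its only use. */
static bool tail_after(unsigned int flags, bool is_shmem)
{
	if ((flags & WRITE_LONGTERM_PIN) != WRITE_LONGTERM_PIN)
		return true;

	return is_shmem;
}

/* Exhaustively compare both variants over the three modelled flag bits. */
static void check_tails_agree(void)
{
	for (unsigned int flags = 0; flags <= WRITE_LONGTERM_PIN; flags++) {
		assert(tail_before(flags, false) == tail_after(flags, false));
		assert(tail_before(flags, true) == tail_after(flags, true));
	}
}
```

Only the combination with all three bits set reaches the shmem test; every other combination is allowed in both variants.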
* [PATCH v12 04/16] mm/gup: drop secretmem optimization from gup_fast_folio_allowed
From: Kalyazin, Nikita @ 2026-04-10 15:18 UTC (permalink / raw)
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita, Vlastimil Babka
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Patrick Roy <patrick.roy@linux.dev>
This drops an optimization in gup_fast_folio_allowed() where
secretmem_mapping() was only called if CONFIG_SECRETMEM=y. secretmem is
enabled by default since commit b758fe6df50d ("mm/secretmem: make it on
by default"), so the secretmem check did not actually end up elided in
most cases anymore anyway.
To make sure the fast path for ZONE_DEVICE pages (like Device DAX and
PCI P2PDMA) is still allowed, check for folio_is_zone_device() if
mapping is NULL.
This is in preparation for generalizing the handling of mappings where
direct map entries of folios are set to not present. Currently,
mappings that match this description are secretmem mappings
(memfd_secret()). Later, some guest_memfd configurations will also fall
into this category.
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
mm/gup.c | 17 ++++++-----------
1 file changed, 6 insertions(+), 11 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 8e7dc2c6ee73..e8367564d636 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2739,7 +2739,6 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
{
bool reject_file_backed = false;
struct address_space *mapping;
- bool check_secretmem = false;
unsigned long mapping_flags;
/*
@@ -2751,14 +2750,6 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
reject_file_backed = true;
/* We hold a folio reference, so we can safely access folio fields. */
-
- /* secretmem folios are always order-0 folios. */
- if (IS_ENABLED(CONFIG_SECRETMEM) && !folio_test_large(folio))
- check_secretmem = true;
-
- if (!reject_file_backed && !check_secretmem)
- return true;
-
if (WARN_ON_ONCE(folio_test_slab(folio)))
return false;
@@ -2787,9 +2778,13 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
* The mapping may have been truncated, in any case we cannot determine
* if this mapping is safe - fall back to slow path to determine how to
* proceed.
+ *
+ * ZONE_DEVICE folios (e.g. Device DAX, PCI P2PDMA) may legitimately
+ * have a NULL mapping. They are never secretmem/no-direct-map folios,
+ * so let them through.
*/
if (!mapping)
- return false;
+ return folio_is_zone_device(folio);
/* Anonymous folios pose no problem. */
mapping_flags = (unsigned long)mapping & FOLIO_MAPPING_FLAGS;
@@ -2800,7 +2795,7 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
* At this point, we know the mapping is non-null and points to an
* address_space object.
*/
- if (check_secretmem && secretmem_mapping(mapping))
+ if (secretmem_mapping(mapping))
return false;
/* The only remaining allowed file system is shmem. */
return !reject_file_backed || shmem_mapping(mapping);
--
2.50.1
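
A simplified userspace model of why the ZONE_DEVICE check is needed, derived only from the hunks above. It covers the non-pinning case with CONFIG_SECRETMEM=y and deliberately omits the slab, hugetlb and anonymous-folio checks; struct folio_model and its fields are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the folio properties the checks look at. */
struct folio_model {
	bool large;		/* order > 0 */
	bool zone_device;	/* e.g. Device DAX, PCI P2PDMA */
	bool has_mapping;	/* folio->mapping != NULL */
	bool secretmem;		/* mapping created by memfd_secret() */
};

/* Before: large folios returned early and never reached the mapping check. */
static bool allowed_before(const struct folio_model *f)
{
	bool check_secretmem = !f->large;

	assert(!f->secretmem || f->has_mapping);
	if (!check_secretmem)
		return true;
	if (!f->has_mapping)
		return false;
	return !f->secretmem;
}

/* After: no early return, so a NULL mapping needs the ZONE_DEVICE test. */
static bool allowed_after(const struct folio_model *f)
{
	assert(!f->secretmem || f->has_mapping);
	if (!f->has_mapping)
		return f->zone_device;
	return !f->secretmem;
}
```

In this model a large ZONE_DEVICE folio with a NULL mapping stays on the fast path in both versions, while a plain `return false` in the new code would have sent it to the slow path.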
* [PATCH v12 03/16] mm/secretmem: make use of folio_{zap,restore}_direct_map
From: Kalyazin, Nikita @ 2026-04-10 15:18 UTC (permalink / raw)
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Nikita Kalyazin <nikita.kalyazin@linux.dev>
Replace the set_direct_map_*_noflush() calls with the newly available
folio_{zap,restore}_direct_map() helpers, which derive the address from
the folio internally. A side effect is that the TLB is now flushed even
if filemap_add_folio() fails; that failure path is not expected to be
hot.
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
mm/secretmem.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/mm/secretmem.c b/mm/secretmem.c
index fd29b33c6764..27b176af8fc4 100644
--- a/mm/secretmem.c
+++ b/mm/secretmem.c
@@ -53,7 +53,6 @@ static vm_fault_t secretmem_fault(struct vm_fault *vmf)
struct inode *inode = file_inode(vmf->vma->vm_file);
pgoff_t offset = vmf->pgoff;
gfp_t gfp = vmf->gfp_mask;
- unsigned long addr;
struct folio *folio;
vm_fault_t ret;
int err;
@@ -72,7 +71,7 @@ static vm_fault_t secretmem_fault(struct vm_fault *vmf)
goto out;
}
- err = set_direct_map_invalid_noflush(folio_address(folio));
+ err = folio_zap_direct_map(folio);
if (err) {
folio_put(folio);
ret = vmf_error(err);
@@ -87,7 +86,7 @@ static vm_fault_t secretmem_fault(struct vm_fault *vmf)
* already happened when we marked the page invalid
* which guarantees that this call won't fail
*/
- set_direct_map_default_noflush(folio_address(folio));
+ folio_restore_direct_map(folio);
folio_put(folio);
if (err == -EEXIST)
goto retry;
@@ -95,9 +94,6 @@ static vm_fault_t secretmem_fault(struct vm_fault *vmf)
ret = vmf_error(err);
goto out;
}
-
- addr = (unsigned long)folio_address(folio);
- flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
}
vmf->page = folio_file_page(folio, vmf->pgoff);
--
2.50.1
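
To illustrate the side effect mentioned in the commit message, here is a small userspace sketch of the reworked error path in secretmem_fault(). All names are hypothetical stand-ins: zap_stub() and restore_stub() model the new helpers, add_err models the return value of filemap_add_folio(), and the -EEXIST retry loop is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct fault_folio {
	bool in_direct_map;
};

/* Illustration only: counts modelled kernel TLB flushes. */
static unsigned int tlb_flushes;

/* Models folio_zap_direct_map(): the flush is now part of the helper. */
static int zap_stub(struct fault_folio *folio)
{
	folio->in_direct_map = false;
	tlb_flushes++;
	return 0;
}

/* Models folio_restore_direct_map(): valid only after a successful zap. */
static void restore_stub(struct fault_folio *folio)
{
	assert(!folio->in_direct_map);
	folio->in_direct_map = true;
}

/* Models the allocation branch of the fault handler. */
static int fault_model(struct fault_folio *folio, int add_err)
{
	int err = zap_stub(folio);

	if (err)
		return err;
	if (add_err) {
		/* Undo the zap; the flush above has already happened. */
		restore_stub(folio);
		return add_err;
	}
	return 0;
}
```

A failed insertion therefore costs one flush that the old code skipped, in exchange for dropping the open-coded address computation and flush from the caller.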
* [PATCH v12 02/16] set_memory: add folio_{zap,restore}_direct_map helpers
From: Kalyazin, Nikita @ 2026-04-10 15:18 UTC (permalink / raw)
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Nikita Kalyazin <nikita.kalyazin@linux.dev>
Let's provide folio_{zap,restore}_direct_map helpers as preparation for
supporting removal of the direct map for guest_memfd folios.
In folio_zap_direct_map(), flush the TLB to make sure the data is no
longer accessible through the direct map. On some architectures this
may result in a double TLB flush, because set_direct_map_valid_noflush()
already performs a flush internally.
The new helpers need to be accessible to KVM on architectures that
support guest_memfd (x86 and arm64).
Direct map removal gives guest_memfd the same protection that
memfd_secret does, such as hardening against Spectre-like attacks
through in-kernel gadgets.
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
include/linux/set_memory.h | 13 +++++++++++
mm/memory.c | 45 ++++++++++++++++++++++++++++++++++++++
2 files changed, 58 insertions(+)
diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
index 1a2563f525fc..24caea2931f9 100644
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -41,6 +41,15 @@ static inline int set_direct_map_valid_noflush(const void *addr,
return 0;
}
+static inline int folio_zap_direct_map(struct folio *folio)
+{
+ return 0;
+}
+
+static inline void folio_restore_direct_map(struct folio *folio)
+{
+}
+
static inline bool kernel_page_present(struct page *page)
{
return true;
@@ -57,6 +66,10 @@ static inline bool can_set_direct_map(void)
}
#define can_set_direct_map can_set_direct_map
#endif
+
+int folio_zap_direct_map(struct folio *folio);
+void folio_restore_direct_map(struct folio *folio);
+
#endif /* CONFIG_ARCH_HAS_SET_DIRECT_MAP */
#ifdef CONFIG_X86_64
diff --git a/mm/memory.c b/mm/memory.c
index 2f815a34d924..3b9ada2cc19c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -78,6 +78,7 @@
#include <linux/sched/sysctl.h>
#include <linux/pgalloc.h>
#include <linux/uaccess.h>
+#include <linux/set_memory.h>
#include <trace/events/kmem.h>
@@ -7479,3 +7480,47 @@ void vma_pgtable_walk_end(struct vm_area_struct *vma)
if (is_vm_hugetlb_page(vma))
hugetlb_vma_unlock_read(vma);
}
+
+#ifdef CONFIG_ARCH_HAS_SET_DIRECT_MAP
+/**
+ * folio_zap_direct_map - remove a folio from the kernel direct map
+ * @folio: folio to remove from the direct map
+ *
+ * Removes the folio from the kernel direct map and flushes the TLB. This may
+ * require splitting huge pages in the direct map, which can fail due to memory
+ * allocation. So far, only order-0 folios are supported.
+ *
+ * Return: 0 on success, or a negative error code on failure.
+ */
+int folio_zap_direct_map(struct folio *folio)
+{
+ const void *addr = folio_address(folio);
+ int ret;
+
+ if (folio_test_large(folio))
+ return -EINVAL;
+
+ ret = set_direct_map_valid_noflush(addr, folio_nr_pages(folio), false);
+ flush_tlb_kernel_range((unsigned long)addr,
+ (unsigned long)addr + folio_size(folio));
+
+ return ret;
+}
+EXPORT_SYMBOL_FOR_MODULES(folio_zap_direct_map, "kvm");
+
+/**
+ * folio_restore_direct_map - restore the kernel direct map entry for a folio
+ * @folio: folio whose direct map entry is to be restored
+ *
+ * This may only be called after a prior successful folio_zap_direct_map() on
+ * the same folio. Because the zap will have already split any huge pages in
+ * the direct map, restoration here only updates protection bits and cannot
+ * fail.
+ */
+void folio_restore_direct_map(struct folio *folio)
+{
+ WARN_ON_ONCE(set_direct_map_valid_noflush(folio_address(folio),
+ folio_nr_pages(folio), true));
+}
+EXPORT_SYMBOL_FOR_MODULES(folio_restore_direct_map, "kvm");
+#endif /* CONFIG_ARCH_HAS_SET_DIRECT_MAP */
--
2.50.1
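
A userspace sketch of the contract the two helpers in the mm/memory.c hunk expose, for readers who want to see it in isolation. Everything here is a model under stated assumptions: set_valid_model() stands in for set_direct_map_valid_noflush() and always succeeds, MODEL_PAGE_SIZE is an assumed 4 KiB page, and flushed_bytes merely records what would be passed to flush_tlb_kernel_range().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

struct folio_model {
	unsigned int order;
	bool present;	/* folio currently has a direct map entry */
};

/* Illustration only: bytes covered by modelled TLB flushes. */
static size_t flushed_bytes;

/* Stand-in for set_direct_map_valid_noflush(); cannot fail in this model. */
static int set_valid_model(struct folio_model *folio, bool valid)
{
	folio->present = valid;
	return 0;
}

/* Mirrors folio_zap_direct_map(): order-0 only, flush regardless of ret. */
static int zap_model(struct folio_model *folio)
{
	int ret;

	if (folio->order > 0)
		return -EINVAL;

	ret = set_valid_model(folio, false);
	flushed_bytes += MODEL_PAGE_SIZE << folio->order;

	return ret;
}

/* Mirrors folio_restore_direct_map(): failure is unexpected, no flush. */
static void restore_model(struct folio_model *folio)
{
	int ret = set_valid_model(folio, true);

	assert(ret == 0);	/* the patch uses WARN_ON_ONCE() here */
	(void)ret;
}
```

Note the asymmetry: the zap side flushes so that stale translations cannot expose the data, while the restore side only makes the entry valid again and needs no flush.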
* [PATCH v12 01/16] set_memory: set_direct_map_* to take address
From: Kalyazin, Nikita @ 2026-04-10 15:17 UTC (permalink / raw)
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita
In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com>
From: Nikita Kalyazin <nikita.kalyazin@linux.dev>
Let's convert set_direct_map_*() to take an address instead of a page to
prepare for adding helpers that operate on folios; it will be more
efficient to convert from a folio directly to an address without going
through a page first.
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@linux.dev>
---
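Note (illustration only, not part of the patch): the conversion described
above can be sketched in userspace as follows. Every mock_* name is a
hypothetical stand-in invented for this sketch; only the shape of the
new "const void *addr" signatures and the explicit page_address()
conversion at page-based call sites are taken from the diff below.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MOCK_PAGE_SIZE 4096
#define MOCK_NR_PAGES  4

/* Hypothetical stand-in for the kernel's struct page. */
struct mock_page { int unused; };

static struct mock_page mock_mem_map[MOCK_NR_PAGES];
static unsigned char mock_direct_map[MOCK_NR_PAGES * MOCK_PAGE_SIZE];
static int mock_valid[MOCK_NR_PAGES] = { 1, 1, 1, 1 };

/* Mock of page_address(): struct page -> direct map address. */
static void *mock_page_address(const struct mock_page *page)
{
	return &mock_direct_map[(size_t)(page - mock_mem_map) * MOCK_PAGE_SIZE];
}

/* Shared helper: mark the page containing addr valid or invalid. */
static int mock_set_valid(const void *addr, int valid)
{
	ptrdiff_t off = (const unsigned char *)addr - mock_direct_map;

	if (off < 0 || off >= (ptrdiff_t)sizeof(mock_direct_map))
		return -EINVAL;
	mock_valid[off / MOCK_PAGE_SIZE] = valid;
	return 0;
}

/* Address-based signatures, mirroring the shape introduced by this patch. */
static int mock_set_direct_map_invalid_noflush(const void *addr)
{
	return mock_set_valid(addr, 0);
}

static int mock_set_direct_map_default_noflush(const void *addr)
{
	return mock_set_valid(addr, 1);
}

/* A page-based caller now converts explicitly before calling in. */
static int mock_unmap_page(const struct mock_page *page)
{
	return mock_set_direct_map_invalid_noflush(mock_page_address(page));
}
```

A caller that already holds an address (for example one obtained from a
folio) can pass it straight through, which is the saving the commit
message refers to; callers that only hold a page pay one explicit
conversion, as mock_unmap_page() does here.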
arch/arm64/include/asm/set_memory.h | 7 ++++---
arch/arm64/mm/pageattr.c | 19 +++++++++--------
arch/loongarch/include/asm/set_memory.h | 7 ++++---
arch/loongarch/mm/pageattr.c | 25 ++++++++++-------------
arch/riscv/include/asm/set_memory.h | 7 ++++---
arch/riscv/mm/pageattr.c | 17 ++++++++--------
arch/s390/include/asm/set_memory.h | 7 ++++---
arch/s390/mm/pageattr.c | 13 ++++++------
arch/x86/include/asm/set_memory.h | 7 ++++---
arch/x86/mm/pat/set_memory.c | 27 +++++++++++++------------
include/linux/set_memory.h | 9 +++++----
kernel/power/snapshot.c | 4 ++--
mm/execmem.c | 6 ++++--
mm/secretmem.c | 6 +++---
mm/vmalloc.c | 11 ++++++----
15 files changed, 91 insertions(+), 81 deletions(-)
diff --git a/arch/arm64/include/asm/set_memory.h b/arch/arm64/include/asm/set_memory.h
index 90f61b17275e..c71a2a6812c4 100644
--- a/arch/arm64/include/asm/set_memory.h
+++ b/arch/arm64/include/asm/set_memory.h
@@ -11,9 +11,10 @@ bool can_set_direct_map(void);
int set_memory_valid(unsigned long addr, int numpages, int enable);
-int set_direct_map_invalid_noflush(struct page *page);
-int set_direct_map_default_noflush(struct page *page);
-int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
+int set_direct_map_invalid_noflush(const void *addr);
+int set_direct_map_default_noflush(const void *addr);
+int set_direct_map_valid_noflush(const void *addr, unsigned long numpages,
+ bool valid);
bool kernel_page_present(struct page *page);
int set_memory_encrypted(unsigned long addr, int numpages);
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 358d1dc9a576..5aff94e1f8b2 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -245,7 +245,7 @@ int set_memory_valid(unsigned long addr, int numpages, int enable)
__pgprot(PTE_VALID));
}
-int set_direct_map_invalid_noflush(struct page *page)
+int set_direct_map_invalid_noflush(const void *addr)
{
pgprot_t clear_mask = __pgprot(PTE_VALID);
pgprot_t set_mask = __pgprot(0);
@@ -253,11 +253,11 @@ int set_direct_map_invalid_noflush(struct page *page)
if (!can_set_direct_map())
return 0;
- return update_range_prot((unsigned long)page_address(page),
- PAGE_SIZE, set_mask, clear_mask);
+ return update_range_prot((unsigned long)addr, PAGE_SIZE, set_mask,
+ clear_mask);
}
-int set_direct_map_default_noflush(struct page *page)
+int set_direct_map_default_noflush(const void *addr)
{
pgprot_t set_mask = __pgprot(PTE_VALID | PTE_WRITE);
pgprot_t clear_mask = __pgprot(PTE_RDONLY);
@@ -265,8 +265,8 @@ int set_direct_map_default_noflush(struct page *page)
if (!can_set_direct_map())
return 0;
- return update_range_prot((unsigned long)page_address(page),
- PAGE_SIZE, set_mask, clear_mask);
+ return update_range_prot((unsigned long)addr, PAGE_SIZE, set_mask,
+ clear_mask);
}
static int __set_memory_enc_dec(unsigned long addr,
@@ -349,14 +349,13 @@ int realm_register_memory_enc_ops(void)
return arm64_mem_crypt_ops_register(&realm_crypt_ops);
}
-int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+int set_direct_map_valid_noflush(const void *addr, unsigned long numpages,
+ bool valid)
{
- unsigned long addr = (unsigned long)page_address(page);
-
if (!can_set_direct_map())
return 0;
- return set_memory_valid(addr, nr, valid);
+ return set_memory_valid((unsigned long)addr, numpages, valid);
}
#ifdef CONFIG_DEBUG_PAGEALLOC
diff --git a/arch/loongarch/include/asm/set_memory.h b/arch/loongarch/include/asm/set_memory.h
index 55dfaefd02c8..5e9b67b2fea1 100644
--- a/arch/loongarch/include/asm/set_memory.h
+++ b/arch/loongarch/include/asm/set_memory.h
@@ -15,8 +15,9 @@ int set_memory_ro(unsigned long addr, int numpages);
int set_memory_rw(unsigned long addr, int numpages);
bool kernel_page_present(struct page *page);
-int set_direct_map_default_noflush(struct page *page);
-int set_direct_map_invalid_noflush(struct page *page);
-int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
+int set_direct_map_invalid_noflush(const void *addr);
+int set_direct_map_default_noflush(const void *addr);
+int set_direct_map_valid_noflush(const void *addr, unsigned long numpages,
+ bool valid);
#endif /* _ASM_LOONGARCH_SET_MEMORY_H */
diff --git a/arch/loongarch/mm/pageattr.c b/arch/loongarch/mm/pageattr.c
index f5e910b68229..9e08905d3624 100644
--- a/arch/loongarch/mm/pageattr.c
+++ b/arch/loongarch/mm/pageattr.c
@@ -198,32 +198,29 @@ bool kernel_page_present(struct page *page)
return pte_present(ptep_get(pte));
}
-int set_direct_map_default_noflush(struct page *page)
+int set_direct_map_default_noflush(const void *addr)
{
- unsigned long addr = (unsigned long)page_address(page);
-
- if (addr < vm_map_base)
+ if ((unsigned long)addr < vm_map_base)
return 0;
- return __set_memory(addr, 1, PAGE_KERNEL, __pgprot(0));
+ return __set_memory((unsigned long)addr, 1, PAGE_KERNEL, __pgprot(0));
}
-int set_direct_map_invalid_noflush(struct page *page)
+int set_direct_map_invalid_noflush(const void *addr)
{
- unsigned long addr = (unsigned long)page_address(page);
-
- if (addr < vm_map_base)
+ if ((unsigned long)addr < vm_map_base)
return 0;
- return __set_memory(addr, 1, __pgprot(0), __pgprot(_PAGE_PRESENT | _PAGE_VALID));
+ return __set_memory((unsigned long)addr, 1, __pgprot(0),
+ __pgprot(_PAGE_PRESENT | _PAGE_VALID));
}
-int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+int set_direct_map_valid_noflush(const void *addr, unsigned long numpages,
+ bool valid)
{
- unsigned long addr = (unsigned long)page_address(page);
pgprot_t set, clear;
- if (addr < vm_map_base)
+ if ((unsigned long)addr < vm_map_base)
return 0;
if (valid) {
@@ -234,5 +231,5 @@ int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
clear = __pgprot(_PAGE_PRESENT | _PAGE_VALID);
}
- return __set_memory(addr, 1, set, clear);
+ return __set_memory((unsigned long)addr, 1, set, clear);
}
diff --git a/arch/riscv/include/asm/set_memory.h b/arch/riscv/include/asm/set_memory.h
index 87389e93325a..a87eabd7fc78 100644
--- a/arch/riscv/include/asm/set_memory.h
+++ b/arch/riscv/include/asm/set_memory.h
@@ -40,9 +40,10 @@ static inline int set_kernel_memory(char *startp, char *endp,
}
#endif
-int set_direct_map_invalid_noflush(struct page *page);
-int set_direct_map_default_noflush(struct page *page);
-int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
+int set_direct_map_invalid_noflush(const void *addr);
+int set_direct_map_default_noflush(const void *addr);
+int set_direct_map_valid_noflush(const void *addr, unsigned long numpages,
+ bool valid);
bool kernel_page_present(struct page *page);
#endif /* __ASSEMBLER__ */
diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
index 3f76db3d2769..0a457177a88c 100644
--- a/arch/riscv/mm/pageattr.c
+++ b/arch/riscv/mm/pageattr.c
@@ -374,19 +374,20 @@ int set_memory_nx(unsigned long addr, int numpages)
return __set_memory(addr, numpages, __pgprot(0), __pgprot(_PAGE_EXEC));
}
-int set_direct_map_invalid_noflush(struct page *page)
+int set_direct_map_invalid_noflush(const void *addr)
{
- return __set_memory((unsigned long)page_address(page), 1,
- __pgprot(0), __pgprot(_PAGE_PRESENT));
+ return __set_memory((unsigned long)addr, 1, __pgprot(0),
+ __pgprot(_PAGE_PRESENT));
}
-int set_direct_map_default_noflush(struct page *page)
+int set_direct_map_default_noflush(const void *addr)
{
- return __set_memory((unsigned long)page_address(page), 1,
- PAGE_KERNEL, __pgprot(_PAGE_EXEC));
+ return __set_memory((unsigned long)addr, 1, PAGE_KERNEL,
+ __pgprot(_PAGE_EXEC));
}
-int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+int set_direct_map_valid_noflush(const void *addr, unsigned long numpages,
+ bool valid)
{
pgprot_t set, clear;
@@ -398,7 +399,7 @@ int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
clear = __pgprot(_PAGE_PRESENT);
}
- return __set_memory((unsigned long)page_address(page), nr, set, clear);
+ return __set_memory((unsigned long)addr, numpages, set, clear);
}
#ifdef CONFIG_DEBUG_PAGEALLOC
diff --git a/arch/s390/include/asm/set_memory.h b/arch/s390/include/asm/set_memory.h
index 94092f4ae764..3e43c3c96e67 100644
--- a/arch/s390/include/asm/set_memory.h
+++ b/arch/s390/include/asm/set_memory.h
@@ -60,9 +60,10 @@ __SET_MEMORY_FUNC(set_memory_rox, SET_MEMORY_RO | SET_MEMORY_X)
__SET_MEMORY_FUNC(set_memory_rwnx, SET_MEMORY_RW | SET_MEMORY_NX)
__SET_MEMORY_FUNC(set_memory_4k, SET_MEMORY_4K)
-int set_direct_map_invalid_noflush(struct page *page);
-int set_direct_map_default_noflush(struct page *page);
-int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
+int set_direct_map_invalid_noflush(const void *addr);
+int set_direct_map_default_noflush(const void *addr);
+int set_direct_map_valid_noflush(const void *addr, unsigned long numpages,
+ bool valid);
bool kernel_page_present(struct page *page);
#endif
diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c
index bb29c38ae624..8e90ff5cf50d 100644
--- a/arch/s390/mm/pageattr.c
+++ b/arch/s390/mm/pageattr.c
@@ -383,17 +383,18 @@ int __set_memory(unsigned long addr, unsigned long numpages, unsigned long flags
return rc;
}
-int set_direct_map_invalid_noflush(struct page *page)
+int set_direct_map_invalid_noflush(const void *addr)
{
- return __set_memory((unsigned long)page_to_virt(page), 1, SET_MEMORY_INV);
+ return __set_memory((unsigned long)addr, 1, SET_MEMORY_INV);
}
-int set_direct_map_default_noflush(struct page *page)
+int set_direct_map_default_noflush(const void *addr)
{
- return __set_memory((unsigned long)page_to_virt(page), 1, SET_MEMORY_DEF);
+ return __set_memory((unsigned long)addr, 1, SET_MEMORY_DEF);
}
-int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+int set_direct_map_valid_noflush(const void *addr, unsigned long numpages,
+ bool valid)
{
unsigned long flags;
@@ -402,7 +403,7 @@ int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
else
flags = SET_MEMORY_INV;
- return __set_memory((unsigned long)page_to_virt(page), nr, flags);
+ return __set_memory((unsigned long)addr, numpages, flags);
}
bool kernel_page_present(struct page *page)
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 4362c26aa992..b6a4173ff249 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -86,9 +86,10 @@ int set_pages_wb(struct page *page, int numpages);
int set_pages_ro(struct page *page, int numpages);
int set_pages_rw(struct page *page, int numpages);
-int set_direct_map_invalid_noflush(struct page *page);
-int set_direct_map_default_noflush(struct page *page);
-int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);
+int set_direct_map_invalid_noflush(const void *addr);
+int set_direct_map_default_noflush(const void *addr);
+int set_direct_map_valid_noflush(const void *addr, unsigned long numpages,
+ bool valid);
bool kernel_page_present(struct page *page);
extern int kernel_set_to_readonly;
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 40581a720fe8..7517195b75b9 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2587,9 +2587,9 @@ int set_pages_rw(struct page *page, int numpages)
return set_memory_rw(addr, numpages);
}
-static int __set_pages_p(struct page *page, int numpages)
+static int __set_pages_p(const void *addr, int numpages)
{
- unsigned long tempaddr = (unsigned long) page_address(page);
+ unsigned long tempaddr = (unsigned long)addr;
struct cpa_data cpa = { .vaddr = &tempaddr,
.pgd = NULL,
.numpages = numpages,
@@ -2606,9 +2606,9 @@ static int __set_pages_p(struct page *page, int numpages)
return __change_page_attr_set_clr(&cpa, 1);
}
-static int __set_pages_np(struct page *page, int numpages)
+static int __set_pages_np(const void *addr, int numpages)
{
- unsigned long tempaddr = (unsigned long) page_address(page);
+ unsigned long tempaddr = (unsigned long)addr;
struct cpa_data cpa = { .vaddr = &tempaddr,
.pgd = NULL,
.numpages = numpages,
@@ -2625,22 +2625,23 @@ static int __set_pages_np(struct page *page, int numpages)
return __change_page_attr_set_clr(&cpa, 1);
}
-int set_direct_map_invalid_noflush(struct page *page)
+int set_direct_map_invalid_noflush(const void *addr)
{
- return __set_pages_np(page, 1);
+ return __set_pages_np(addr, 1);
}
-int set_direct_map_default_noflush(struct page *page)
+int set_direct_map_default_noflush(const void *addr)
{
- return __set_pages_p(page, 1);
+ return __set_pages_p(addr, 1);
}
-int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
+int set_direct_map_valid_noflush(const void *addr, unsigned long numpages,
+ bool valid)
{
if (valid)
- return __set_pages_p(page, nr);
+ return __set_pages_p(addr, numpages);
- return __set_pages_np(page, nr);
+ return __set_pages_np(addr, numpages);
}
#ifdef CONFIG_DEBUG_PAGEALLOC
@@ -2659,9 +2660,9 @@ void __kernel_map_pages(struct page *page, int numpages, int enable)
* and hence no memory allocations during large page split.
*/
if (enable)
- __set_pages_p(page, numpages);
+ __set_pages_p(page_address(page), numpages);
else
- __set_pages_np(page, numpages);
+ __set_pages_np(page_address(page), numpages);
/*
* We should perform an IPI and flush all tlbs,
diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
index 3030d9245f5a..1a2563f525fc 100644
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -25,17 +25,18 @@ static inline int set_memory_rox(unsigned long addr, int numpages)
#endif
#ifndef CONFIG_ARCH_HAS_SET_DIRECT_MAP
-static inline int set_direct_map_invalid_noflush(struct page *page)
+static inline int set_direct_map_invalid_noflush(const void *addr)
{
return 0;
}
-static inline int set_direct_map_default_noflush(struct page *page)
+static inline int set_direct_map_default_noflush(const void *addr)
{
return 0;
}
-static inline int set_direct_map_valid_noflush(struct page *page,
- unsigned nr, bool valid)
+static inline int set_direct_map_valid_noflush(const void *addr,
+ unsigned long numpages,
+ bool valid)
{
return 0;
}
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 6e1321837c66..6eddfb22c0ff 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -88,7 +88,7 @@ static inline int hibernate_restore_unprotect_page(void *page_address) {return 0
static inline void hibernate_map_page(struct page *page)
{
if (IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) {
- int ret = set_direct_map_default_noflush(page);
+ int ret = set_direct_map_default_noflush(page_address(page));
if (ret)
pr_warn_once("Failed to remap page\n");
@@ -101,7 +101,7 @@ static inline void hibernate_unmap_page(struct page *page)
{
if (IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) {
unsigned long addr = (unsigned long)page_address(page);
- int ret = set_direct_map_invalid_noflush(page);
+ int ret = set_direct_map_invalid_noflush(page_address(page));
if (ret)
pr_warn_once("Failed to remap page\n");
diff --git a/mm/execmem.c b/mm/execmem.c
index 810a4ba9c924..220298ec87c8 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -119,7 +119,8 @@ static int execmem_set_direct_map_valid(struct vm_struct *vm, bool valid)
int err = 0;
for (int i = 0; i < vm->nr_pages; i += nr) {
- err = set_direct_map_valid_noflush(vm->pages[i], nr, valid);
+ err = set_direct_map_valid_noflush(page_address(vm->pages[i]),
+ nr, valid);
if (err)
goto err_restore;
updated += nr;
@@ -129,7 +130,8 @@ static int execmem_set_direct_map_valid(struct vm_struct *vm, bool valid)
err_restore:
for (int i = 0; i < updated; i += nr)
- set_direct_map_valid_noflush(vm->pages[i], nr, !valid);
+ set_direct_map_valid_noflush(page_address(vm->pages[i]), nr,
+ !valid);
return err;
}
diff --git a/mm/secretmem.c b/mm/secretmem.c
index 11a779c812a7..fd29b33c6764 100644
--- a/mm/secretmem.c
+++ b/mm/secretmem.c
@@ -72,7 +72,7 @@ static vm_fault_t secretmem_fault(struct vm_fault *vmf)
goto out;
}
- err = set_direct_map_invalid_noflush(folio_page(folio, 0));
+ err = set_direct_map_invalid_noflush(folio_address(folio));
if (err) {
folio_put(folio);
ret = vmf_error(err);
@@ -87,7 +87,7 @@ static vm_fault_t secretmem_fault(struct vm_fault *vmf)
* already happened when we marked the page invalid
* which guarantees that this call won't fail
*/
- set_direct_map_default_noflush(folio_page(folio, 0));
+ set_direct_map_default_noflush(folio_address(folio));
folio_put(folio);
if (err == -EEXIST)
goto retry;
@@ -151,7 +151,7 @@ static int secretmem_migrate_folio(struct address_space *mapping,
static void secretmem_free_folio(struct folio *folio)
{
- set_direct_map_default_noflush(folio_page(folio, 0));
+ set_direct_map_default_noflush(folio_address(folio));
folio_zero_segment(folio, 0, folio_size(folio));
}
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 61caa55a4402..8822f73957d9 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3342,14 +3342,17 @@ struct vm_struct *remove_vm_area(const void *addr)
}
static inline void set_area_direct_map(const struct vm_struct *area,
- int (*set_direct_map)(struct page *page))
+ int (*set_direct_map)(const void *addr))
{
int i;
/* HUGE_VMALLOC passes small pages to set_direct_map */
- for (i = 0; i < area->nr_pages; i++)
- if (page_address(area->pages[i]))
- set_direct_map(area->pages[i]);
+ for (i = 0; i < area->nr_pages; i++) {
+ const void *addr = page_address(area->pages[i]);
+
+ if (addr)
+ set_direct_map(addr);
+ }
}
/*
--
2.50.1
* [PATCH v12 00/16] Direct Map Removal Support for guest_memfd
From: Kalyazin, Nikita @ 2026-04-10 15:17 UTC (permalink / raw)
To: kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
kernel@xen0n.name, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, loongarch@lists.linux.dev,
linux-pm@vger.kernel.org
Cc: pbonzini@redhat.com, corbet@lwn.net, maz@kernel.org,
oupton@kernel.org, joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
seanjc@google.com, tglx@kernel.org, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
lorenzo.stoakes@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, jgg@ziepe.ca,
jhubbard@nvidia.com, peterx@redhat.com, jannh@google.com,
pfalcato@suse.de, skhan@linuxfoundation.org, riel@surriel.com,
ryan.roberts@arm.com, jgross@suse.com, yu-cheng.yu@intel.com,
kas@kernel.org, coxu@redhat.com, ackerleytng@google.com,
yosry@kernel.org, ajones@ventanamicro.com, maobibo@loongson.cn,
tabba@google.com, prsampat@amd.com, wu.fei9@sanechips.com.cn,
mlevitsk@redhat.com, jmattson@google.com, jthoughton@google.com,
agordeev@linux.ibm.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
borntraeger@linux.ibm.com, chenhuacai@kernel.org,
baolu.lu@linux.intel.com, dev.jain@arm.com, gor@linux.ibm.com,
hca@linux.ibm.com, palmer@dabbelt.com, pjw@kernel.org,
shijie@os.amperecomputing.com, svens@linux.ibm.com,
thuth@redhat.com, yang@os.amperecomputing.com,
Liam.Howlett@oracle.com, urezki@gmail.com,
zhengqi.arch@bytedance.com, gerald.schaefer@linux.ibm.com,
jiayuan.chen@shopee.com, lenb@kernel.org, pavel@kernel.org,
rafael@kernel.org, yangyicong@hisilicon.com,
vannapurve@google.com, jackmanb@google.com, patrick.roy@linux.dev,
Thomson, Jack, Itazuri, Takahiro, Manwaring, Derek,
Kalyazin, Nikita, Nikita Kalyazin
From: Nikita Kalyazin <nikita.kalyazin@linux.dev>
[ based on kvm/next ]
Unmapping virtual machine guest memory from the host kernel's direct map
is a successful mitigation against Spectre-style transient execution
issues: if the kernel page tables do not contain entries pointing to
guest memory, then any attempted speculative read through the direct map
will necessarily be blocked by the MMU before any observable
microarchitectural side-effects happen. This means that Spectre gadgets
and similar techniques cannot be used to target virtual machine memory.
Roughly 60% of speculative execution issues fall into this category
[1, Table 1].
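As a rough illustration of the idea above (a toy userspace model, not
kernel code and not part of this series), the direct map can be thought
of as a per-page "present" bit that every access must pass. All toy_*
names are hypothetical. The model only captures the architectural
outcome, namely that an access through a removed entry fails, and does
not attempt to model speculation itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 8

/* Toy direct map: one present bit and one byte of "memory" per page. */
struct toy_direct_map {
	bool present[TOY_NR_PAGES];
	unsigned char data[TOY_NR_PAGES];
};

/* Start with every page mapped and holding a distinct value. */
static void toy_init(struct toy_direct_map *dm)
{
	for (size_t i = 0; i < TOY_NR_PAGES; i++) {
		dm->present[i] = true;
		dm->data[i] = (unsigned char)(i + 1);
	}
}

/* Remove a page from the toy direct map. */
static int toy_zap(struct toy_direct_map *dm, size_t pfn)
{
	if (pfn >= TOY_NR_PAGES)
		return -EINVAL;
	dm->present[pfn] = false;
	return 0;
}

/* Put a page back into the toy direct map. */
static int toy_restore(struct toy_direct_map *dm, size_t pfn)
{
	if (pfn >= TOY_NR_PAGES)
		return -EINVAL;
	dm->present[pfn] = true;
	return 0;
}

/* Read through the toy direct map; fails if the entry was removed. */
static int toy_read(const struct toy_direct_map *dm, size_t pfn,
		    unsigned char *out)
{
	if (pfn >= TOY_NR_PAGES)
		return -EINVAL;
	if (!dm->present[pfn])
		return -EFAULT;
	*out = dm->data[pfn];
	return 0;
}
```

In this model, toy_read() on a zapped page returns -EFAULT while reads of
other pages keep working, and toy_restore() makes the page readable
again with its contents intact.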
This patch series extends guest_memfd with the ability to remove its
memory from the host kernel's direct map, so that KVM guests backed by
guest_memfd gain the above protection.
Additionally, a Firecracker branch with support for these VMs can be
found on GitHub [2].
For more details, please refer to the v5 cover letter. No substantial
changes in design have taken place since.
See also the related write() syscall support for guest_memfd [3], which
describes how the two features interoperate.
Changes since v11:
- Ackerley/Sashiko: fix previously missed __set_pages_* argument update
in __kernel_map_pages (patch 1)
- David: disallow large folios in folio_zap_direct_map (patch 2)
- David/Sashiko: check for folio_is_zone_device if mapping is NULL in
gup_fast_folio_allowed (patch 4)
- Ackerley/Sashiko: kvm_arch_gmem_supports_no_direct_map to return
false for SEV-SNP (patch 8)
- David: replace a redundant check for GUEST_MEMFD_FLAG_NO_DIRECT_MAP
with a WARN_ON_ONCE (patch 10)
- David: assert the folio is locked when zapping direct map (patch 10)
- Ackerley/Sashiko: reorder operations to "zap then prepare" and
"invalidate then restore" (patch 10)
v11: https://lore.kernel.org/kvm/20260317141031.514-1-kalyazin@amazon.com
v10: https://lore.kernel.org/kvm/20260126164445.11867-1-kalyazin@amazon.com
v9: https://lore.kernel.org/kvm/20260114134510.1835-1-kalyazin@amazon.com
v8: https://lore.kernel.org/kvm/20251205165743.9341-1-kalyazin@amazon.com
v7: https://lore.kernel.org/kvm/20250924151101.2225820-1-patrick.roy@campus.lmu.de
v6: https://lore.kernel.org/kvm/20250912091708.17502-1-roypat@amazon.co.uk
v5: https://lore.kernel.org/kvm/20250828093902.2719-1-roypat@amazon.co.uk
v4: https://lore.kernel.org/kvm/20250221160728.1584559-1-roypat@amazon.co.uk
RFCv3: https://lore.kernel.org/kvm/20241030134912.515725-1-roypat@amazon.co.uk
RFCv2: https://lore.kernel.org/kvm/20240910163038.1298452-1-roypat@amazon.co.uk
RFCv1: https://lore.kernel.org/kvm/20240709132041.3625501-1-roypat@amazon.co.uk
[1] https://download.vusec.net/papers/quarantine_raid23.pdf
[2] https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
[3] https://lore.kernel.org/kvm/20251114151828.98165-1-kalyazin@amazon.com
Nikita Kalyazin (4):
set_memory: set_direct_map_* to take address
set_memory: add folio_{zap,restore}_direct_map helpers
mm/secretmem: make use of folio_{zap,restore}_direct_map
mm/gup: drop local variable in gup_fast_folio_allowed
Patrick Roy (12):
mm/gup: drop secretmem optimization from gup_fast_folio_allowed
mm: introduce AS_NO_DIRECT_MAP
KVM: guest_memfd: Add stub for kvm_arch_gmem_invalidate
KVM: x86: define kvm_arch_gmem_supports_no_direct_map()
KVM: arm64: define kvm_arch_gmem_supports_no_direct_map()
KVM: guest_memfd: Add flag to remove from direct map
KVM: selftests: load elf via bounce buffer
KVM: selftests: set KVM_MEM_GUEST_MEMFD in vm_mem_add() if guest_memfd
!= -1
KVM: selftests: Add guest_memfd based vm_mem_backing_src_types
KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in existing
selftests
KVM: selftests: stuff vm_mem_backing_src_type into vm_shape
KVM: selftests: Test guest execution from direct map removed gmem
Documentation/virt/kvm/api.rst | 21 +++---
arch/arm64/include/asm/kvm_host.h | 13 ++++
arch/arm64/include/asm/set_memory.h | 7 +-
arch/arm64/mm/pageattr.c | 19 +++--
arch/loongarch/include/asm/set_memory.h | 7 +-
arch/loongarch/mm/pageattr.c | 25 +++----
arch/riscv/include/asm/set_memory.h | 7 +-
arch/riscv/mm/pageattr.c | 17 +++--
arch/s390/include/asm/set_memory.h | 7 +-
arch/s390/mm/pageattr.c | 13 ++--
arch/x86/include/asm/kvm_host.h | 6 ++
arch/x86/include/asm/set_memory.h | 7 +-
arch/x86/kvm/x86.c | 7 ++
arch/x86/mm/pat/set_memory.c | 27 +++----
include/linux/kvm_host.h | 14 ++++
include/linux/pagemap.h | 16 ++++
include/linux/secretmem.h | 18 -----
include/linux/set_memory.h | 22 +++++-
include/uapi/linux/kvm.h | 1 +
kernel/power/snapshot.c | 4 +-
lib/buildid.c | 8 +-
mm/execmem.c | 6 +-
mm/gup.c | 47 ++++++------
mm/memory.c | 45 +++++++++++
mm/mlock.c | 2 +-
mm/secretmem.c | 18 ++---
mm/vmalloc.c | 11 ++-
.../testing/selftests/kvm/guest_memfd_test.c | 17 ++++-
.../testing/selftests/kvm/include/kvm_util.h | 37 ++++++---
.../testing/selftests/kvm/include/test_util.h | 8 ++
tools/testing/selftests/kvm/lib/elf.c | 8 +-
tools/testing/selftests/kvm/lib/io.c | 23 ++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 59 ++++++++-------
tools/testing/selftests/kvm/lib/test_util.c | 8 ++
tools/testing/selftests/kvm/lib/x86/sev.c | 1 +
.../selftests/kvm/pre_fault_memory_test.c | 1 +
.../selftests/kvm/set_memory_region_test.c | 52 ++++++++++++-
.../kvm/x86/private_mem_conversions_test.c | 7 +-
virt/kvm/guest_memfd.c | 75 +++++++++++++++++--
39 files changed, 489 insertions(+), 202 deletions(-)
base-commit: 24f9515de8778410e4b84c85b196c9850d2c1e18
--
2.50.1
* Re: [PATCH v2 7/7] platform/x86/intel/pmc: Add Nova Lake support to intel_pmc_core driver
From: Ilpo Järvinen @ 2026-04-10 14:38 UTC (permalink / raw)
To: Xi Pardee
Cc: irenic.rajneesh, david.e.box, platform-driver-x86, LKML, linux-pm
In-Reply-To: <20260408222144.3288928-8-xi.pardee@linux.intel.com>
On Wed, 8 Apr 2026, Xi Pardee wrote:
> Add Nova Lake support in intel_pmc_core driver
>
> Signed-off-by: Xi Pardee <xi.pardee@linux.intel.com>
> ---
> drivers/platform/x86/intel/pmc/Makefile | 3 +-
> drivers/platform/x86/intel/pmc/core.c | 2 +
> drivers/platform/x86/intel/pmc/core.h | 31 +
> drivers/platform/x86/intel/pmc/nvl.c | 1539 +++++++++++++++++++++++
> drivers/platform/x86/intel/pmc/ptl.c | 2 +-
> 5 files changed, 1575 insertions(+), 2 deletions(-)
> create mode 100644 drivers/platform/x86/intel/pmc/nvl.c
>
> diff --git a/drivers/platform/x86/intel/pmc/Makefile b/drivers/platform/x86/intel/pmc/Makefile
> index bb960c8721d77..23853e867c912 100644
> --- a/drivers/platform/x86/intel/pmc/Makefile
> +++ b/drivers/platform/x86/intel/pmc/Makefile
> @@ -4,7 +4,8 @@
> #
>
> intel_pmc_core-y := core.o spt.o cnp.o icl.o \
> - tgl.o adl.o mtl.o arl.o lnl.o ptl.o wcl.o
> + tgl.o adl.o mtl.o arl.o \
> + lnl.o ptl.o wcl.o nvl.o
> obj-$(CONFIG_INTEL_PMC_CORE) += intel_pmc_core.o
> intel_pmc_core_pltdrv-y := pltdrv.o
> obj-$(CONFIG_INTEL_PMC_CORE) += intel_pmc_core_pltdrv.o
> diff --git a/drivers/platform/x86/intel/pmc/core.c b/drivers/platform/x86/intel/pmc/core.c
> index c84e75b19aac3..207708f4ceb94 100644
> --- a/drivers/platform/x86/intel/pmc/core.c
> +++ b/drivers/platform/x86/intel/pmc/core.c
> @@ -1849,6 +1849,8 @@ static const struct x86_cpu_id intel_pmc_core_ids[] = {
> X86_MATCH_VFM(INTEL_LUNARLAKE_M, &lnl_pmc_dev),
> X86_MATCH_VFM(INTEL_PANTHERLAKE_L, &ptl_pmc_dev),
> X86_MATCH_VFM(INTEL_WILDCATLAKE_L, &wcl_pmc_dev),
> + X86_MATCH_VFM(INTEL_NOVALAKE, &nvl_s_pmc_dev),
> + X86_MATCH_VFM(INTEL_NOVALAKE_L, &nvl_h_pmc_dev),
> {}
> };
>
> diff --git a/drivers/platform/x86/intel/pmc/core.h b/drivers/platform/x86/intel/pmc/core.h
> index a741e4698f195..f2b4a20d2ff44 100644
> --- a/drivers/platform/x86/intel/pmc/core.h
> +++ b/drivers/platform/x86/intel/pmc/core.h
> @@ -307,6 +307,29 @@ enum ppfear_regs {
> #define WCL_NUM_S0IX_BLOCKER 94
> #define WCL_BLK_REQ_OFFSET 50
>
> +/* Nova Lake */
> +#define NVL_PCDH_PPFEAR_NUM_ENTRIES 13
> +#define NVL_PCDH_PMC_MMIO_REG_LEN 0x363c
> +#define NVL_PCDS_PMC_MMIO_REG_LEN 0x3118
> +#define NVL_PCHS_PMC_MMIO_REG_LEN 0x30d8
> +#define NVL_LPM_PRI_OFFSET 0x17a4
> +#define NVL_LPM_EN_OFFSET 0x17a0
> +#define NVL_LPM_RESIDENCY_OFFSET 0x17a8
> +#define NVL_LPM_LIVE_STATUS_OFFSET 0x1760
> +#define NVL_LPM_NUM_MAPS 15
> +#define NVL_PCDH_NUM_S0IX_BLOCKER 107
> +#define NVL_PCDS_NUM_S0IX_BLOCKER 71
> +#define NVL_PCHS_NUM_S0IX_BLOCKER 54
> +#define NVL_PCDS_PMC_LTR_RESERVED 0x1bac
> +#define NVL_PCDH_BLK_REQ_OFFSET 53
> +#define NVL_PCDS_BLK_REQ_OFFSET 18
> +#define NVL_PCHS_BLK_REQ_OFFSET 46
> +#define NVL_PMT_PC_GUID 0x13000101
> +#define NVL_PMT_DMU_GUID 0x1a000101
> +#define NVL_LTR_BLK_OFFSET 64
> +#define NVL_PKGC_BLK_OFFSET 4
> +#define NVL_PMT_DMU_DIE_C6_OFFSET 25
> +
> /* SSRAM PMC Device ID */
> /* LNL */
> #define PMC_DEVID_LNL_SOCM 0xa87f
> @@ -329,6 +352,11 @@ enum ppfear_regs {
> #define PMC_DEVID_MTL_IOEP 0x7ecf
> #define PMC_DEVID_MTL_IOEM 0x7ebf
>
> +/* NVL */
> +#define PMC_DEVID_NVL_PCDH 0xd37e
> +#define PMC_DEVID_NVL_PCDS 0xd47e
> +#define PMC_DEVID_NVL_PCHS 0x6e27
> +
> extern const char *pmc_lpm_modes[];
>
> struct pmc_bit_map {
> @@ -558,6 +586,7 @@ extern const struct pmc_reg_map mtl_ioep_reg_map;
> extern const struct pmc_bit_map ptl_pcdp_clocksource_status_map[];
> extern const struct pmc_bit_map ptl_pcdp_vnn_req_status_3_map[];
> extern const struct pmc_bit_map ptl_pcdp_signal_status_map[];
> +extern const struct pmc_bit_map ptl_pcdp_ltr_show_map[];
>
> void pmc_core_get_tgl_lpm_reqs(struct platform_device *pdev);
> int pmc_core_send_ltr_ignore(struct pmc_dev *pmcdev, u32 value, int ignore);
> @@ -581,6 +610,8 @@ extern struct pmc_dev_info arl_h_pmc_dev;
> extern struct pmc_dev_info lnl_pmc_dev;
> extern struct pmc_dev_info ptl_pmc_dev;
> extern struct pmc_dev_info wcl_pmc_dev;
> +extern struct pmc_dev_info nvl_s_pmc_dev;
> +extern struct pmc_dev_info nvl_h_pmc_dev;
>
> void cnl_suspend(struct pmc_dev *pmcdev);
> int cnl_resume(struct pmc_dev *pmcdev);
> diff --git a/drivers/platform/x86/intel/pmc/nvl.c b/drivers/platform/x86/intel/pmc/nvl.c
> new file mode 100644
> index 0000000000000..96f4244d602be
> --- /dev/null
> +++ b/drivers/platform/x86/intel/pmc/nvl.c
> @@ -0,0 +1,1539 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * This file contains platform specific structure definitions
> + * and init function used by Nova Lake PCH.
> + *
> + * Copyright (c) 2026, Intel Corporation.
> + */
> +
> +#include <linux/pci.h>
> +
> +#include "core.h"
> +
> +/* PMC SSRAM PMT Telemetry GUIDS */
> +#define PCDH_LPM_REQ_GUID 0x01093101
> +#define PCHS_LPM_REQ_GUID 0x01092101
> +#define PCDS_LPM_REQ_GUID 0x01091102
> +
> +/*
> + * Die Mapping to Product.
> + * Product PCDDie PCHDie
> + * NVL-H PCD-H None
> + * NVL-S PCD-S PCH-S
> + */
> +
> +static const struct pmc_bit_map nvl_pcdh_pfear_map[] = {
> + {"PMC_PGD0", BIT(0)},
Add #include <linux/bits.h> for BIT().
> + {"FUSE_OSSE_PGD0", BIT(1)},
> + {"SPI_PGD0", BIT(2)},
> + {"XHCI_PGD0", BIT(3)},
> + {"SPA_PGD0", BIT(4)},
> + {"SPB_PGD0", BIT(5)},
> + {"MPFPW2_PGD0", BIT(6)},
> + {"GBE_PGD0", BIT(7)},
> +
> + {"SBR16B20_PGD0", BIT(0)},
> + {"DBG_SBR_PGD0", BIT(1)},
> + {"SBR16B7_PGD0", BIT(2)},
> + {"STRC_PGD0", BIT(3)},
> + {"SBR16B8_PGD0", BIT(4)},
> + {"D2D_DISP_PGD1", BIT(5)},
> + {"LPSS_PGD0", BIT(6)},
> + {"LPC_PGD0", BIT(7)},
> +
> + {"SMB_PGD0", BIT(0)},
> + {"ISH_PGD0", BIT(1)},
> + {"SBR16B2_PGD0", BIT(2)},
> + {"NPK_PGD0", BIT(3)},
> + {"D2D_NOC_PGD1", BIT(4)},
> + {"DBG_SBR16B_PGD0", BIT(5)},
> + {"FUSE_PGD0", BIT(6)},
> + {"SBR16B0_PGD0", BIT(7)},
> +
> + {"P2SB0_PGD0", BIT(0)},
> + {"OTG_PGD0", BIT(1)},
> + {"EXI_PGD0", BIT(2)},
> + {"CSE_PGD0", BIT(3)},
> + {"CSME_KVM_PGD0", BIT(4)},
> + {"CSME_PMT_PGD0", BIT(5)},
> + {"CSME_CLINK_PGD0", BIT(6)},
> + {"SBR16B21_PGD0", BIT(7)},
> +
> + {"CSME_USBR_PGD0", BIT(0)},
> + {"SBR16B22_PGD0", BIT(1)},
> + {"CSME_SMT1_PGD0", BIT(2)},
> + {"MPFPW1_PGD0", BIT(3)},
> + {"CSME_SMS2_PGD0", BIT(4)},
> + {"CSME_SMS_PGD0", BIT(5)},
> + {"CSME_RTC_PGD0", BIT(6)},
> + {"CSMEPSF_PGD0", BIT(7)},
> +
> + {"D2D_NOC_PGD0", BIT(0)},
> + {"ESE_PGD0", BIT(1)},
> + {"SBR16B6_PGD0", BIT(2)},
> + {"P2SB1_PGD0", BIT(3)},
> + {"SBR16B3_PGD0", BIT(4)},
> + {"OSSE_SMT1_PGD0", BIT(5)},
> + {"D2D_DISP_PGD0", BIT(6)},
> + {"SNPS_USB2_A_PGD0", BIT(7)},
> +
> + {"U3FPW1_PGD0", BIT(0)},
> + {"FIA_X_PGD0", BIT(1)},
> + {"PSF4_PGD0", BIT(2)},
> + {"CNVI_PGD0", BIT(3)},
> + {"UFSX2_PGD0", BIT(4)},
> + {"ENDBG_PGD0", BIT(5)},
> + {"DBC_PGD0", BIT(6)},
> + {"FIA_PG_PGD0", BIT(7)},
> +
> + {"D2D_IPU_PGD0", BIT(0)},
> + {"NPK_PGD1", BIT(1)},
> + {"FIACPCB_X_PGD0", BIT(2)},
> + {"SBR8B4_PGD0", BIT(3)},
> + {"DBG_PSF_PGD0", BIT(4)},
> + {"PSF6_PGD0", BIT(5)},
> + {"UFSPW1_PGD0", BIT(6)},
> + {"FIA_U_PGD0", BIT(7)},
> +
> + {"PSF8_PGD0", BIT(0)},
> + {"SBR16B9_PGD0", BIT(1)},
> + {"PSF0_PGD0", BIT(2)},
> + {"FIACPCB_U_PGD0", BIT(3)},
> + {"TAM_PGD0", BIT(4)},
> + {"D2D_NOC_PGD2", BIT(5)},
> + {"SBR8B2_PGD0", BIT(6)},
> + {"THC0_PGD0", BIT(7)},
> +
> + {"THC1_PGD0", BIT(0)},
> + {"PMC_PGD1", BIT(1)},
> + {"DISP_PGA1_PGD0", BIT(2)},
> + {"TCSS_PGD0", BIT(3)},
> + {"DISP_PGA_PGD0", BIT(4)},
> + {"SBR16B1_PGD0", BIT(5)},
> + {"SBRG_PGD0", BIT(6)},
> + {"PSF5_PGD0", BIT(7)},
> +
> + {"SBR8B3_PGD0", BIT(0)},
> + {"ACE_PGD0", BIT(1)},
> + {"ACE_PGD1", BIT(2)},
> + {"ACE_PGD2", BIT(3)},
> + {"ACE_PGD3", BIT(4)},
> + {"ACE_PGD4", BIT(5)},
> + {"ACE_PGD5", BIT(6)},
> + {"ACE_PGD6", BIT(7)},
> +
> + {"ACE_PGD7", BIT(0)},
> + {"ACE_PGD8", BIT(1)},
> + {"ACE_PGD9", BIT(2)},
> + {"ACE_PGD10", BIT(3)},
> + {"FIACPCB_PG_PGD0", BIT(4)},
> + {"SNPS_USB2_B_PGD0", BIT(5)},
> + {"OSSE_PGD0", BIT(6)},
> + {"SBR8B0_PGD0", BIT(7)},
> +
> + {"SBR16B4_PGD0", BIT(0)},
> + {"CSME_PTIO_PGD0", BIT(1)},
> + {}
> +};
> +
> +static const struct pmc_bit_map *ext_nvl_pcdh_pfear_map[] = {
> + nvl_pcdh_pfear_map,
> + NULL
> +};
> +
> +const struct pmc_bit_map nvl_pcdh_clocksource_status_map[] = {
> + {"AON2_OFF_STS", BIT(0), 1},
> + {"AON3_OFF_STS", BIT(1), 0},
> + {"AON4_OFF_STS", BIT(2), 1},
> + {"AON5_OFF_STS", BIT(3), 1},
> + {"AON1_OFF_STS", BIT(4), 0},
> + {"XTAL_LVM_OFF_STS", BIT(5), 0},
> + {"MPFPW1_0_PLL_OFF_STS", BIT(6), 1},
> + {"D2D_PLL_OFF_STS", BIT(7), 1},
> + {"USB3_PLL_OFF_STS", BIT(8), 1},
> + {"AON3_SPL_OFF_STS", BIT(9), 1},
> + {"MPFPW2_0_PLL_OFF_STS", BIT(12), 1},
> + {"XTAL_AGGR_OFF_STS", BIT(17), 1},
> + {"USB2_PLL_OFF_STS", BIT(18), 0},
> + {"DDI2_PLL_OFF_STS", BIT(19), 1},
> + {"SE_TCSS_PLL_OFF_STS", BIT(20), 1},
> + {"DDI_PLL_OFF_STS", BIT(21), 1},
> + {"FILTER_PLL_OFF_STS", BIT(22), 1},
> + {"ACE_PLL_OFF_STS", BIT(24), 0},
> + {"FABRIC_PLL_OFF_STS", BIT(25), 1},
> + {"SOC_PLL_OFF_STS", BIT(26), 1},
> + {"REF_PLL_OFF_STS", BIT(28), 1},
> + {"IMG_PLL_OFF_STS", BIT(29), 1},
> + {"GENLOCK_FILTER_PLL_OFF_STS", BIT(30), 1},
> + {"RTC_PLL_OFF_STS", BIT(31), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcdh_power_gating_status_0_map[] = {
> + {"PMC_PGD0_PG_STS", BIT(0), 0},
> + {"FUSE_OSSE_PGD0_PG_STS", BIT(1), 0},
> + {"ESPISPI_PGD0_PG_STS", BIT(2), 0},
> + {"XHCI_PGD0_PG_STS", BIT(3), 1},
> + {"SPA_PGD0_PG_STS", BIT(4), 1},
> + {"SPB_PGD0_PG_STS", BIT(5), 1},
> + {"MPFPW2_PGD0_PG_STS", BIT(6), 0},
> + {"GBE_PGD0_PG_STS", BIT(7), 1},
> + {"SBR16B20_PGD0_PG_STS", BIT(8), 0},
> + {"DBG_PGD0_PG_STS", BIT(9), 0},
> + {"SBR16B7_PGD0_PG_STS", BIT(10), 0},
> + {"STRC_PGD0_PG_STS", BIT(11), 0},
> + {"SBR16B8_PGD0_PG_STS", BIT(12), 0},
> + {"D2D_DISP_PGD1_PG_STS", BIT(13), 1},
> + {"LPSS_PGD0_PG_STS", BIT(14), 1},
> + {"LPC_PGD0_PG_STS", BIT(15), 0},
> + {"SMB_PGD0_PG_STS", BIT(16), 0},
> + {"ISH_PGD0_PG_STS", BIT(17), 0},
> + {"SBR16B2_PGD0_PG_STS", BIT(18), 0},
> + {"NPK_PGD0_PG_STS", BIT(19), 0},
> + {"D2D_NOC_PGD1_PG_STS", BIT(20), 1},
> + {"DBG_SBR16B_PGD0_PG_STS", BIT(21), 0},
> + {"FUSE_PGD0_PG_STS", BIT(22), 0},
> + {"SBR16B0_PGD0_PG_STS", BIT(23), 0},
> + {"P2SB0_PGD0_PG_STS", BIT(24), 1},
> + {"XDCI_PGD0_PG_STS", BIT(25), 1},
> + {"EXI_PGD0_PG_STS", BIT(26), 0},
> + {"CSE_PGD0_PG_STS", BIT(27), 1},
> + {"KVMCC_PGD0_PG_STS", BIT(28), 1},
> + {"PMT_PGD0_PG_STS", BIT(29), 1},
> + {"CLINK_PGD0_PG_STS", BIT(30), 1},
> + {"SBR16B21_PGD0_PG_STS", BIT(31), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcdh_power_gating_status_1_map[] = {
> + {"USBR0_PGD0_PG_STS", BIT(0), 1},
> + {"SBR16B22_PGD0_PG_STS", BIT(1), 0},
> + {"SMT1_PGD0_PG_STS", BIT(2), 1},
> + {"MPFPW1_PGD0_PG_STS", BIT(3), 0},
> + {"SMS2_PGD0_PG_STS", BIT(4), 1},
> + {"SMS1_PGD0_PG_STS", BIT(5), 1},
> + {"CSMERTC_PGD0_PG_STS", BIT(6), 0},
> + {"CSMEPSF_PGD0_PG_STS", BIT(7), 0},
> + {"D2D_NOC_PGD0_PG_STS", BIT(8), 0},
> + {"ESE_PGD0_PG_STS", BIT(9), 1},
> + {"SBR16B6_PGD0_PG_STS", BIT(10), 0},
> + {"P2SB1_PGD0_PG_STS", BIT(11), 1},
> + {"SBR16B3_PGD0_PG_STS", BIT(12), 0},
> + {"OSSE_SMT1_PGD0_PG_STS", BIT(13), 1},
> + {"D2D_DISP_PGD0_PG_STS", BIT(14), 1},
> + {"SNPA_USB2_A_PGD0_PG_STS", BIT(15), 0},
> + {"U3FPW1_PGD0_PG_STS", BIT(16), 0},
> + {"FIA_X_PGD0_PG_STS", BIT(17), 0},
> + {"PSF4_PGD0_PG_STS", BIT(18), 0},
> + {"CNVI_PGD0_PG_STS", BIT(19), 0},
> + {"UFSX2_PGD0_PG_STS", BIT(20), 1},
> + {"ENDBG_PGD0_PG_STS", BIT(21), 0},
> + {"DBC_PGD0_PG_STS", BIT(22), 0},
> + {"FIA_PG_PGD0_PG_STS", BIT(23), 0},
> + {"D2D_IPU_PGD0_PG_STS", BIT(24), 1},
> + {"NPK_PGD1_PG_STS", BIT(25), 0},
> + {"FIACPCB_X_PGD0_PG_STS", BIT(26), 0},
> + {"SBR8B4_PGD0_PG_STS", BIT(27), 0},
> + {"DBG_PSF_PGD0_PG_STS", BIT(28), 0},
> + {"PSF6_PGD0_PG_STS", BIT(29), 0},
> + {"UFSPW1_PGD0_PG_STS", BIT(30), 0},
> + {"FIA_U_PGD0_PG_STS", BIT(31), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcdh_power_gating_status_2_map[] = {
> + {"PSF8_PGD0_PG_STS", BIT(0), 0},
> + {"SBR16B9_PGD0_PG_STS", BIT(1), 0},
> + {"PSF0_PGD0_PG_STS", BIT(2), 0},
> + {"FIACPCB_U_PGD0_PG_STS", BIT(3), 0},
> + {"TAM_PGD0_PG_STS", BIT(4), 1},
> + {"D2D_NOC_PGD2_PG_STS", BIT(5), 1},
> + {"SBR8B2_PGD0_PG_STS", BIT(6), 0},
> + {"THC0_PGD0_PG_STS", BIT(7), 1},
> + {"THC1_PGD0_PG_STS", BIT(8), 1},
> + {"PMC_PGD1_PG_STS", BIT(9), 0},
> + {"DISP_PGA1_PGD0_PG_STS", BIT(10), 0},
> + {"TCSS_PGD0_PG_STS", BIT(11), 0},
> + {"DISP_PGA_PGD0_PG_STS", BIT(12), 0},
> + {"SBR16B1_PGD0_PG_STS", BIT(13), 0},
> + {"SBRG_PGD0_PG_STS", BIT(14), 0},
> + {"PSF5_PGD0_PG_STS", BIT(15), 0},
> + {"SBR8B3_PGD0_PG_STS", BIT(16), 0},
> + {"ACE_PGD0_PG_STS", BIT(17), 0},
> + {"ACE_PGD1_PG_STS", BIT(18), 0},
> + {"ACE_PGD2_PG_STS", BIT(19), 0},
> + {"ACE_PGD3_PG_STS", BIT(20), 0},
> + {"ACE_PGD4_PG_STS", BIT(21), 0},
> + {"ACE_PGD5_PG_STS", BIT(22), 0},
> + {"ACE_PGD6_PG_STS", BIT(23), 0},
> + {"ACE_PGD7_PG_STS", BIT(24), 0},
> + {"ACE_PGD8_PG_STS", BIT(25), 0},
> + {"ACE_PGD9_PG_STS", BIT(26), 0},
> + {"ACE_PGD10_PG_STS", BIT(27), 0},
> + {"FIACPCB_PG_PGD0_PG_STS", BIT(28), 0},
> + {"SNPS_USB2_B_PGD0_PG_STS", BIT(29), 0},
> + {"OSSE_PGD0_PG_STS", BIT(30), 1},
> + {"SBR8B0_PGD0_PG_STS", BIT(31), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcdh_power_gating_status_3_map[] = {
> + {"SBR16B4_PGD0_PG_STS", BIT(0), 0},
> + {"PTIO_PGD0_PG_STS", BIT(1), 1},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcdh_d3_status_0_map[] = {
> + {"LPSS_D3_STS", BIT(3), 1},
> + {"XDCI_D3_STS", BIT(4), 1},
> + {"XHCI_D3_STS", BIT(5), 1},
> + {"OSSE_D3_STS", BIT(6), 0},
> + {"SPA_D3_STS", BIT(12), 0},
> + {"SPB_D3_STS", BIT(13), 0},
> + {"ESPISPI_D3_STS", BIT(18), 0},
> + {"PSTH_D3_STS", BIT(21), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcdh_d3_status_1_map[] = {
> + {"OSSE_SMT1_D3_STS", BIT(0), 0},
> + {"GBE_D3_STS", BIT(19), 0},
> + {"ITSS_D3_STS", BIT(23), 0},
> + {"CNVI_D3_STS", BIT(27), 0},
> + {"UFSX2_D3_STS", BIT(28), 0},
> + {"ESE_D3_STS", BIT(29), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcdh_d3_status_2_map[] = {
> + {"CSMERTC_D3_STS", BIT(1), 0},
> + {"CSE_D3_STS", BIT(4), 0},
> + {"KVMCC_D3_STS", BIT(5), 0},
> + {"USBR0_D3_STS", BIT(6), 0},
> + {"ISH_D3_STS", BIT(7), 0},
> + {"SMT1_D3_STS", BIT(8), 0},
> + {"SMT2_D3_STS", BIT(9), 0},
> + {"SMT3_D3_STS", BIT(10), 0},
> + {"OSSE_SMT2_D3_STS", BIT(11), 0},
> + {"CLINK_D3_STS", BIT(14), 0},
> + {"PTIO_D3_STS", BIT(16), 0},
> + {"PMT_D3_STS", BIT(17), 0},
> + {"SMS1_D3_STS", BIT(18), 0},
> + {"SMS2_D3_STS", BIT(19), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcdh_d3_status_3_map[] = {
> + {"THC0_D3_STS", BIT(14), 1},
> + {"THC1_D3_STS", BIT(15), 1},
> + {"OSSE_SMT3_D3_STS", BIT(16), 0},
> + {"ACE_D3_STS", BIT(23), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcdh_vnn_req_status_0_map[] = {
> + {"LPSS_VNN_REQ_STS", BIT(3), 1},
> + {"OSSE_VNN_REQ_STS", BIT(6), 1},
> + {"ESPISPI_VNN_REQ_STS", BIT(18), 1},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcdh_vnn_req_status_1_map[] = {
> + {"OSSE_SMT1_VNN_REQ_STS", BIT(0), 1},
> + {"NPK_VNN_REQ_STS", BIT(4), 1},
> + {"DFXAGG_VNN_REQ_STS", BIT(8), 0},
> + {"EXI_VNN_REQ_STS", BIT(9), 1},
> + {"P2D_VNN_REQ_STS", BIT(18), 1},
> + {"GBE_VNN_REQ_STS", BIT(19), 1},
> + {"SMB_VNN_REQ_STS", BIT(25), 1},
> + {"LPC_VNN_REQ_STS", BIT(26), 0},
> + {"ESE_VNN_REQ_STS", BIT(29), 1},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcdh_vnn_req_status_2_map[] = {
> + {"CSMERTC_VNN_REQ_STS", BIT(1), 1},
> + {"CSE_VNN_REQ_STS", BIT(4), 1},
> + {"ISH_VNN_REQ_STS", BIT(7), 1},
> + {"SMT1_VNN_REQ_STS", BIT(8), 1},
> + {"CLINK_VNN_REQ_STS", BIT(14), 1},
> + {"SMS1_VNN_REQ_STS", BIT(18), 1},
> + {"SMS2_VNN_REQ_STS", BIT(19), 1},
> + {"GPIOCOM4_VNN_REQ_STS", BIT(20), 1},
> + {"GPIOCOM3_VNN_REQ_STS", BIT(21), 1},
> + {"DISP_SHIM_VNN_REQ_STS", BIT(22), 1},
> + {"GPIOCOM1_VNN_REQ_STS", BIT(23), 1},
> + {"GPIOCOM0_VNN_REQ_STS", BIT(24), 1},
> + {}
> +};
> +
> +const struct pmc_bit_map nvl_pcdh_vnn_req_status_3_map[] = {
> + {"DTS0_VNN_REQ_STS", BIT(7), 0},
> + {"GPIOCOM5_VNN_REQ_STS", BIT(11), 1},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcdh_vnn_misc_status_map[] = {
> + {"CPU_C10_REQ_STS", BIT(0), 0},
> + {"TS_OFF_REQ_STS", BIT(1), 0},
> + {"PNDE_MET_REQ_STS", BIT(2), 1},
> + {"PG5_PMA0_REQ_STS", BIT(3), 1},
> + {"FW_THROTTLE_ALLOWED_REQ_STS", BIT(4), 0},
> + {"VNN_SOC_REQ_STS", BIT(6), 1},
> + {"ISH_VNNAON_REQ_STS", BIT(7), 0},
> + {"D2D_NOC_CFI_QACTIVE_REQ_STS", BIT(8), 1},
> + {"D2D_NOC_GPSB_QACTIVE_REQ_STS", BIT(9), 1},
> + {"D2D_IPU_QACTIVE_REQ_STS", BIT(10), 1},
> + {"PLT_GREATER_REQ_STS", BIT(11), 1},
> + {"ALL_SBR_IDLE_REQ_STS", BIT(12), 0},
> + {"PMC_IDLE_FB_OCP_REQ_STS", BIT(13), 0},
> + {"PM_SYNC_STATES_REQ_STS", BIT(14), 0},
> + {"EA_REQ_STS", BIT(15), 0},
> + {"MPHY_CORE_OFF_REQ_STS", BIT(16), 0},
> + {"BRK_EV_EN_REQ_STS", BIT(17), 0},
> + {"AUTO_DEMO_EN_REQ_STS", BIT(18), 0},
> + {"ITSS_CLK_SRC_REQ_STS", BIT(19), 1},
> + {"ARC_IDLE_REQ_STS", BIT(21), 0},
> + {"PG5_PMA1_REQ_STS", BIT(22), 1},
> + {"FIA_DEEP_PM_REQ_STS", BIT(23), 0},
> + {"XDCI_ATTACHED_REQ_STS", BIT(24), 1},
> + {"ARC_INTERRUPT_WAKE_REQ_STS", BIT(25), 0},
> + {"D2D_DISP_DDI_QACTIVE_REQ_STS", BIT(26), 1},
> + {"PRE_WAKE0_REQ_STS", BIT(27), 1},
> + {"PRE_WAKE1_REQ_STS", BIT(28), 1},
> + {"PRE_WAKE2_REQ_STS", BIT(29), 1},
> + {"PG5_PMA2_GVNN", BIT(30), 1},
> + {"D2D_DISP_EDP_QACTIVE_REQ_STS", BIT(31), 1},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcdh_rsc_status_map[] = {
> + {"CORE", 0, 1},
> + {"Memory", 0, 1},
> + {"PRIM_D2D", 0, 1},
> + {"PSF0", 0, 1},
> + {"PSF4", 0, 1},
> + {"PSF6", 0, 1},
> + {"PSF8", 0, 1},
> + {"SB", 0, 1},
> + {}
> +};
> +
> +static const struct pmc_bit_map *nvl_pcdh_lpm_maps[] = {
> + nvl_pcdh_clocksource_status_map,
> + nvl_pcdh_power_gating_status_0_map,
> + nvl_pcdh_power_gating_status_1_map,
> + nvl_pcdh_power_gating_status_2_map,
> + nvl_pcdh_power_gating_status_3_map,
> + nvl_pcdh_d3_status_0_map,
> + nvl_pcdh_d3_status_1_map,
> + nvl_pcdh_d3_status_2_map,
> + nvl_pcdh_d3_status_3_map,
> + nvl_pcdh_vnn_req_status_0_map,
> + nvl_pcdh_vnn_req_status_1_map,
> + nvl_pcdh_vnn_req_status_2_map,
> + nvl_pcdh_vnn_req_status_3_map,
> + nvl_pcdh_vnn_misc_status_map,
> + ptl_pcdp_signal_status_map,
> + NULL
> +};
> +
> +static const struct pmc_bit_map *nvl_pcdh_blk_maps[] = {
> + nvl_pcdh_power_gating_status_0_map,
> + nvl_pcdh_power_gating_status_1_map,
> + nvl_pcdh_power_gating_status_2_map,
> + nvl_pcdh_power_gating_status_3_map,
> + nvl_pcdh_rsc_status_map,
> + nvl_pcdh_vnn_req_status_0_map,
> + nvl_pcdh_vnn_req_status_1_map,
> + nvl_pcdh_vnn_req_status_2_map,
> + nvl_pcdh_vnn_req_status_3_map,
> + nvl_pcdh_d3_status_0_map,
> + nvl_pcdh_d3_status_1_map,
> + nvl_pcdh_d3_status_2_map,
> + nvl_pcdh_d3_status_3_map,
> + nvl_pcdh_clocksource_status_map,
> + nvl_pcdh_vnn_misc_status_map,
> + ptl_pcdp_signal_status_map,
> + NULL
> +};
> +
> +static const struct pmc_bit_map nvl_pcds_pfear_map[] = {
> + {"PMC_PGD0", BIT(0)},
> + {"FUSE_OSSE_PGD0", BIT(1)},
> + {"SPI_PGD0", BIT(2)},
> + {"XHCI_PGD0", BIT(3)},
> + {"SPA_PGD0", BIT(4)},
> + {"SPB_PGD0", BIT(5)},
> + {"RSVD6", BIT(6)},
> + {"GBE_PGD0", BIT(7)},
> +
> + {"RSVD8", BIT(0)},
> + {"RSVD9", BIT(1)},
> + {"SBR16B7_PGD0", BIT(2)},
> + {"SBR16B21_PGD0", BIT(3)},
> + {"RSVD12", BIT(4)},
> + {"D2D_DISP_PGD1", BIT(5)},
> + {"LPSS_PGD0", BIT(6)},
> + {"LPC_PGD0", BIT(7)},
> +
> + {"SMB_PGD0", BIT(0)},
> + {"ISH_PGD0", BIT(1)},
> + {"SBR16B1_PGD0", BIT(2)},
> + {"NPK_PGD0", BIT(3)},
> + {"D2D_NOC_PGD1", BIT(4)},
> + {"DBG_SBR16B_PGD0", BIT(5)},
> + {"FUSE_PGD0", BIT(6)},
> + {"RSVD23", BIT(7)},
> +
> + {"P2SB0_PGD0", BIT(0)},
> + {"OTG_PGD0", BIT(1)},
> + {"EXI_PGD0", BIT(2)},
> + {"CSE_PGD0", BIT(3)},
> + {"CSME_KVM_PGD0", BIT(4)},
> + {"CSME_PMT_PGD0", BIT(5)},
> + {"CSME_CLINK_PGD0", BIT(6)},
> + {"CSME_PTIO_PGD0", BIT(7)},
> +
> + {"CSME_USBR_PGD0", BIT(0)},
> + {"SBR16B22_PGD0", BIT(1)},
> + {"CSME_SMT1_PGD0", BIT(2)},
> + {"P2SB1_PGD0", BIT(3)},
> + {"CSME_SMS2_PGD0", BIT(4)},
> + {"CSME_SMS_PGD0", BIT(5)},
> + {"CSME_RTC_PGD0", BIT(6)},
> + {"CSMEPSF_PGD0", BIT(7)},
> +
> + {"D2D_NOC_PGD0", BIT(0)},
> + {"RSVD41", BIT(1)},
> + {"RSVD42", BIT(2)},
> + {"RSVD43", BIT(3)},
> + {"SBR16B2_PGD0", BIT(4)},
> + {"OSSE_SMT1_PGD0", BIT(5)},
> + {"D2D_DISP_PGD0", BIT(6)},
> + {"RSVD47_PGD0", BIT(7)},
> +
> + {"RSVD48", BIT(0)},
> + {"DBG_PSF_PGD0", BIT(1)},
> + {"RSVD50", BIT(2)},
> + {"CNVI_PGD0", BIT(3)},
> + {"UFSX2_PGD0", BIT(4)},
> + {"ENDBG_PGD0", BIT(5)},
> + {"DBC_PGD0", BIT(6)},
> + {"SBR16B4_PGD0", BIT(7)},
> +
> + {"RSVD56", BIT(0)},
> + {"NPK_PGD1", BIT(1)},
> + {"RSVD58", BIT(2)},
> + {"SBR16B20_PGD0", BIT(3)},
> + {"RSVD60", BIT(4)},
> + {"SBR8B20_PGD0", BIT(5)},
> + {"RSVD62", BIT(6)},
> + {"FIA_U_PGD0", BIT(7)},
> +
> + {"PSF8_PGD0", BIT(0)},
> + {"RSVD65", BIT(1)},
> + {"RSVD66", BIT(2)},
> + {"FIACPCB_U_PGD0", BIT(3)},
> + {"TAM_PGD0", BIT(4)},
> + {"D2D_NOC_PGD2", BIT(5)},
> + {"SBR8B2_PGD0", BIT(6)},
> + {"THC0_PGD0", BIT(7)},
> +
> + {"THC1_PGD0", BIT(0)},
> + {"PMC_PGD1", BIT(1)},
> + {"SBR16B3_PGD0", BIT(2)},
> + {"TCSS_PGD0", BIT(3)},
> + {"DISP_PGA_PGD0", BIT(4)},
> + {"RSVD77", BIT(5)},
> + {"RSVD78", BIT(6)},
> + {"RSVD79", BIT(7)},
> +
> + {"SBRG_PGD0", BIT(0)},
> + {"RSVD81", BIT(1)},
> + {"SBR16B0_PGD0", BIT(2)},
> + {"SBR8B0_PGD0", BIT(3)},
> + {"PSF7_PGD0", BIT(4)},
> + {"RSVD85", BIT(5)},
> + {"RSVD86", BIT(6)},
> + {"RSVD87", BIT(7)},
> +
> + {"SBR16B6_PGD0", BIT(0)},
> + {"PSD0_PGD0", BIT(1)},
> + {"STRC_PGD0", BIT(2)},
> + {"RSVD91", BIT(3)},
> + {"DBG_SBR_PGD0", BIT(4)},
> + {"RSVD93", BIT(5)},
> + {"OSSE_PGD0", BIT(6)},
> + {"DISP_PGA1_PGD0", BIT(7)},
> + {}
> +};
> +
> +static const struct pmc_bit_map *ext_nvl_pcds_pfear_map[] = {
> + nvl_pcds_pfear_map,
> + NULL
> +};
> +
> +static const struct pmc_bit_map nvl_pcds_ltr_show_map[] = {
> + {"SOUTHPORT_A", CNP_PMC_LTR_SPA},
> + {"SOUTHPORT_B", CNP_PMC_LTR_SPB},
> + {"SATA", CNP_PMC_LTR_SATA},
> + {"GIGABIT_ETHERNET", CNP_PMC_LTR_GBE},
> + {"XHCI", CNP_PMC_LTR_XHCI},
> + {"SOUTHPORT_F", ADL_PMC_LTR_SPF},
> + {"ME", CNP_PMC_LTR_ME},
> + {"SATA1", CNP_PMC_LTR_EVA},
> + {"SOUTHPORT_C", CNP_PMC_LTR_SPC},
> + {"HD_AUDIO", CNP_PMC_LTR_AZ},
> + {"CNV", CNP_PMC_LTR_CNV},
> + {"LPSS", CNP_PMC_LTR_LPSS},
> + {"SOUTHPORT_D", CNP_PMC_LTR_SPD},
> + {"SOUTHPORT_E", CNP_PMC_LTR_SPE},
> + {"SATA2", PTL_PMC_LTR_SATA2},
> + {"ESPI", CNP_PMC_LTR_ESPI},
> + {"SCC", CNP_PMC_LTR_SCC},
> + {"ISH", CNP_PMC_LTR_ISH},
> + {"UFSX2", CNP_PMC_LTR_UFSX2},
> + {"EMMC", CNP_PMC_LTR_EMMC},
> + {"WIGIG", ICL_PMC_LTR_WIGIG},
> + {"THC0", TGL_PMC_LTR_THC0},
> + {"THC1", TGL_PMC_LTR_THC1},
> + {"SOUTHPORT_G", MTL_PMC_LTR_SPG},
> + {"RSVD", NVL_PCDS_PMC_LTR_RESERVED},
> + {"IOE_PMC", MTL_PMC_LTR_IOE_PMC},
> + {"DMI3", ARL_PMC_LTR_DMI3},
> + {"OSSE", LNL_PMC_LTR_OSSE},
> +
> + /* Below two cannot be used for LTR_IGNORE */
> + {"CURRENT_PLATFORM", PTL_PMC_LTR_CUR_PLT},
> + {"AGGREGATED_SYSTEM", PTL_PMC_LTR_CUR_ASLT},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcds_clocksource_status_map[] = {
> + {"AON2_OFF_STS", BIT(0), 1},
> + {"AON3_OFF_STS", BIT(1), 0},
> + {"AON4_OFF_STS", BIT(2), 1},
> + {"AON5_OFF_STS", BIT(3), 1},
> + {"AON1_OFF_STS", BIT(4), 0},
> + {"XTAL_LVM_OFF_STS", BIT(5), 0},
> + {"D2D_OFF_STS", BIT(8), 1},
> + {"AON3_SPL_OFF_STS", BIT(9), 1},
> + {"XTAL_AGGR_OFF_STS", BIT(17), 1},
> + {"BCLK_EXT_INJ_OFF_STS", BIT(18), 1},
> + {"DDI2_PLL_OFF_STS", BIT(19), 1},
> + {"SE_TCSS_PLL_OFF_STS", BIT(20), 1},
> + {"DDI_PLL_OFF_STS", BIT(21), 1},
> + {"FILTER_PLL_OFF_STS", BIT(22), 1},
> + {"PHY_OC_EXT_INJ_OFF_STS", BIT(23), 1},
> + {"ACE_PLL_OFF_STS", BIT(24), 0},
> + {"FABRIC_PLL_OFF_STS", BIT(25), 1},
> + {"SOC_PLL_OFF_STS", BIT(26), 1},
> + {"REF_PLL_OFF_STS", BIT(28), 1},
> + {"GENLOCK_FILTER_PLL_OFF_STS", BIT(30), 1},
> + {"RTC_PLL_OFF_STS", BIT(31), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcds_power_gating_status_0_map[] = {
> + {"PMC_PGD0_PG_STS", BIT(0), 0},
> + {"FUSE_OSSE_PGD0_PG_STS", BIT(1), 0},
> + {"ESPISPI_PGD0_PG_STS", BIT(2), 0},
> + {"XHCI_PGD0_PG_STS", BIT(3), 0},
> + {"SPA_PGD0_PG_STS", BIT(4), 0},
> + {"SPB_PGD0_PG_STS", BIT(5), 0},
> + {"RSVD_6", BIT(6), 0},
> + {"GBE_PGD0_PG_STS", BIT(7), 0},
> + {"RSVD_8", BIT(8), 0},
> + {"RSVD_9", BIT(9), 0},
> + {"SBR16B7_PGD0_PG_STS", BIT(10), 0},
> + {"SBR16B21_PGD0_PG_STS", BIT(11), 0},
> + {"RSVD_12", BIT(12), 0},
> + {"D2D_DISP_PGD1_PG_STS", BIT(13), 1},
> + {"LPSS_PGD0_PG_STS", BIT(14), 0},
> + {"LPC_PGD0_PG_STS", BIT(15), 0},
> + {"SMB_PGD0_PG_STS", BIT(16), 0},
> + {"ISH_PGD0_PG_STS", BIT(17), 0},
> + {"SBR16B1_PGD0_PG_STS", BIT(18), 0},
> + {"NPK_PGD0_PG_STS", BIT(19), 0},
> + {"D2D_NOC_PGD1_PG_STS", BIT(20), 1},
> + {"DBG_SBR16B_PGD0_PG_STS", BIT(21), 0},
> + {"FUSE_PGD0_PG_STS", BIT(22), 0},
> + {"RSVD_23", BIT(23), 0},
> + {"P2SB0_PGD0_PG_STS", BIT(24), 1},
> + {"XDCI_PGD0_PG_STS", BIT(25), 0},
> + {"EXI_PGD0_PG_STS", BIT(26), 0},
> + {"CSE_PGD0_PG_STS", BIT(27), 1},
> + {"KVMCC_PGD0_PG_STS", BIT(28), 0},
> + {"PMT_PGD0_PG_STS", BIT(29), 0},
> + {"CLINK_PGD0_PG_STS", BIT(30), 0},
> + {"PTIO_PGD0_PG_STS", BIT(31), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcds_power_gating_status_1_map[] = {
> + {"USBR0_PGD0_PG_STS", BIT(0), 0},
> + {"SBR16B22_PGD0_PG_STS", BIT(1), 0},
> + {"SMT1_PGD0_PG_STS", BIT(2), 0},
> + {"P2SB1_PGD0_PG_STS", BIT(3), 1},
> + {"SMS2_PGD0_PG_STS", BIT(4), 0},
> + {"SMS1_PGD0_PG_STS", BIT(5), 0},
> + {"CSMERTC_PGD0_PG_STS", BIT(6), 0},
> + {"CSMEPSF_PGD0_PG_STS", BIT(7), 0},
> + {"D2D_NOC_PGD0_PG_STS", BIT(8), 0},
> + {"RSVD_9", BIT(9), 0},
> + {"RSVD_10", BIT(10), 0},
> + {"RSVD_11", BIT(11), 0},
> + {"SBR16B2_PGD0_PG_STS", BIT(12), 0},
> + {"OSSE_SMT1_PGD0_PG_STS", BIT(13), 1},
> + {"D2D_DISP_PGD0_PG_STS", BIT(14), 1},
> + {"RSVD_15", BIT(15), 0},
> + {"RSVD_16", BIT(16), 0},
> + {"DBG_PSF_PGD0_PG_STS", BIT(17), 0},
> + {"RSVD_18", BIT(18), 0},
> + {"CNVI_PGD0_PG_STS", BIT(19), 0},
> + {"UFSX2_PGD0_PG_STS", BIT(20), 0},
> + {"ENDBG_PGD0_PG_STS", BIT(21), 0},
> + {"DBC_PGD0_PG_STS", BIT(22), 0},
> + {"SBR16B4_PGD0_PG_STS", BIT(23), 0},
> + {"RSVD_24", BIT(24), 0},
> + {"NPK_PGD1_PG_STS", BIT(25), 0},
> + {"RSVD_26", BIT(26), 0},
> + {"SBR16B20_PGD0_PG_STS", BIT(27), 0},
> + {"RSVD_28", BIT(28), 0},
> + {"SBR8B20_PGD0_PG_STS", BIT(29), 0},
> + {"RSVD_30", BIT(30), 0},
> + {"FIA_U_PGD0_PG_STS", BIT(31), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcds_power_gating_status_2_map[] = {
> + {"PSF8_PGD0_PG_STS", BIT(0), 0},
> + {"RSVD_1", BIT(1), 0},
> + {"RSVD_2", BIT(2), 0},
> + {"FIACPCB_U_PGD0_PG_STS", BIT(3), 0},
> + {"TAM_PGD0_PG_STS", BIT(4), 1},
> + {"D2D_NOC_PGD2_PG_STS", BIT(5), 1},
> + {"SBR8B2_PGD0_PG_STS", BIT(6), 0},
> + {"THC0_PGD0_PG_STS", BIT(7), 0},
> + {"THC1_PGD0_PG_STS", BIT(8), 0},
> + {"PMC_PGD1_PG_STS", BIT(9), 0},
> + {"SBR16B3_PGD0_PG_STS", BIT(10), 0},
> + {"TCSS_PGD0_PG_STS", BIT(11), 0},
> + {"DISP_PGA_PGD0_PG_STS", BIT(12), 0},
> + {"RSVD_13", BIT(13), 0},
> + {"RSVD_14", BIT(14), 0},
> + {"RSVD_15", BIT(15), 0},
> + {"SBRG_PGD0_PG_STS", BIT(16), 0},
> + {"RSVD_17", BIT(17), 0},
> + {"SBR16B0_PGD0_PG_STS", BIT(18), 0},
> + {"SBR8B0_PGD0_PG_STS", BIT(19), 0},
> + {"PSF7_PGD0_PG_STS", BIT(20), 0},
> + {"RSVD_21", BIT(21), 0},
> + {"RSVD_22", BIT(22), 0},
> + {"RSVD_23", BIT(23), 0},
> + {"SBR16B6_PGD0_PG_STS", BIT(24), 0},
> + {"PSF0_PGD0_PG_STS", BIT(25), 0},
> + {"STRC_PGD0_PG_STS", BIT(26), 0},
> + {"RSVD_27", BIT(27), 0},
> + {"DBG_SBR_PGD0_PG_STS", BIT(28), 0},
> + {"RSVD_29", BIT(29), 0},
> + {"OSSE_PGD0_PG_STS", BIT(30), 1},
> + {"DISP_PGA1_PGD0_PG_STS", BIT(31), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcds_d3_status_0_map[] = {
> + {"LPSS_D3_STS", BIT(3), 1},
> + {"XDCI_D3_STS", BIT(4), 1},
> + {"XHCI_D3_STS", BIT(5), 1},
> + {"SPA_D3_STS", BIT(12), 0},
> + {"SPB_D3_STS", BIT(13), 0},
> + {"ESPISPI_D3_STS", BIT(18), 0},
> + {"PSTH_D3_STS", BIT(21), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcds_d3_status_1_map[] = {
> + {"OSSE_D3_STS", BIT(14), 0},
> + {"GBE_D3_STS", BIT(19), 0},
> + {"ITSS_D3_STS", BIT(23), 0},
> + {"CNVI_D3_STS", BIT(27), 0},
> + {"UFSX2_D3_STS", BIT(28), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcds_d3_status_2_map[] = {
> + {"CSMERTC_D3_STS", BIT(1), 0},
> + {"CSE_D3_STS", BIT(4), 0},
> + {"KVMCC_D3_STS", BIT(5), 0},
> + {"USBR0_D3_STS", BIT(6), 0},
> + {"ISH_D3_STS", BIT(7), 0},
> + {"SMT1_D3_STS", BIT(8), 0},
> + {"SMT2_D3_STS", BIT(9), 0},
> + {"SMT3_D3_STS", BIT(10), 0},
> + {"OSSE_SMT1_D3_STS", BIT(12), 0},
> + {"CLINK_D3_STS", BIT(14), 0},
> + {"PTIO_D3_STS", BIT(16), 0},
> + {"PMT_D3_STS", BIT(17), 0},
> + {"SMS1_D3_STS", BIT(18), 0},
> + {"SMS2_D3_STS", BIT(19), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcds_d3_status_3_map[] = {
> + {"OSSE_SMT2_D3_STS", BIT(0), 0},
> + {"THC0_D3_STS", BIT(14), 1},
> + {"THC1_D3_STS", BIT(15), 1},
> + {"OSSE_SMT3_D3_STS", BIT(19), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcds_vnn_req_status_0_map[] = {
> + {"LPSS_VNN_REQ_STS", BIT(3), 0},
> + {"ESPISPI_VNN_REQ_STS", BIT(18), 1},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcds_vnn_req_status_1_map[] = {
> + {"NPK_VNN_REQ_STS", BIT(4), 1},
> + {"DFXAGG_VNN_REQ_STS", BIT(8), 0},
> + {"EXI_VNN_REQ_STS", BIT(9), 1},
> + {"OSSE_VNN_REQ_STS", BIT(14), 1},
> + {"P2D_VNN_REQ_STS", BIT(18), 1},
> + {"GBE_VNN_REQ_STS", BIT(19), 0},
> + {"SMB_VNN_REQ_STS", BIT(25), 1},
> + {"LPC_VNN_REQ_STS", BIT(26), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcds_vnn_req_status_2_map[] = {
> + {"CSMERTC_VNN_REQ_STS", BIT(1), 0},
> + {"CSE_VNN_REQ_STS", BIT(4), 1},
> + {"ISH_VNN_REQ_STS", BIT(7), 0},
> + {"SMT1_VNN_REQ_STS", BIT(8), 0},
> + {"OSSE_SMT1_VNN_REQ_STS", BIT(12), 1},
> + {"CLINK_VNN_REQ_STS", BIT(14), 0},
> + {"SMS1_VNN_REQ_STS", BIT(18), 0},
> + {"SMS2_VNN_REQ_STS", BIT(19), 0},
> + {"GPIOCOM4_VNN_REQ_STS", BIT(20), 0},
> + {"GPIOCOM3_VNN_REQ_STS", BIT(21), 1},
> + {"GPIOCOM1_VNN_REQ_STS", BIT(23), 1},
> + {"GPIOCOM0_VNN_REQ_STS", BIT(24), 1},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcds_vnn_req_status_3_map[] = {
> + {"DISP_SHIM_VNN_REQ_STS", BIT(4), 1},
> + {"DTS0_VNN_REQ_STS", BIT(7), 0},
> + {"GPIOCOM5_VNN_REQ_STS", BIT(11), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcds_vnn_misc_status_map[] = {
> + {"CPU_C10_REQ_STS", BIT(0), 0},
> + {"TS_OFF_REQ_STS", BIT(1), 0},
> + {"PNDE_MET_REQ_STS", BIT(2), 1},
> + {"PG5_PMA0_REQ_STS", BIT(3), 1},
> + {"FW_THROTTLE_ALLOWED_REQ_STS", BIT(4), 0},
> + {"VNN_SOC_REQ_STS", BIT(6), 1},
> + {"ISH_VNNAON_REQ_STS", BIT(7), 0},
> + {"D2D_NOC_CFI_QACTIVE_REQ_STS", BIT(8), 1},
> + {"D2D_NOC_GPSB_QACTIVE_REQ_STS", BIT(9), 1},
> + {"PLT_GREATER_REQ_STS", BIT(11), 1},
> + {"ALL_SBR_IDLE_REQ_STS", BIT(12), 0},
> + {"PMC_IDLE_FB_OCP_REQ_STS", BIT(13), 0},
> + {"PM_SYNC_STATES_REQ_STS", BIT(14), 0},
> + {"EA_REQ_STS", BIT(15), 0},
> + {"MPHY_CORE_OFF_REQ_STS", BIT(16), 0},
> + {"BRK_EV_EN_REQ_STS", BIT(17), 0},
> + {"AUTO_DEMO_EN_REQ_STS", BIT(18), 0},
> + {"ITSS_CLK_SRC_REQ_STS", BIT(19), 1},
> + {"ARC_IDLE_REQ_STS", BIT(21), 0},
> + {"PG5_PMA1_REQ_STS", BIT(22), 1},
> + {"DG5_PMA0_REQ_STS", BIT(23), 1},
> + {"ARC_INTERRUPT_WAKE_REQ_STS", BIT(25), 0},
> + {"D2D_DISP_DDI_QACTIVE_REQ_STS", BIT(26), 1},
> + {"PRE_WAKE0_REQ_STS", BIT(27), 1},
> + {"PRE_WAKE1_REQ_STS", BIT(28), 1},
> + {"PRE_WAKE2_REQ_STS", BIT(29), 1},
> + {"D2D_DISP_EDP_QACTIVE_REQ_STS", BIT(31), 1},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcds_rsc_status_map[] = {
> + {"CORE", 0, 1},
> + {"Memory", 0, 1},
> + {"PRIM_D2D", 0, 1},
> + {"PSF0", 0, 1},
> + {"SB", 0, 1},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pcds_signal_status_map[] = {
> + {"LSX_Wake0_STS", BIT(0), 0},
> + {"LSX_Wake1_STS", BIT(1), 0},
> + {"LSX_Wake2_STS", BIT(2), 0},
> + {"LSX_Wake3_STS", BIT(3), 0},
> + {"LSX_Wake4_STS", BIT(4), 0},
> + {"LSX_Wake5_STS", BIT(5), 0},
> + {"LSX_Wake6_STS", BIT(6), 0},
> + {"LSX_Wake7_STS", BIT(7), 0},
> + {"LPSS_Wake0_STS", BIT(8), 1},
> + {"LPSS_Wake1_STS", BIT(9), 1},
> + {"Int_Timer_SS_Wake0_STS", BIT(10), 1},
> + {"Int_Timer_SS_Wake1_STS", BIT(11), 1},
> + {"Int_Timer_SS_Wake2_STS", BIT(12), 1},
> + {"Int_Timer_SS_Wake3_STS", BIT(13), 1},
> + {"Int_Timer_SS_Wake4_STS", BIT(14), 1},
> + {"Int_Timer_SS_Wake5_STS", BIT(15), 1},
> + {}
> +};
> +
> +static const struct pmc_bit_map *nvl_pcds_lpm_maps[] = {
> + nvl_pcds_clocksource_status_map,
> + nvl_pcds_power_gating_status_0_map,
> + nvl_pcds_power_gating_status_1_map,
> + nvl_pcds_power_gating_status_2_map,
> + nvl_pcds_d3_status_0_map,
> + nvl_pcds_d3_status_1_map,
> + nvl_pcds_d3_status_2_map,
> + nvl_pcds_d3_status_3_map,
> + nvl_pcds_vnn_req_status_0_map,
> + nvl_pcds_vnn_req_status_1_map,
> + nvl_pcds_vnn_req_status_2_map,
> + nvl_pcds_vnn_req_status_3_map,
> + nvl_pcds_vnn_misc_status_map,
> + nvl_pcds_signal_status_map,
> + NULL
> +};
> +
> +static const struct pmc_bit_map *nvl_pcds_blk_maps[] = {
> + nvl_pcds_power_gating_status_0_map,
> + nvl_pcds_power_gating_status_1_map,
> + nvl_pcds_power_gating_status_2_map,
> + nvl_pcds_rsc_status_map,
> + nvl_pcds_vnn_req_status_0_map,
> + nvl_pcds_vnn_req_status_1_map,
> + nvl_pcds_vnn_req_status_2_map,
> + nvl_pcds_vnn_req_status_3_map,
> + nvl_pcds_d3_status_0_map,
> + nvl_pcds_d3_status_1_map,
> + nvl_pcds_d3_status_2_map,
> + nvl_pcds_d3_status_3_map,
> + nvl_pcds_clocksource_status_map,
> + nvl_pcds_vnn_misc_status_map,
> + nvl_pcds_signal_status_map,
> + NULL
> +};
> +
> +static const struct pmc_bit_map nvl_pchs_pfear_map[] = {
> + {"PMC_PGD0", BIT(0)},
> + {"FIA_D_PGD0", BIT(1)},
> + {"SPI_PGD0", BIT(2)},
> + {"XHCI_PGD0", BIT(3)},
> + {"SPA_PGD0", BIT(4)},
> + {"SPB_PGD0", BIT(5)},
> + {"MPFPW2_PGD0", BIT(6)},
> + {"GBE_PGD0", BIT(7)},
> +
> + {"RSVD8", BIT(0)},
> + {"PSF3_PGD0", BIT(1)},
> + {"SBR5_PGD0", BIT(2)},
> + {"SBR0_PGD0", BIT(3)},
> + {"RSVD12", BIT(4)},
> + {"D2D_DISP_PGD1", BIT(5)},
> + {"LPSS_PGD0", BIT(6)},
> + {"LPC_PGD0", BIT(7)},
> +
> + {"SMB_PGD0", BIT(0)},
> + {"ISH_PGD0", BIT(1)},
> + {"P2SB_PGD0", BIT(2)},
> + {"NPK_PGD0", BIT(3)},
> + {"D2D_NOC_PGD1", BIT(4)},
> + {"EAH_PGD0", BIT(5)},
> + {"FUSE_PGD0", BIT(6)},
> + {"SBR8_PGD0", BIT(7)},
> +
> + {"PSF7_PGD0", BIT(0)},
> + {"OTG_PGD0", BIT(1)},
> + {"EXI_PGD0", BIT(2)},
> + {"CSE_PGD0", BIT(3)},
> + {"CSME_KVM_PGD0", BIT(4)},
> + {"CSME_PMT_PGD0", BIT(5)},
> + {"CSME_CLINK_PGD0", BIT(6)},
> + {"CSME_PTIO_PGD0", BIT(7)},
> +
> + {"CSME_USBR_PGD0", BIT(0)},
> + {"SBR1_PGD0", BIT(1)},
> + {"CSME_SMT1_PGD0", BIT(2)},
> + {"MPFPW1_PGD0", BIT(3)},
> + {"CSME_SMS2_PGD0", BIT(4)},
> + {"CSME_SMS_PGD0", BIT(5)},
> + {"CSME_RTC_PGD0", BIT(6)},
> + {"CSMEPSF_PGD0", BIT(7)},
> +
> + {"D2D_NOC_PGD0", BIT(0)},
> + {"ESE_PGD0", BIT(1)},
> + {"SBR2_PGD0", BIT(2)},
> + {"SBR3_PGD0", BIT(3)},
> + {"SBR4_PGD0", BIT(4)},
> + {"RSVD45", BIT(5)},
> + {"D2D_DISP_PGD0", BIT(6)},
> + {"PSF1_PGD0", BIT(7)},
> +
> + {"U3FPW1_PGD0", BIT(0)},
> + {"DMI3FPW_PGD0", BIT(1)},
> + {"PSF4_PGD0", BIT(2)},
> + {"CNVI_PGD0", BIT(3)},
> + {"RSVD52", BIT(4)},
> + {"ENDBG_PGD0", BIT(5)},
> + {"DBC_PGD0", BIT(6)},
> + {"SMT4_PGD0", BIT(7)},
> +
> + {"RSVD56", BIT(0)},
> + {"NPK_PGD1", BIT(1)},
> + {"RSVD58", BIT(2)},
> + {"DMI3_PGD0", BIT(3)},
> + {"RSVD60", BIT(4)},
> + {"FIACPCB_D_PGD0", BIT(5)},
> + {"RSVD62", BIT(6)},
> + {"FIA_U_PGD0", BIT(7)},
> +
> + {"FIACPCB_PGS_PGD0", BIT(0)},
> + {"FIA_PGS_PGD0", BIT(1)},
> + {"RSVD66", BIT(2)},
> + {"FIACPCB_U_PGD0", BIT(3)},
> + {"TAM_PGD0", BIT(4)},
> + {"D2D_NOC_PGD2", BIT(5)},
> + {"PSF2_PGD0", BIT(6)},
> + {"THC0_PGD0", BIT(7)},
> +
> + {"THC1_PGD0", BIT(0)},
> + {"PMC_PGD1", BIT(1)},
> + {"SBR9_PGD0", BIT(2)},
> + {"U3FPW2_PGD0", BIT(3)},
> + {"RSVD76", BIT(4)},
> + {"DBG_PSF_PGD0", BIT(5)},
> + {"DBG_SBR_PGD0", BIT(6)},
> + {"SBR6_PGD0", BIT(7)},
> +
> + {"SPC_PGD0", BIT(0)},
> + {"ACE_PGD0", BIT(1)},
> + {"ACE_PGD1", BIT(2)},
> + {"ACE_PGD2", BIT(3)},
> + {"ACE_PGD3", BIT(4)},
> + {"ACE_PGD4", BIT(5)},
> + {"ACE_PGD5", BIT(6)},
> + {"ACE_PGD6", BIT(7)},
> +
> + {"ACE_PGD7", BIT(0)},
> + {"ACE_PGD8", BIT(1)},
> + {"ACE_PGD9", BIT(2)},
> + {"ACE_PGD10", BIT(3)},
> + {"U3FPW3_PGD0", BIT(4)},
> + {"SBR7_PGD0", BIT(5)},
> + {"OSSE_PGD0", BIT(6)},
> + {"ST_PGD0", BIT(7)},
> + {}
> +};
> +
> +static const struct pmc_bit_map *ext_nvl_pchs_pfear_map[] = {
> + nvl_pchs_pfear_map,
> + NULL
> +};
> +
> +static const struct pmc_bit_map nvl_pchs_clocksource_status_map[] = {
> + {"AON2_OFF_STS", BIT(0), 1},
> + {"AON3_OFF_STS", BIT(1), 0},
> + {"AON4_OFF_STS", BIT(2), 0},
> + {"AON2_SPL_OFF_STS", BIT(3), 0},
> + {"AONL_OFF_STS", BIT(4), 0},
> + {"XTAL_LVM_OFF_STS", BIT(5), 0},
> + {"AON5_OFF_STS", BIT(6), 0},
> + {"USB3_PLL_OFF_STS", BIT(8), 1},
> + {"MAIN_CRO_OFF_STS", BIT(11), 0},
> + {"MAIN_DIVIDER_OFF_STS", BIT(12), 1},
> + {"REF_PLL_NON_OC_OFF_STS", BIT(13), 1},
> + {"DMI_PLL_OFF_STS", BIT(14), 1},
> + {"PHY_EXT_INJ_OFF_STS", BIT(15), 1},
> + {"AON6_MCRO_OFF_STS", BIT(16), 0},
> + {"XTAL_AGGR_OFF_STS", BIT(17), 0},
> + {"USB2_PLL_OFF_STS", BIT(18), 1},
> + {"GBE_PLL_OFF_STS", BIT(21), 1},
> + {"SATA_PLL_OFF_STS", BIT(22), 1},
> + {"PCIE0_PLL_OFF_STS", BIT(23), 1},
> + {"PCIE1_PLL_OFF_STS", BIT(24), 1},
> + {"FABRIC_PLL_OFF_STS", BIT(25), 1},
> + {"PCIE2_PLL_OFF_STS", BIT(26), 1},
> + {"REF_PLL_OFF_STS", BIT(28), 1},
> + {"REF38P4_PLL_OFF_STS", BIT(31), 1},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pchs_power_gating_status_0_map[] = {
> + {"PMC_PGD0_PG_STS", BIT(0), 0},
> + {"FIA_D_PGD0_PG_STS", BIT(1), 0},
> + {"ESPISPI_PGD0_PG_STS", BIT(2), 0},
> + {"XHCI_PGD0_PG_STS", BIT(3), 0},
> + {"SPA_PGD0_PG_STS", BIT(4), 1},
> + {"SPB_PGD0_PG_STS", BIT(5), 1},
> + {"MPFPW2_PGD0_PG_STS", BIT(6), 0},
> + {"GBE_PGD0_PG_STS", BIT(7), 1},
> + {"RSVD_8", BIT(8), 0},
> + {"PSF3_PGD0_PG_STS", BIT(9), 0},
> + {"SBR5_PGD0_PG_STS", BIT(10), 0},
> + {"SBR0_PGD0_PG_STS", BIT(11), 0},
> + {"RSVD_12", BIT(12), 0},
> + {"D2D_DISP_PGD1_PG_STS", BIT(13), 0},
> + {"LPSS_PGD0_PG_STS", BIT(14), 1},
> + {"LPC_PGD0_PG_STS", BIT(15), 0},
> + {"SMB_PGD0_PG_STS", BIT(16), 0},
> + {"ISH_PGD0_PG_STS", BIT(17), 0},
> + {"P2S_PGD0_PG_STS", BIT(18), 0},
> + {"NPK_PGD0_PG_STS", BIT(19), 0},
> + {"D2D_NOC_PGD1_PG_STS", BIT(20), 0},
> + {"EAH_PGD0_PG_STS", BIT(21), 0},
> + {"FUSE_PGD0_PG_STS", BIT(22), 0},
> + {"SBR8_PGD0_PG_STS", BIT(23), 0},
> + {"PSF7_PGD0_PG_STS", BIT(24), 0},
> + {"XDCI_PGD0_PG_STS", BIT(25), 1},
> + {"EXI_PGD0_PG_STS", BIT(26), 0},
> + {"CSE_PGD0_PG_STS", BIT(27), 1},
> + {"KVMCC_PGD0_PG_STS", BIT(28), 1},
> + {"PMT_PGD0_PG_STS", BIT(29), 1},
> + {"CLINK_PGD0_PG_STS", BIT(30), 1},
> + {"PTIO_PGD0_PG_STS", BIT(31), 1},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pchs_power_gating_status_1_map[] = {
> + {"USBR0_PGD0_PG_STS", BIT(0), 1},
> + {"SBR1_PGD0_PG_STS", BIT(1), 0},
> + {"SMT1_PGD0_PG_STS", BIT(2), 1},
> + {"MPFPW1_PGD0_PG_STS", BIT(3), 0},
> + {"SMS2_PGD0_PG_STS", BIT(4), 1},
> + {"SMS1_PGD0_PG_STS", BIT(5), 1},
> + {"CSMERTC_PGD0_PG_STS", BIT(6), 0},
> + {"CSMEPSF_PGD0_PG_STS", BIT(7), 0},
> + {"D2D_NOC_PGD0_PG_STS", BIT(8), 0},
> + {"ESE_PGD0_PG_STS", BIT(9), 1},
> + {"SBR2_PGD0_PG_STS", BIT(10), 0},
> + {"SBR3_PGD0_PG_STS", BIT(11), 0},
> + {"SBR4_PGD0_PG_STS", BIT(12), 0},
> + {"RSVD_13", BIT(13), 0},
> + {"D2D_DISP_PGD0_PG_STS", BIT(14), 0},
> + {"PSF1_PGD0_PG_STS", BIT(15), 0},
> + {"U3FPW1_PGD0_PG_STS", BIT(16), 0},
> + {"DMI3FPW_PGD0_PG_STS", BIT(17), 0},
> + {"PSF4_PGD0_PG_STS", BIT(18), 0},
> + {"CNVI_PGD0_PG_STS", BIT(19), 0},
> + {"RSVD_20", BIT(20), 0},
> + {"ENDBG_PGD0_PG_STS", BIT(21), 0},
> + {"DBC_PGD0_PG_STS", BIT(22), 0},
> + {"SMT4_PGD0_PG_STS", BIT(23), 1},
> + {"RSVD_24", BIT(24), 0},
> + {"NPK_PGD1_PG_STS", BIT(25), 0},
> + {"RSVD_26", BIT(26), 0},
> + {"DMI3_PGD0_PG_STS", BIT(27), 1},
> + {"RSVD_28", BIT(28), 0},
> + {"FIACPCB_D_PGD0_PG_STS", BIT(29), 0},
> + {"RSVD_30", BIT(30), 0},
> + {"FIA_U_PGD0_PG_STS", BIT(31), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pchs_power_gating_status_2_map[] = {
> + {"FIACPCB_PGS_PGD0_PG_STS", BIT(0), 0},
> + {"FIA_PGS_PGD0_PG_STS", BIT(1), 0},
> + {"RSVD_2", BIT(2), 0},
> + {"FIACPCB_U_PGD0_PG_STS", BIT(3), 0},
> + {"TAM_PGD0_PG_STS", BIT(4), 0},
> + {"D2D_NOC_PGD2_PG_STS", BIT(5), 0},
> + {"PSF2_PGD0_PG_STS", BIT(6), 0},
> + {"THC0_PGD0_PG_STS", BIT(7), 1},
> + {"THC1_PGD0_PG_STS", BIT(8), 1},
> + {"PMC_PGD1_PG_STS", BIT(9), 0},
> + {"SBR9_PGA0_PGD0_PG_STS", BIT(10), 0},
> + {"U3FPW2_PGD0_PG_STS", BIT(11), 0},
> + {"RSVD_12", BIT(12), 0},
> + {"DBG_PSF_PGD0_PG_STS", BIT(13), 0},
> + {"DBG_SBR_PGD0_PG_STS", BIT(14), 0},
> + {"SBR6_PGD0_PG_STS", BIT(15), 0},
> + {"SPC_PGD0_PG_STS", BIT(16), 1},
> + {"ACE_PGD0_PG_STS", BIT(17), 0},
> + {"ACE_PGD1_PG_STS", BIT(18), 0},
> + {"ACE_PGD2_PG_STS", BIT(19), 0},
> + {"ACE_PGD3_PG_STS", BIT(20), 0},
> + {"ACE_PGD4_PG_STS", BIT(21), 0},
> + {"ACE_PGD5_PG_STS", BIT(22), 0},
> + {"ACE_PGD6_PG_STS", BIT(23), 0},
> + {"ACE_PGD7_PG_STS", BIT(24), 0},
> + {"ACE_PGD8_PG_STS", BIT(25), 0},
> + {"ACE_PGD9_PG_STS", BIT(26), 0},
> + {"ACE_PGD10_PG_STS", BIT(27), 0},
> + {"U3FPW3_PGD0_PG_STS", BIT(28), 0},
> + {"SBR7_PGD0_PG_STS", BIT(29), 0},
> + {"OSSE_PGD0_PG_STS", BIT(30), 0},
> + {"SATA_PGD0_PG_STS", BIT(31), 1},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pchs_d3_status_0_map[] = {
> + {"LPSS_D3_STS", BIT(3), 1},
> + {"XDCI_D3_STS", BIT(4), 1},
> + {"XHCI_D3_STS", BIT(5), 0},
> + {"SPA_D3_STS", BIT(12), 0},
> + {"SPB_D3_STS", BIT(13), 0},
> + {"SPC_D3_STS", BIT(14), 0},
> + {"ESPISPI_D3_STS", BIT(18), 0},
> + {"SATA_D3_STS", BIT(20), 1},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pchs_d3_status_1_map[] = {
> + {"OSSE_D3_STS", BIT(6), 0},
> + {"GBE_D3_STS", BIT(19), 0},
> + {"ITSS_D3_STS", BIT(23), 0},
> + {"P2S_D3_STS", BIT(24), 0},
> + {"CNVI_D3_STS", BIT(27), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pchs_d3_status_2_map[] = {
> + {"CSMERTC_D3_STS", BIT(1), 0},
> + {"CSE_D3_STS", BIT(4), 0},
> + {"KVMCC_D3_STS", BIT(5), 0},
> + {"USBR0_D3_STS", BIT(6), 0},
> + {"ISH_D3_STS", BIT(7), 0},
> + {"SMT1_D3_STS", BIT(8), 0},
> + {"SMT2_D3_STS", BIT(9), 0},
> + {"SMT3_D3_STS", BIT(10), 0},
> + {"SMT4_D3_STS", BIT(11), 0},
> + {"SMT5_D3_STS", BIT(12), 0},
> + {"SMT6_D3_STS", BIT(13), 0},
> + {"CLINK_D3_STS", BIT(14), 0},
> + {"PTIO_D3_STS", BIT(16), 0},
> + {"PMT_D3_STS", BIT(17), 0},
> + {"SMS1_D3_STS", BIT(18), 0},
> + {"SMS2_D3_STS", BIT(19), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pchs_d3_status_3_map[] = {
> + {"THC0_D3_STS", BIT(14), 0},
> + {"THC1_D3_STS", BIT(15), 0},
> + {"ACE_D3_STS", BIT(23), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pchs_vnn_req_status_1_map[] = {
> + {"NPK_VNN_REQ_STS", BIT(4), 0},
> + {"OSSE_VNN_REQ_STS", BIT(6), 0},
> + {"DFXAGG_VNN_REQ_STS", BIT(8), 0},
> + {"EXI_VNN_REQ_STS", BIT(9), 0},
> + {"GBE_VNN_REQ_STS", BIT(19), 0},
> + {"SMB_VNN_REQ_STS", BIT(25), 0},
> + {"LPC_VNN_REQ_STS", BIT(26), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pchs_vnn_req_status_2_map[] = {
> + {"CSMERTC_VNN_REQ_STS", BIT(1), 0},
> + {"CSE_VNN_REQ_STS", BIT(4), 0},
> + {"ISH_VNN_REQ_STS", BIT(7), 0},
> + {"SMT1_VNN_REQ_STS", BIT(8), 0},
> + {"SMT4_VNN_REQ_STS", BIT(11), 0},
> + {"CLINK_VNN_REQ_STS", BIT(14), 0},
> + {"SMS1_VNN_REQ_STS", BIT(18), 0},
> + {"SMS2_VNN_REQ_STS", BIT(19), 0},
> + {"GPIOCOM4_VNN_REQ_STS", BIT(20), 0},
> + {"GPIOCOM3_VNN_REQ_STS", BIT(21), 0},
> + {"GPIOCOM2_VNN_REQ_STS", BIT(22), 0},
> + {"GPIOCOM1_VNN_REQ_STS", BIT(23), 0},
> + {"GPIOCOM0_VNN_REQ_STS", BIT(24), 0},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pchs_vnn_misc_status_map[] = {
> + {"CPU_C10_REQ_STS", BIT(0), 0},
> + {"TS_OFF_REQ_STS", BIT(1), 0},
> + {"PNDE_MET_REQ_STS", BIT(2), 1},
> + {"PG5_PMA0_GVNN_REQ_STS", BIT(3), 1},
> + {"FW_THROTTLE_ALLOWED_REQ_STS", BIT(4), 0},
> + {"DMI_IN_L1_REQ_STS", BIT(6), 0},
> + {"ISH_VNNAON_REQ_STS", BIT(7), 0},
> + {"PLT_GREATER_REQ_STS", BIT(11), 1},
> + {"ALL_SBR_IDLE_REQ_STS", BIT(12), 0},
> + {"PMC_IDLE_FB_OCP_REQ_STS", BIT(13), 0},
> + {"PM_SYNC_STATES_REQ_STS", BIT(14), 0},
> + {"EA_REQ_STS", BIT(15), 0},
> + {"DMI_CLKREQ_B_REQ_STS", BIT(16), 0},
> + {"BRK_EV_EN_REQ_STS", BIT(17), 0},
> + {"AUTO_DEMO_EN_REQ_STS", BIT(18), 0},
> + {"ITSS_CLK_SRC_REQ_STS", BIT(19), 1},
> + {"ARC_IDLE_REQ_STS", BIT(21), 0},
> + {"PG5_PMA1_GVNN_REQ_STS", BIT(22), 1},
> + {"FIA_DEEP_PM_REQ_STS", BIT(23), 0},
> + {"XDCI_ATTACHED_REQ_STS", BIT(24), 0},
> + {"ARC_INTERRUPT_WAKE_REQ_STS", BIT(25), 0},
> + {"PRE_WAKE0_REQ_STS", BIT(27), 1},
> + {"PRE_WAKE1_REQ_STS", BIT(28), 1},
> + {"PRE_WAKE2_EN_REQ_STS", BIT(29), 0},
> + {"PG5_PMA2_GVNN_REQ_STS", BIT(30), 1},
> + {}
> +};
> +
> +static const struct pmc_bit_map nvl_pchs_rsc_status_map[] = {
> + {"Memory", 0, 1},
> + {"Memory_NS", 0, 1},
> + {"PSF1", 0, 1},
> + {"PSF2", 0, 1},
> + {"PSF3", 0, 1},
> + {"REF_PLL", 0, 1},
> + {"SB", 0, 1},
> + {}
> +};
> +
> +static const struct pmc_bit_map *nvl_pchs_lpm_maps[] = {
> + nvl_pchs_clocksource_status_map,
> + nvl_pchs_power_gating_status_0_map,
> + nvl_pchs_power_gating_status_1_map,
> + nvl_pchs_power_gating_status_2_map,
> + nvl_pchs_d3_status_0_map,
> + nvl_pchs_d3_status_1_map,
> + nvl_pchs_d3_status_2_map,
> + nvl_pchs_d3_status_3_map,
> + nvl_pcds_vnn_req_status_0_map,
> + nvl_pchs_vnn_req_status_1_map,
> + nvl_pchs_vnn_req_status_2_map,
> + nvl_pcdh_vnn_req_status_3_map,
> + nvl_pchs_vnn_misc_status_map,
> + ptl_pcdp_signal_status_map,
> + NULL
> +};
> +
> +static const struct pmc_bit_map *nvl_pchs_blk_maps[] = {
> + nvl_pchs_power_gating_status_0_map,
> + nvl_pchs_power_gating_status_1_map,
> + nvl_pchs_power_gating_status_2_map,
> + nvl_pchs_rsc_status_map,
> + nvl_pchs_d3_status_0_map,
> + nvl_pchs_clocksource_status_map,
> + nvl_pchs_vnn_misc_status_map,
> + NULL
> +};
> +
> +static const struct pmc_reg_map nvl_pcdh_reg_map = {
> + .pfear_sts = ext_nvl_pcdh_pfear_map,
> + .slp_s0_offset = CNP_PMC_SLP_S0_RES_COUNTER_OFFSET,
> + .slp_s0_res_counter_step = TGL_PMC_SLP_S0_RES_COUNTER_STEP,
> + .ltr_show_sts = ptl_pcdp_ltr_show_map,
> + .msr_sts = msr_map,
> + .ltr_ignore_offset = CNP_PMC_LTR_IGNORE_OFFSET,
> + .regmap_length = NVL_PCDH_PMC_MMIO_REG_LEN,
> + .ppfear0_offset = CNP_PMC_HOST_PPFEAR0A,
> + .ppfear_buckets = NVL_PCDH_PPFEAR_NUM_ENTRIES,
> + .pm_cfg_offset = CNP_PMC_PM_CFG_OFFSET,
> + .pm_read_disable_bit = CNP_PMC_READ_DISABLE_BIT,
> + .lpm_num_maps = NVL_LPM_NUM_MAPS,
> + .ltr_ignore_max = LNL_NUM_IP_IGN_ALLOWED,
> + .lpm_res_counter_step_x2 = TGL_PMC_LPM_RES_COUNTER_STEP_X2,
> + .etr3_offset = ETR3_OFFSET,
> + .lpm_sts_latch_en_offset = MTL_LPM_STATUS_LATCH_EN_OFFSET,
> + .lpm_priority_offset = NVL_LPM_PRI_OFFSET,
> + .lpm_en_offset = NVL_LPM_EN_OFFSET,
> + .lpm_residency_offset = NVL_LPM_RESIDENCY_OFFSET,
> + .lpm_sts = nvl_pcdh_lpm_maps,
> + .lpm_status_offset = MTL_LPM_STATUS_OFFSET,
> + .lpm_live_status_offset = NVL_LPM_LIVE_STATUS_OFFSET,
> + .s0ix_blocker_maps = nvl_pcdh_blk_maps,
> + .s0ix_blocker_offset = LNL_S0IX_BLOCKER_OFFSET,
> + .num_s0ix_blocker = NVL_PCDH_NUM_S0IX_BLOCKER,
> + .blocker_req_offset = NVL_PCDH_BLK_REQ_OFFSET,
> + .lpm_req_guid = PCDH_LPM_REQ_GUID,
> +};
> +
> +static const struct pmc_reg_map nvl_pcds_reg_map = {
> + .pfear_sts = ext_nvl_pcds_pfear_map,
> + .slp_s0_offset = CNP_PMC_SLP_S0_RES_COUNTER_OFFSET,
> + .slp_s0_res_counter_step = TGL_PMC_SLP_S0_RES_COUNTER_STEP,
> + .ltr_show_sts = nvl_pcds_ltr_show_map,
> + .msr_sts = msr_map,
> + .ltr_ignore_offset = CNP_PMC_LTR_IGNORE_OFFSET,
> + .regmap_length = NVL_PCDS_PMC_MMIO_REG_LEN,
> + .ppfear0_offset = CNP_PMC_HOST_PPFEAR0A,
> + .ppfear_buckets = LNL_PPFEAR_NUM_ENTRIES,
> + .pm_cfg_offset = CNP_PMC_PM_CFG_OFFSET,
> + .pm_read_disable_bit = CNP_PMC_READ_DISABLE_BIT,
> + .lpm_num_maps = PTL_LPM_NUM_MAPS,
> + .ltr_ignore_max = LNL_NUM_IP_IGN_ALLOWED,
> + .lpm_res_counter_step_x2 = TGL_PMC_LPM_RES_COUNTER_STEP_X2,
> + .etr3_offset = ETR3_OFFSET,
> + .lpm_sts_latch_en_offset = MTL_LPM_STATUS_LATCH_EN_OFFSET,
> + .lpm_priority_offset = MTL_LPM_PRI_OFFSET,
> + .lpm_en_offset = MTL_LPM_EN_OFFSET,
> + .lpm_residency_offset = MTL_LPM_RESIDENCY_OFFSET,
> + .lpm_sts = nvl_pcds_lpm_maps,
> + .lpm_status_offset = MTL_LPM_STATUS_OFFSET,
> + .lpm_live_status_offset = MTL_LPM_LIVE_STATUS_OFFSET,
> + .s0ix_blocker_maps = nvl_pcds_blk_maps,
> + .s0ix_blocker_offset = LNL_S0IX_BLOCKER_OFFSET,
> + .num_s0ix_blocker = NVL_PCDS_NUM_S0IX_BLOCKER,
> + .lpm_req_guid = PCDS_LPM_REQ_GUID,
> + .blocker_req_offset = NVL_PCDS_BLK_REQ_OFFSET,
> +};
> +
> +static const struct pmc_reg_map nvl_pchs_reg_map = {
> + .pfear_sts = ext_nvl_pchs_pfear_map,
> + .slp_s0_offset = CNP_PMC_SLP_S0_RES_COUNTER_OFFSET,
> + .slp_s0_res_counter_step = TGL_PMC_SLP_S0_RES_COUNTER_STEP,
> + .ltr_show_sts = ptl_pcdp_ltr_show_map,
> + .msr_sts = msr_map,
> + .ltr_ignore_offset = CNP_PMC_LTR_IGNORE_OFFSET,
> + .regmap_length = NVL_PCHS_PMC_MMIO_REG_LEN,
> + .ppfear0_offset = CNP_PMC_HOST_PPFEAR0A,
> + .ppfear_buckets = LNL_PPFEAR_NUM_ENTRIES,
> + .pm_cfg_offset = CNP_PMC_PM_CFG_OFFSET,
> + .pm_read_disable_bit = CNP_PMC_READ_DISABLE_BIT,
> + .lpm_num_maps = PTL_LPM_NUM_MAPS,
> + .ltr_ignore_max = LNL_NUM_IP_IGN_ALLOWED,
> + .lpm_res_counter_step_x2 = TGL_PMC_LPM_RES_COUNTER_STEP_X2,
> + .etr3_offset = ETR3_OFFSET,
> + .lpm_sts_latch_en_offset = MTL_LPM_STATUS_LATCH_EN_OFFSET,
> + .lpm_priority_offset = MTL_LPM_PRI_OFFSET,
> + .lpm_en_offset = MTL_LPM_EN_OFFSET,
> + .lpm_residency_offset = MTL_LPM_RESIDENCY_OFFSET,
> + .lpm_sts = nvl_pchs_lpm_maps,
> + .lpm_status_offset = MTL_LPM_STATUS_OFFSET,
> + .lpm_live_status_offset = MTL_LPM_LIVE_STATUS_OFFSET,
> + .s0ix_blocker_maps = nvl_pchs_blk_maps,
> + .s0ix_blocker_offset = LNL_S0IX_BLOCKER_OFFSET,
> + .num_s0ix_blocker = NVL_PCHS_NUM_S0IX_BLOCKER,
> + .blocker_req_offset = NVL_PCHS_BLK_REQ_OFFSET,
> + .lpm_req_guid = PCHS_LPM_REQ_GUID,
> +};
> +
> +static struct pmc_info nvl_pmc_info_list[] = {
> + {
> + .devid = PMC_DEVID_NVL_PCDH,
> + .map = &nvl_pcdh_reg_map,
> + },
> + {
> + .devid = PMC_DEVID_NVL_PCDS,
> + .map = &nvl_pcds_reg_map,
> + },
> + {
> + .devid = PMC_DEVID_NVL_PCHS,
> + .map = &nvl_pchs_reg_map,
> + },
> + {}
> +};
> +
> +const char *nvl_ltr_block_counter_arr[] = {
> + "PKGC_PREVENT_LTR_IADOMAIN",
> + "PKGC_PREVENT_LTR_GDIE",
> + "PKGC_PREVENT_LTR_PCH",
> + "PKGC_PREVENT_LTR_DISPLAY",
> + "PKGC_PREVENT_LTR_IPU",
> + NULL
> +};
> +
> +const char *nvl_pkgc_blocker_residency[] = {
> + "PKGC_BLOCK_RESIDENCY_INVALID",
> + "PKGC_BLOCK_RESIDENCY_MISC",
> + "PKGC_BLOCK_RESIDENCY_CDIE_MISC",
> + "PKGC_BLOCK_RESIDENCY_MEDIA_MISC",
> + "PKGC_BLOCK_RESIDENCY_GT_MISC",
> + "PKGC_BLOCK_RESIDENCY_HUBATOM_MISC",
> + "PKGC_BLOCK_RESIDENCY_IPU_BUSY",
> + "PKGC_BLOCK_RESIDENCY_IPU_LTR",
> + "PKGC_BLOCK_RESIDENCY_IPU_TIMER",
> + "PKGC_BLOCK_RESIDENCY_DISP_BUSY",
> + "PKGC_BLOCK_RESIDENCY_DISP_LTR",
> + "PKGC_BLOCK_RESIDENCY_DISP_TIMER",
> + "PKGC_BLOCK_RESIDENCY_VPU_BUSY",
> + "PKGC_BLOCK_RESIDENCY_VPU_TIMER",
> + "PKGC_BLOCK_RESIDENCY_PMC_BUSY",
> + "PKGC_BLOCK_RESIDENCY_PMC_LTR",
> + "PKGC_BLOCK_RESIDENCY_PMC_TIMER",
> + "PKGC_BLOCK_RESIDENCY_HUBATOM_ARAT",
> + "PKGC_BLOCK_RESIDENCY_CDIE0_ARAT",
> + "PKGC_BLOCK_RESIDENCY_CDIE1_ARAT",
> + "PKGC_BLOCK_RESIDENCY_GT_ARAT",
> + "PKGC_BLOCK_RESIDENCY_MEDIA_ARAT",
> + "PKGC_BLOCK_RESIDENCY_DEMOTION",
> + "PKGC_BLOCK_RESIDENCY_THERMALS",
> + "PKGC_BLOCK_RESIDENCY_SNCU",
> + "PKGC_BLOCK_RESIDENCY_SVTU",
> + "PKGC_BLOCK_RESIDENCY_IAA",
> + "PKGC_BLOCK_RESIDENCY_IOC",
> + NULL,
> +};
> +
> +static u8 nvl_pmc_list[] = {PMC_IDX_MAIN, PMC_IDX_PCH};
> +static u8 nvl_h_pmc_list[] = {PMC_IDX_MAIN, PMC_IDX_PCH};
These are identical, so why are both needed?
These arrays could generally be const, no? (Applies to the other patch as
well.)
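
To make the suggestion concrete, here is a minimal sketch (not the driver's actual code) of a single read-only list referenced by both device descriptors. The `struct dev_info` type, its const-qualified `pmc_list` member and the enum values are stand-ins assumed for illustration; the real `struct pmc_dev_info` would need its `pmc_list` member made const before something like this could compile in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Stand-in for the driver's PMC index enum (values assumed). */
enum pmc_index { PMC_IDX_MAIN, PMC_IDX_PCH };

/* Hypothetical, simplified descriptor; note the const-qualified list. */
struct dev_info {
	size_t num_pmcs;
	const uint8_t *pmc_list;
};

/* One read-only list instead of two identical writable ones. */
static const uint8_t nvl_pmc_list[] = { PMC_IDX_MAIN, PMC_IDX_PCH };

/* Both descriptors point at the same array. */
static const struct dev_info nvl_s_dev = {
	.num_pmcs = ARRAY_SIZE(nvl_pmc_list),
	.pmc_list = nvl_pmc_list,
};

static const struct dev_info nvl_h_dev = {
	.num_pmcs = ARRAY_SIZE(nvl_pmc_list),
	.pmc_list = nvl_pmc_list,
};
```

Whether the two variants really can share one list depends on whether they are expected to diverge later; that is a question for the patch author.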
> +
> +#define NVL_NPU_PCI_DEV 0xd71d
> +
> +/*
> + * Set power state of select devices that do not have drivers to D3
> + * so that they do not block Package C entry.
> + */
> +static void nvl_d3_fixup(void)
> +{
> + pmc_core_set_device_d3(NVL_NPU_PCI_DEV);
> +}
> +
> +static int nvl_resume(struct pmc_dev *pmcdev)
> +{
> + nvl_d3_fixup();
> + return cnl_resume(pmcdev);
> +}
> +
> +static int nvl_core_init(struct pmc_dev *pmcdev, struct pmc_dev_info *pmc_dev_info)
> +{
> + nvl_d3_fixup();
> + return generic_core_init(pmcdev, pmc_dev_info);
> +}
> +
> +static u32 nvl_pmt_dmu_guids[] = {NVL_PMT_DMU_GUID, 0x0};
> +struct pmc_dev_info nvl_s_pmc_dev = {
> + .num_pmcs = ARRAY_SIZE(nvl_pmc_list),
> + .pmc_list = nvl_pmc_list,
> + .regmap_list = nvl_pmc_info_list,
> + .map = &nvl_pcds_reg_map,
> + .sub_req_show = &pmc_core_substate_blk_req_fops,
> + .suspend = cnl_suspend,
> + .resume = nvl_resume,
> + .init = nvl_core_init,
> + .sub_req = pmc_core_pmt_get_blk_sub_req,
> + .dmu_guids = nvl_pmt_dmu_guids,
> + .pc_guid = NVL_PMT_PC_GUID,
> + .pkgc_ltr_blocker_offset = NVL_LTR_BLK_OFFSET,
> + .pkgc_ltr_blocker_counters = nvl_ltr_block_counter_arr,
> + .pkgc_blocker_offset = NVL_PKGC_BLK_OFFSET,
> + .pkgc_blocker_counters = nvl_pkgc_blocker_residency,
> + .ssram_hidden = false,
> + .die_c6_offset = NVL_PMT_DMU_DIE_C6_OFFSET,
> +};
> +
> +struct pmc_dev_info nvl_h_pmc_dev = {
> + .num_pmcs = ARRAY_SIZE(nvl_h_pmc_list),
> + .pmc_list = nvl_h_pmc_list,
> + .regmap_list = nvl_pmc_info_list,
> + .map = &nvl_pcdh_reg_map,
> + .sub_req_show = &pmc_core_substate_blk_req_fops,
> + .suspend = cnl_suspend,
> + .resume = nvl_resume,
> + .init = nvl_core_init,
> + .sub_req = pmc_core_pmt_get_blk_sub_req,
> + .dmu_guids = nvl_pmt_dmu_guids,
> + .pc_guid = NVL_PMT_PC_GUID,
> + .pkgc_ltr_blocker_offset = NVL_LTR_BLK_OFFSET,
> + .pkgc_ltr_blocker_counters = nvl_ltr_block_counter_arr,
> + .pkgc_blocker_offset = NVL_PKGC_BLK_OFFSET,
> + .pkgc_blocker_counters = nvl_pkgc_blocker_residency,
> + .ssram_hidden = false,
> + .die_c6_offset = NVL_PMT_DMU_DIE_C6_OFFSET,
> +};
> diff --git a/drivers/platform/x86/intel/pmc/ptl.c b/drivers/platform/x86/intel/pmc/ptl.c
> index 7aa39db256770..3e1cf6905e111 100644
> --- a/drivers/platform/x86/intel/pmc/ptl.c
> +++ b/drivers/platform/x86/intel/pmc/ptl.c
> @@ -137,7 +137,7 @@ static const struct pmc_bit_map *ext_ptl_pcdp_pfear_map[] = {
> NULL
> };
>
> -static const struct pmc_bit_map ptl_pcdp_ltr_show_map[] = {
> +const struct pmc_bit_map ptl_pcdp_ltr_show_map[] = {
> {"SOUTHPORT_A", CNP_PMC_LTR_SPA},
> {"SOUTHPORT_B", CNP_PMC_LTR_SPB},
> {"SATA", CNP_PMC_LTR_SATA},
>
--
i.
* [GIT PULL] Thermal control updates for v7.1-rc1
From: Rafael J. Wysocki @ 2026-04-10 14:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Linux PM, Linux Kernel Mailing List, Daniel Lezcano
Hi Linus,
This goes early because I will be traveling next week.
Please pull from the tag
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
thermal-7.1-rc1
with top-most commit cd1a3b2ff0553e987de71ff0aa675e418de22898
Merge tag 'thermal-v7.1-rc1' of
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux
on top of commit 9e07e3b81807edd356e1f794cffa00a428eff443
thermal: core: Fix thermal zone device registration error path
to receive thermal control updates for 7.1-rc1.
These include thermal core fixes and simplifications, driver fixes and
new hardware support (SDM670, Eliza SoC), new driver features (hwmon
support in imx91, DDR data rate on Nova Lake in int340x), and a handful
of cleanups:
- Fix thermal core issues related to thermal zone removal and
registration errors that may lead to a use-after-free or a memory
leak in some cases (Rafael Wysocki)
- Drop a redundant check from thermal_zone_device_update(), adjust
thermal workqueue allocation flags, and switch over thermal_class
allocation to static (Rafael Wysocki)
- Relocate the suspend and resume of thermal zones closer to the
suspend and resume of devices, respectively (Rafael Wysocki)
- Remove a pointless variable used in the thermal core when
registering a cooling device (Daniel Lezcano)
- Replace sprintf() in thermal_bind_cdev_to_trip() and use
str_enabled_disabled() helper in mode_show() (Thorsten Blum)
- Replace cpumask_weight() in intel_hfi_offline() with cpumask_empty()
which is generally more efficient (Yury Norov)
- Add support for reading DDR data rate from PCI config space on Nova
Lake platforms to the int340x thermal driver (Srinivas Pandruvada)
- Add the OF node address to the output message to make sensor names
more distinguishable (Alexander Stein)
- Add hwmon support for the i.MX91 thermal sensor (Alexander Stein)
- Correctly clamp the results of value/temperature conversion in the
Spreadtrum driver (Thorsten Blum)
- Add SDM670 compatible DT bindings for the Tsens and the LMh thermal
drivers (Richard Acayan)
- Add SM8750 compatible DT bindings for the Tsens thermal driver (Manaf
Meethalavalappu Pallikunhi)
- Add Eliza SoC compatible DT bindings for the Tsens driver (Krzysztof
Kozlowski)
- Fix inverted condition check on error in the Spear thermal control
driver (Gopi Krishna Menon)
- Convert DT bindings documentation into DT schema (Gopi Krishna Menon)
- Use max() macro to increase readability in the Broadcom STB thermal
sensor (Thorsten Blum)
- Remove a stale @trim_offset kernel-doc entry (John Madieu)
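
As an illustration of the cpumask_weight() versus cpumask_empty() item above, here is a small user-space sketch, not the kernel's cpumask implementation. The fixed-size mask and the helper names `mask_weight()` and `mask_empty()` are assumptions made for this example. It shows why an emptiness test can be cheaper than a full bit count when the caller only compares the result with zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MASK_WORDS 4

/* Count every set bit: always walks the whole mask. */
static unsigned int mask_weight(const unsigned long *mask)
{
	unsigned int count = 0;
	size_t i;

	for (i = 0; i < MASK_WORDS; i++) {
		unsigned long w = mask[i];

		while (w) {
			w &= w - 1;	/* clear the lowest set bit */
			count++;
		}
	}

	return count;
}

/* Emptiness test: can return at the first non-zero word. */
static bool mask_empty(const unsigned long *mask)
{
	size_t i;

	for (i = 0; i < MASK_WORDS; i++) {
		if (mask[i])
			return false;
	}

	return true;
}

/* Sample masks used to compare the two helpers. */
static const unsigned long mask_none[MASK_WORDS] = { 0 };
static const unsigned long mask_some[MASK_WORDS] = { 0, 0x5, 0, 0 };
```

For any mask, `mask_weight(m) == 0` and `mask_empty(m)` agree, so a caller that only needs the yes/no answer loses nothing by using the cheaper test.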
Thanks!
---------------
Alexander Stein (2):
thermal/of: Add OF node address to output message
thermal/drivers/imx91: Add hwmon support
Anas Iqbal (1):
thermal: devfreq_cooling: avoid unnecessary kfree of freq_table
Daniel Lezcano (1):
thermal/core: Remove pointless variable when registering a cooling device
Gopi Krishna Menon (2):
thermal/drivers/spear: Fix error condition for reading st,thermal-flags
dt-bindings: thermal: st,thermal-spear1340: convert to dtschema
John Madieu (1):
thermal: renesas: rzg3e: Remove stale @trim_offset kernel-doc entry
Krzysztof Kozlowski (1):
dt-bindings: thermal: qcom-tsens: Add Eliza SoC TSENS
Manaf Meethalavalappu Pallikunhi (1):
dt-bindings: thermal: qcom-tsens: Document the SM8750 Temperature Sensor
Rafael J. Wysocki (6):
thermal: core: Fix thermal zone governor cleanup issues
thermal: core: Free thermal zone ID later during removal
thermal: core: Drop redundant check from thermal_zone_device_update()
thermal: core: Adjust thermal_wq allocation flags
thermal: core: Allocate thermal_class statically
thermal: core: Suspend thermal zones later and resume them earlier
Richard Acayan (2):
dt-bindings: thermal: tsens: add SDM670 compatible
dt-bindings: thermal: lmh: Add SDM670 compatible
Srinivas Pandruvada (1):
thermal: intel: int340x: Read DDR data rate for Nova Lake
Thorsten Blum (6):
thermal: core: Replace sprintf() in thermal_bind_cdev_to_trip()
thermal/drivers/sprd: Fix temperature clamping in sprd_thm_temp_to_rawdata
thermal/drivers/sprd: Fix raw temperature clamping in
sprd_thm_rawdata_to_temp
thermal/drivers/sprd: Use min instead of clamp in sprd_thm_temp_to_rawdata
thermal: sysfs: Use str_enabled_disabled() helper in mode_show()
thermal/drivers/brcmstb_thermal: Use max to simplify brcmstb_get_temp
Yury Norov (1):
thermal: intel: hfi: use cpumask_empty() in intel_hfi_offline()
---------------
.../devicetree/bindings/thermal/qcom-lmh.yaml | 3 +
.../devicetree/bindings/thermal/qcom-tsens.yaml | 3 +
.../devicetree/bindings/thermal/spear-thermal.txt | 14 ---
.../bindings/thermal/st,thermal-spear1340.yaml | 36 ++++++
drivers/base/power/main.c | 5 +
drivers/thermal/broadcom/brcmstb_thermal.c | 8 +-
drivers/thermal/devfreq_cooling.c | 3 +-
drivers/thermal/imx91_thermal.c | 4 +
.../intel/int340x_thermal/processor_thermal_rfim.c | 25 ++++-
drivers/thermal/intel/intel_hfi.c | 2 +-
drivers/thermal/renesas/rzg3e_thermal.c | 1 -
drivers/thermal/spear_thermal.c | 2 +-
drivers/thermal/sprd_thermal.c | 6 +-
drivers/thermal/thermal_core.c | 121 ++++++++-------------
drivers/thermal/thermal_of.c | 20 ++--
drivers/thermal/thermal_sysfs.c | 7 +-
include/linux/thermal.h | 6 +
17 files changed, 143 insertions(+), 123 deletions(-)
* Re: [PATCH v2 2/7] platform/x86/intel/pmc: Enable PkgC LTR blocking counter
From: Ilpo Järvinen @ 2026-04-10 14:28 UTC (permalink / raw)
To: Xi Pardee
Cc: irenic.rajneesh, david.e.box, platform-driver-x86, LKML, linux-pm
In-Reply-To: <20260408222144.3288928-3-xi.pardee@linux.intel.com>
On Wed, 8 Apr 2026, Xi Pardee wrote:
> Enable the Package C-state LTR blocking counter in the PMT telemetry
> region. This counter records how many times any Package C-state entry
> is blocked for the specified reasons.
>
> Signed-off-by: Xi Pardee <xi.pardee@linux.intel.com>
> ---
> drivers/platform/x86/intel/pmc/core.c | 74 +++++++++++++++++++++++----
> drivers/platform/x86/intel/pmc/core.h | 15 +++++-
> 2 files changed, 77 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/platform/x86/intel/pmc/core.c b/drivers/platform/x86/intel/pmc/core.c
> index c8a92d6235203..5c519942ec58c 100644
> --- a/drivers/platform/x86/intel/pmc/core.c
> +++ b/drivers/platform/x86/intel/pmc/core.c
> @@ -1071,6 +1071,29 @@ static int pmc_core_die_c6_us_show(struct seq_file *s, void *unused)
> }
> DEFINE_SHOW_ATTRIBUTE(pmc_core_die_c6_us);
>
> +static int pmc_core_pkgc_ltr_blocker_show(struct seq_file *s, void *unused)
> +{
> + struct pmc_dev *pmcdev = s->private;
> + const char **pkgc_ltr_blocker_counters;
> + unsigned int i;
> + u32 counter;
> + int ret;
> +
> + pkgc_ltr_blocker_counters = pmcdev->pkgc_ltr_blocker_counters;
> + for (i = 0; pkgc_ltr_blocker_counters[i]; i++) {
> + ret = pmt_telem_read32(pmcdev->pc_ep,
> + pmcdev->pkgc_ltr_blocker_offset + i,
> + &counter, 1);
> +
> + if (ret)
Don't leave empty lines between a call and its error handling.
> + return ret;
Maybe put the empty line here instead?
> + seq_printf(s, "%-30s %-30u\n", pkgc_ltr_blocker_counters[i], counter);
> + }
> +
> + return 0;
> +}
> +DEFINE_SHOW_ATTRIBUTE(pmc_core_pkgc_ltr_blocker);
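
For reference, the layout asked for above might look like the following user-space sketch. `read_counter()` is a hypothetical stand-in for the telemetry read (it is not `pmt_telem_read32()`), and the counter names and the two-entry limit are made up for the example. The point is only the spacing: the error check sits directly under the call, and the blank line comes after the check.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_COUNTERS 2

/* Hypothetical stand-in for a telemetry read; returns 0 on success. */
static int read_counter(unsigned int index, uint32_t *val)
{
	if (index >= MAX_COUNTERS)
		return -1;	/* pretend higher indices are out of range */

	*val = index * 10;
	return 0;
}

/* Print each named counter; stop at the first read error. */
static int show_counters(FILE *out, const char *const *names)
{
	unsigned int i;
	uint32_t counter;
	int ret;

	for (i = 0; names[i]; i++) {
		ret = read_counter(i, &counter);
		if (ret)
			return ret;

		fprintf(out, "%-30s %-30u\n", names[i], (unsigned int)counter);
	}

	return 0;
}

/* NULL-terminated name tables, as in the patch under review. */
static const char *const sample_ok[] = { "LTR_A", "LTR_B", NULL };
static const char *const sample_bad[] = { "LTR_A", "LTR_B", "LTR_C", NULL };
```

With these tables, `show_counters(stdout, sample_ok)` succeeds, while `sample_bad` hits the stand-in's range limit on its third entry and the error is propagated unchanged.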
--
i.
* Re: [PATCH v2 3/7] platform/x86/intel/pmc: Enable Pkgc blocking residency counter
From: Ilpo Järvinen @ 2026-04-10 14:27 UTC (permalink / raw)
To: Xi Pardee
Cc: irenic.rajneesh, david.e.box, platform-driver-x86, LKML, linux-pm
In-Reply-To: <20260408222144.3288928-4-xi.pardee@linux.intel.com>
On Wed, 8 Apr 2026, Xi Pardee wrote:
> Enable the Package C-state blocking counter in the PMT telemetry
> region. This counter reports the number of 10 µs intervals during
> which a Package C-state 10.2/3 entry was blocked for the specified
> reasons.
>
> Create a common helper for pmc_core_pkgc_ltr_blocker_show() and
> pmc_core_pkgc_blocker_residency_show() as these two functions
> share similar logic.
Please don't do back and forth changes like this within a series. You
should add it in the right form from the beginning.
--
i.
> Signed-off-by: Xi Pardee <xi.pardee@linux.intel.com>
> ---
> drivers/platform/x86/intel/pmc/core.c | 40 ++++++++++++++++++++-------
> drivers/platform/x86/intel/pmc/core.h | 8 ++++++
> 2 files changed, 38 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/platform/x86/intel/pmc/core.c b/drivers/platform/x86/intel/pmc/core.c
> index 5c519942ec58c..94ae098a155a6 100644
> --- a/drivers/platform/x86/intel/pmc/core.c
> +++ b/drivers/platform/x86/intel/pmc/core.c
> @@ -1071,29 +1071,44 @@ static int pmc_core_die_c6_us_show(struct seq_file *s, void *unused)
> }
> DEFINE_SHOW_ATTRIBUTE(pmc_core_die_c6_us);
>
> -static int pmc_core_pkgc_ltr_blocker_show(struct seq_file *s, void *unused)
> +static int pmc_core_pkgc_counters_show(struct seq_file *s,
> + struct telem_endpoint *ep,
> + u32 offset, const char **counters)
> {
> - struct pmc_dev *pmcdev = s->private;
> - const char **pkgc_ltr_blocker_counters;
> unsigned int i;
> u32 counter;
> int ret;
>
> - pkgc_ltr_blocker_counters = pmcdev->pkgc_ltr_blocker_counters;
> - for (i = 0; pkgc_ltr_blocker_counters[i]; i++) {
> - ret = pmt_telem_read32(pmcdev->pc_ep,
> - pmcdev->pkgc_ltr_blocker_offset + i,
> - &counter, 1);
> -
> + for (i = 0; counters[i]; i++) {
> + ret = pmt_telem_read32(ep, offset + i, &counter, 1);
> if (ret)
> return ret;
> - seq_printf(s, "%-30s %-30u\n", pkgc_ltr_blocker_counters[i], counter);
> + seq_printf(s, "%-30s %-30u\n", counters[i], counter);
> }
>
> return 0;
> }
> +
> +static int pmc_core_pkgc_ltr_blocker_show(struct seq_file *s, void *unused)
> +{
> + struct pmc_dev *pmcdev = s->private;
> +
> + return pmc_core_pkgc_counters_show(s, pmcdev->pc_ep,
> + pmcdev->pkgc_ltr_blocker_offset,
> + pmcdev->pkgc_ltr_blocker_counters);
> +}
> DEFINE_SHOW_ATTRIBUTE(pmc_core_pkgc_ltr_blocker);
>
> +static int pmc_core_pkgc_blocker_residency_show(struct seq_file *s, void *unused)
> +{
> + struct pmc_dev *pmcdev = s->private;
> +
> + return pmc_core_pkgc_counters_show(s, pmcdev->pc_ep,
> + pmcdev->pkgc_blocker_offset,
> + pmcdev->pkgc_blocker_counters);
> +}
> +DEFINE_SHOW_ATTRIBUTE(pmc_core_pkgc_blocker_residency);
> +
> static int pmc_core_lpm_latch_mode_show(struct seq_file *s, void *unused)
> {
> struct pmc_dev *pmcdev = s->private;
> @@ -1381,6 +1396,8 @@ void pmc_core_punit_pmt_init(struct pmc_dev *pmcdev, struct pmc_dev_info *pmc_de
> pmcdev->pc_ep = ep;
> pmcdev->pkgc_ltr_blocker_counters = pmc_dev_info->pkgc_ltr_blocker_counters;
> pmcdev->pkgc_ltr_blocker_offset = pmc_dev_info->pkgc_ltr_blocker_offset;
> + pmcdev->pkgc_blocker_counters = pmc_dev_info->pkgc_blocker_counters;
> + pmcdev->pkgc_blocker_offset = pmc_dev_info->pkgc_blocker_offset;
> }
> }
>
> @@ -1510,6 +1527,9 @@ static void pmc_core_dbgfs_register(struct pmc_dev *pmcdev, struct pmc_dev_info
> debugfs_create_file("pkgc_ltr_blocker_show", 0444,
> pmcdev->dbgfs_dir, pmcdev,
> &pmc_core_pkgc_ltr_blocker_fops);
> + debugfs_create_file("pkgc_blocker_residency_show", 0444,
> + pmcdev->dbgfs_dir, pmcdev,
> + &pmc_core_pkgc_blocker_residency_fops);
> }
>
> }
> diff --git a/drivers/platform/x86/intel/pmc/core.h b/drivers/platform/x86/intel/pmc/core.h
> index a20aab73c1409..829b1dee3f636 100644
> --- a/drivers/platform/x86/intel/pmc/core.h
> +++ b/drivers/platform/x86/intel/pmc/core.h
> @@ -455,6 +455,8 @@ struct pmc {
> *
> * @pkgc_ltr_blocker_counters: Array of PKGC LTR blocker counters
> * @pkgc_ltr_blocker_offset: Offset to PKGC LTR blockers in telemetry region
> + * @pkgc_blocker_counters: Array of PKGC blocker counters
> + * @pkgc_blocker_offset: Offset to PKGC blocker in telemetry region
> *
> * pmc_dev contains info about power management controller device.
> */
> @@ -480,6 +482,8 @@ struct pmc_dev {
>
> const char **pkgc_ltr_blocker_counters;
> u32 pkgc_ltr_blocker_offset;
> + const char **pkgc_blocker_counters;
> + u32 pkgc_blocker_offset;
> };
>
> enum pmc_index {
> @@ -495,6 +499,7 @@ enum pmc_index {
> * @dmu_guids: List of Die Management Unit GUID
> * @pc_guid: GUID for telemetry region to read PKGC blocker info
> * @pkgc_ltr_blocker_offset: Offset to PKGC LTR blockers in telemetry region
> + * @pkgc_blocker_offset:Offset to PKGC blocker in telemetry region
> * @regmap_list: Pointer to a list of pmc_info structure that could be
> * available for the platform. When set, this field implies
> * SSRAM support.
> @@ -502,6 +507,7 @@ enum pmc_index {
> * specific attributes of the primary PMC
> * @sub_req_show: File operations to show substate requirements
> * @pkgc_ltr_blocker_counters: Array of PKGC LTR blocker counters
> + * @pkgc_blocker_counters: Array of PKGC blocker counters
> * @suspend: Function to perform platform specific suspend
> * @resume: Function to perform platform specific resume
> * @init: Function to perform platform specific init action
> @@ -512,10 +518,12 @@ struct pmc_dev_info {
> u32 *dmu_guids;
> u32 pc_guid;
> u32 pkgc_ltr_blocker_offset;
> + u32 pkgc_blocker_offset;
> struct pmc_info *regmap_list;
> const struct pmc_reg_map *map;
> const struct file_operations *sub_req_show;
> const char **pkgc_ltr_blocker_counters;
> + const char **pkgc_blocker_counters;
> void (*suspend)(struct pmc_dev *pmcdev);
> int (*resume)(struct pmc_dev *pmcdev);
> int (*init)(struct pmc_dev *pmcdev, struct pmc_dev_info *pmc_dev_info);
>
* [GIT PULL] ACPI support updates for v7.1-rc1
From: Rafael J. Wysocki @ 2026-04-10 14:27 UTC (permalink / raw)
To: Linus Torvalds
Cc: ACPI Devel Mailing List, Linux PM, Linux Kernel Mailing List
Hi Linus,
This goes early because I will be traveling next week.
Please pull from the tag
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
acpi-7.1-rc1
with top-most commit 8e937866b425248fa375b2138c19c117a87c6be0
Merge branch 'acpi-apei'
on top of commit 591cd656a1bf5ea94a222af5ef2ee76df029c1d2
Linux 7.0-rc7
to receive ACPI support updates for 7.1-rc1.
These include an update of the CMOS RTC driver and the related ACPI
and x86 code that, among other things, switches it over to using the
platform device interface for device binding on x86 instead of the PNP
device driver interface (which allows the code in question to be
simplified quite a bit), a major update of the ACPI Time and Alarm
Device (TAD) driver adding an RTC class device interface to it, and
updates of core ACPI drivers that remove some unnecessary and not
really useful code from them.
Apart from that, two drivers are converted to using the platform driver
interface for device binding instead of the ACPI driver one, which is
slated for removal, support for the Performance Limited register is
added to the ACPI CPPC library and there are some janitorial updates
of it and the related cpufreq CPPC driver, the ACPI processor driver is
fixed and cleaned up, and NVIDIA vendor CPER record handler is added
to the APEI GHES code.
Also, the interface for obtaining a CPU UID from ACPI is consolidated
across architectures and used for fixing a problem with the PCI TPH
Steering Tag on ARM64, there are two updates related to ACPICA, a
minor ACPI OS Services Layer (OSL) update, and a few assorted updates
related to ACPI tables parsing.
Specifics:
- Update maintainers information regarding ACPICA (Rafael Wysocki)
- Replace strncpy() with strscpy_pad() in acpi_ut_safe_strncpy() (Kees
Cook)
- Trigger an ordered system power off after encountering a fatal error
operator in AML (Armin Wolf)
- Enable ACPI FPDT parsing on LoongArch (Xi Ruoyao)
- Remove the temporary stop-gap acpi_pptt_cache_v1_full structure from
the ACPI PPTT parser (Ben Horgan)
- Add support for exposing ACPI FPDT subtables FBPT and S3PT (Nate
DeSimone)
- Address multiple assorted issues and clean up the code in the ACPI
processor idle driver (Huisong Li)
- Replace strlcat() in the ACPI processor idle driver with a better
alternative (Andy Shevchenko)
- Rearrange and clean up acpi_processor_errata_piix4() (Rafael Wysocki)
- Move reference performance to capabilities and fix an uninitialized
variable in the ACPI CPPC library (Pengjie Zhang)
- Add support for the Performance Limited Register to the ACPI CPPC
library (Sumit Gupta)
- Add cppc_get_perf() API to read performance controls, extend
cppc_set_epp_perf() for FFH/SystemMemory, and make the ACPI CPPC
library warn on missing mandatory DESIRED_PERF register (Sumit Gupta)
- Modify the cpufreq CPPC driver to update MIN_PERF/MAX_PERF in target
callbacks to allow it to control performance bounds via standard
scaling_min_freq and scaling_max_freq sysfs attributes and add sysfs
documentation for the Performance Limited Register to it (Sumit Gupta)
- Add ACPI support to the platform device interface in the CMOS RTC
driver, make the ACPI core device enumeration code create a platform
device for the CMOS RTC, and drop CMOS RTC PNP device support (Rafael
Wysocki)
- Consolidate the x86-specific CMOS RTC handling with the ACPI TAD
driver and clean up the CMOS RTC ACPI address space handler (Rafael
Wysocki)
- Enable ACPI alarm in the CMOS RTC driver if advertised in ACPI FADT
and allow that driver to work without a dedicated IRQ if the ACPI
alarm is used (Rafael Wysocki)
- Clean up the ACPI TAD driver in various ways and add an RTC class
device interface, including both the RTC setting/reading and alarm
timer support, to it (Rafael Wysocki)
- Clean up the ACPI AC and ACPI PAD (processor aggregator device)
drivers (Rafael Wysocki)
- Rework checking for duplicate video bus devices and consolidate
pnp.bus_id workarounds handling in the ACPI video bus driver (Rafael
Wysocki)
- Update the ACPI core device drivers to stop setting acpi_device_name()
unnecessarily (Rafael Wysocki)
- Rearrange code using acpi_device_class() in the ACPI core device
drivers and update them to stop setting acpi_device_class()
unnecessarily (Rafael Wysocki)
- Define ACPI_AC_CLASS in one place (Rafael Wysocki)
- Convert the ni903x_wdt watchdog driver and the xen ACPI PAD driver to
bind to platform devices instead of ACPI devices (Rafael Wysocki)
- Add devm_ghes_register_vendor_record_notifier(), use it in the PCI
hisi driver, and add an NVIDIA vendor CPER record handler (Kai-Heng
Feng)
- Consolidate the interface for obtaining a CPU UID from ACPI across
architectures and use it to address incorrect PCI TPH Steering Tag
on ARM64 resulting from the invalid assumption that the ACPI
Processor UID would always be the same as the corresponding logical
CPU ID in Linux (Chengwen Feng)
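To illustrate the last item, here is a minimal user-space sketch of why a logical CPU number must not be used where an ACPI Processor UID is expected. The table contents and the helper name demo_get_cpu_uid() are hypothetical; the sketch only mimics the idea behind acpi_get_cpu_uid() and does not reproduce the kernel interface.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical firmware data: ACPI Processor UID of each logical CPU. */
static const uint32_t demo_cpu_uid[] = { 0x100, 0x101, 0x200, 0x201 };

#define DEMO_NR_CPUS (sizeof(demo_cpu_uid) / sizeof(demo_cpu_uid[0]))

/*
 * Look up the ACPI Processor UID of a logical CPU (illustrative only).
 * Returns 0 and stores the UID in *uid, or -EINVAL for a bad argument.
 */
static int demo_get_cpu_uid(unsigned int cpu, uint32_t *uid)
{
	if (cpu >= DEMO_NR_CPUS || uid == NULL)
		return -EINVAL;

	*uid = demo_cpu_uid[cpu];
	return 0;
}
```

With this table, logical CPU 2 maps to UID 0x200, so passing the logical CPU number itself to firmware would name the wrong processor.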
Thanks!
---------------
Andy Shevchenko (1):
ACPI: processor: idle: Replace strlcat() with better alternative
Armin Wolf (1):
ACPI: OSL: Poweroff when encountering a fatal ACPI error
Ben Horgan (1):
ACPI: PPTT: Remove duplicate structure, acpi_pptt_cache_v1_full
Chengwen Feng (8):
arm64: acpi: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
LoongArch: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
RISC-V: ACPI: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
x86/acpi: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
ACPI: Centralize acpi_get_cpu_uid() declaration in include/linux/acpi.h
perf: arm_cspmu: Switch to acpi_get_cpu_uid() from get_acpi_id_for_cpu()
ACPI: PPTT: Use acpi_get_cpu_uid() and remove get_acpi_id_for_cpu()
PCI/TPH: Pass ACPI Processor UID to Cache Locality _DSM
Huisong Li (7):
ACPI: processor: idle: Remove redundant cstate check in
acpi_processor_power_init
ACPI: processor: idle: Move max_cstate update out of the loop
ACPI: processor: idle: Remove redundant static variable and
rename cstate check function
ACPI: processor: idle: Reset power_setup_done flag on
initialization failure
ACPI: processor: idle: Fix NULL pointer dereference in hotplug path
cpuidle: Extract and export no-lock variants of
cpuidle_unregister_device()
ACPI: processor: idle: Reset cpuidle on C-state list changes
Jingkai Tan (1):
ACPI: processor: idle: Add missing bounds check in flatten_lpi_states()
Kai-Heng Feng (3):
ACPI: APEI: GHES: Add devm_ghes_register_vendor_record_notifier()
PCI: hisi: Use devm_ghes_register_vendor_record_notifier()
ACPI: APEI: GHES: Add NVIDIA vendor CPER record handler
Kees Cook (1):
ACPICA: Replace strncpy() with strscpy_pad() in acpi_ut_safe_strncpy()
Nate DeSimone (2):
ACPI: FPDT: expose FBPT and S3PT subtables via sysfs
Documentation: ABI: add FBPT and S3PT entries to sysfs-firmware-acpi
Pengjie Zhang (2):
ACPI: CPPC: Move reference performance to capabilities
ACPI: CPPC: Fix uninitialized ref variable in cppc_get_perf_caps()
Rafael J. Wysocki (37):
ACPI: x86: cmos_rtc: Clean up address space handler driver
ACPI: x86: cmos_rtc: Improve coordination with ACPI TAD driver
ACPI: x86: cmos_rtc: Create a CMOS RTC platform device
ACPI: x86/rtc-cmos: Use platform device for driver binding
ACPI: PNP: Drop CMOS RTC PNP device support
x86: rtc: Drop PNP device check
rtc: cmos: Drop PNP device support
ACPI: TAD/x86: cmos_rtc: Consolidate address space handler setup
ACPI: AC: Get rid of unnecessary declarations
ACPI: PAD: Rearrange notify handler installation and removal
driver core: auxiliary bus: Introduce dev_is_auxiliary()
ACPI: video: Rework checking for duplicate video bus devices
ACPI: video: Consolidate pnp.bus_id workarounds handling
ACPI: driver: Do not set acpi_device_name() unnecessarily
ACPI: event: Redefine acpi_notifier_call_chain()
ACPI: driver: Avoid using pnp.device_class for netlink handling
ACPI: driver: Do not set acpi_device_class() unnecessarily
ACPI: AC: Define ACPI_AC_CLASS in one place
rtc: cmos: Enable ACPI alarm if advertised in ACPI FADT
rtc: cmos: Do not require IRQ if ACPI alarm is used
ACPI: TAD: Create one attribute group
ACPI: TAD: Support RTC without wakeup
ACPI: TAD: Use __free() for cleanup in time_store()
ACPI: TAD: Rearrange RT data validation checking
ACPI: TAD: Clear unused RT data in acpi_tad_set_real_time()
ACPI: TAD: Add RTC class device interface
ACPI: TAD: Update the driver description comment
ACPI: TAD: Use dev_groups in struct device_driver
ACPI: TAD: Use DC wakeup only if AC wakeup is supported
ACPI: processor: Rearrange and clean up acpi_processor_errata_piix4()
ACPI: TAD: Split three functions to untangle runtime PM handling
ACPI: TAD: Relocate two functions
ACPI: TAD: Split acpi_tad_rtc_read_time()
ACPI: TAD: Add alarm support to the RTC class device interface
ACPI: PAD: xen: Convert to a platform driver
watchdog: ni903x_wdt: Convert to a platform driver
ACPICA: Update maintainers information
Sumit Gupta (8):
ACPI: CPPC: Add cppc_get_perf() API to read performance controls
ACPI: CPPC: Warn on missing mandatory DESIRED_PERF register
ACPI: CPPC: Extend cppc_set_epp_perf() for FFH/SystemMemory
cpufreq: CPPC: Update cached perf_ctrls on sysfs write
cpufreq: cppc: Update MIN_PERF/MAX_PERF in target callbacks
ACPI: CPPC: add APIs and sysfs interface for perf_limited
cpufreq: CPPC: Add sysfs documentation for perf_limited
ACPI: CPPC: Check cpc_read() return values consistently
Xi Ruoyao (1):
ACPI: tables: Enable FPDT on LoongArch
---------------
Documentation/ABI/testing/sysfs-devices-system-cpu | 18 +
Documentation/ABI/testing/sysfs-firmware-acpi | 6 +
Documentation/PCI/tph.rst | 4 +-
Documentation/admin-guide/kernel-parameters.txt | 8 +
MAINTAINERS | 8 +-
arch/arm64/include/asm/acpi.h | 17 +-
arch/arm64/kernel/acpi.c | 30 ++
arch/loongarch/include/asm/acpi.h | 5 -
arch/loongarch/kernel/acpi.c | 9 +
arch/riscv/include/asm/acpi.h | 4 -
arch/riscv/kernel/acpi.c | 16 +
arch/riscv/kernel/acpi_numa.c | 9 +-
arch/x86/include/asm/cpu.h | 1 -
arch/x86/include/asm/smp.h | 1 -
arch/x86/kernel/acpi/boot.c | 20 +
arch/x86/kernel/rtc.c | 21 +-
arch/x86/xen/enlighten_hvm.c | 5 +-
drivers/acpi/Kconfig | 2 +-
drivers/acpi/ac.c | 31 +-
drivers/acpi/acpi_fpdt.c | 28 ++
drivers/acpi/acpi_memhotplug.c | 4 -
drivers/acpi/acpi_pad.c | 28 +-
drivers/acpi/acpi_pnp.c | 22 +-
drivers/acpi/acpi_processor.c | 31 +-
drivers/acpi/acpi_tad.c | 492 ++++++++++++++-------
drivers/acpi/acpi_video.c | 100 ++---
drivers/acpi/acpica/utnonansi.c | 3 +-
drivers/acpi/apei/Kconfig | 14 +
drivers/acpi/apei/Makefile | 1 +
drivers/acpi/apei/ghes-nvidia.c | 149 +++++++
drivers/acpi/apei/ghes.c | 18 +
drivers/acpi/battery.c | 9 +-
drivers/acpi/button.c | 11 +-
drivers/acpi/cppc_acpi.c | 293 ++++++++++--
drivers/acpi/ec.c | 6 -
drivers/acpi/event.c | 7 +-
drivers/acpi/osl.c | 19 +-
drivers/acpi/pci_link.c | 4 -
drivers/acpi/pci_root.c | 9 +-
drivers/acpi/power.c | 4 -
drivers/acpi/pptt.c | 81 ++--
drivers/acpi/processor_driver.c | 22 +-
drivers/acpi/processor_idle.c | 82 ++--
drivers/acpi/riscv/rhct.c | 7 +-
drivers/acpi/sbs.c | 4 -
drivers/acpi/sbshc.c | 6 -
drivers/acpi/thermal.c | 13 +-
drivers/acpi/x86/cmos_rtc.c | 86 ++--
drivers/base/auxiliary.c | 10 +
drivers/cpufreq/cppc_cpufreq.c | 104 ++++-
drivers/cpuidle/cpuidle.c | 22 +-
drivers/gpu/drm/amd/include/amd_acpi.h | 2 -
drivers/gpu/drm/radeon/radeon_acpi.c | 2 -
drivers/pci/controller/pcie-hisi-error.c | 12 +-
drivers/pci/tph.c | 16 +-
drivers/perf/arm_cspmu/arm_cspmu.c | 6 +-
drivers/platform/x86/hp/hp-wmi.c | 2 -
drivers/platform/x86/lenovo/wmi-capdata.c | 1 -
drivers/rtc/rtc-cmos.c | 143 ++----
drivers/watchdog/ni903x_wdt.c | 27 +-
drivers/xen/xen-acpi-pad.c | 23 +-
include/acpi/acpi_bus.h | 14 +-
include/acpi/cppc_acpi.h | 22 +-
include/acpi/ghes.h | 11 +
include/acpi/processor.h | 2 -
include/linux/acpi.h | 21 +
include/linux/auxiliary_bus.h | 2 +
include/linux/cpuidle.h | 2 +
include/linux/pci-tph.h | 4 +-
69 files changed, 1450 insertions(+), 766 deletions(-)
* [GIT PULL] Power management updates for v7.1-rc1
From: Rafael J. Wysocki @ 2026-04-10 14:24 UTC (permalink / raw)
To: Linus Torvalds
Cc: Linux PM, Linux Kernel Mailing List, ACPI Devel Mailing List,
Viresh Kumar, Shuah Khan, Chanwoo Choi (samsung.com),
Mario Limonciello
Hi Linus,
This goes early because I will be traveling next week.
Please pull from the tag
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
pm-7.1-rc1
with top-most commit d923f70e37310fe613883a3a4c2ea2f31246253b
Merge branch 'pm-devfreq'
on top of commit 591cd656a1bf5ea94a222af5ef2ee76df029c1d2
Linux 7.0-rc7
to receive power management updates for 7.1-rc1.
Once again, cpufreq is the most active development area, mostly because
of the new feature additions and documentation updates in the amd-pstate
driver, but there are also changes in the cpufreq core related to boost
support and other assorted updates elsewhere.
Next up are power capping changes due to the major cleanup of the Intel
RAPL driver.
On the cpuidle front, a new C-states table for Intel Panther Lake is
added to the intel_idle driver, the stopped tick handling in the menu
and teo governors is updated, and there are a couple of cleanups.
Apart from the above, support for Tegra114 is added to devfreq and
there are assorted cleanups of that code, there are also two updates
of the operating performance points (OPP) library, two minor updates
related to hibernation, and cpupower utility man pages updates and
cleanups.
Specifics:
- Update qcom-hw DT bindings to include Eliza hardware (Abel Vesa)
- Update cpufreq-dt-platdev blocklist (Faruque Ansari)
- Minor updates to driver and dt-bindings for Tegra (Thierry Reding,
Rosen Penev)
- Add MAINTAINERS entry for CPPC driver (Viresh Kumar)
- Add support for new features: CPPC performance priority, Dynamic EPP,
Raw EPP, and new unit tests for them to amd-pstate (Gautham Shenoy,
Mario Limonciello)
- Fix sysfs files being present when the hardware is missing and fix
broken/outdated documentation in the amd-pstate driver (Ninad Naik,
Gautham Shenoy)
- Pass the policy to cpufreq_driver->adjust_perf() to avoid using
cpufreq_cpu_get() in the .adjust_perf() callback in amd-pstate which
leads to a scheduling-while-atomic bug (K Prateek Nayak)
- Clean up dead code in Kconfig for cpufreq (Julian Braha)
- Remove max_freq_req update for pre-existing cpufreq policy and add a
boost_freq_req QoS request to save the boost constraint instead of
overwriting the last scaling_max_freq constraint (Pierre Gondois)
- Embed cpufreq QoS freq_req objects in cpufreq policy so they all
are allocated in one go along with the policy to simplify lifetime
rules and avoid error handling issues (Viresh Kumar)
- Use DMI max speed when CPPC is unavailable in the acpi-cpufreq
scaling driver (Henry Tseng)
- Switch policy_is_shared() in cpufreq to using cpumask_nth() instead
of cpumask_weight() because the former is more efficient (Yury Norov)
- Use sysfs_emit() in sysfs show functions for cpufreq governor
attributes (Thorsten Blum)
- Update intel_pstate to stop returning an error when "off" is written
to its status sysfs attribute while the driver is already off (Fabio
De Francesco)
- Include current frequency in the debug message printed by
__cpufreq_driver_target() (Pengjie Zhang)
- Refine stopped tick handling in the menu cpuidle governor and
rearrange stopped tick handling in the teo cpuidle governor (Rafael
Wysocki)
- Add Panther Lake C-states table to the intel_idle driver (Artem
Bityutskiy)
- Clean up dead dependencies on CPU_IDLE in Kconfig (Julian Braha)
- Simplify cpuidle_register_device() with guard() (Huisong Li)
- Use performance level if available to distinguish between rates in
OPP debugfs (Manivannan Sadhasivam)
- Fix scoped_guard in dev_pm_opp_xlate_required_opp() (Viresh Kumar)
- Return -ENODATA if the snapshot image is not loaded (Alberto Garcia)
- Remove inclusion of crypto/hash.h from hibernate_64.c on x86 (Eric
Biggers)
- Clean up and rearrange the intel_rapl power capping driver to make
the respective interface drivers (TPMI, MSR, and MMIO) hold their
own settings and primitives and consolidate PL4 and PMU support
flags into rapl_defaults (Kuppuswamy Sathyanarayanan)
- Correct kernel-doc function parameter names in the power capping core
code (Randy Dunlap)
- Remove unneeded casting for HZ_PER_KHZ in devfreq (Andy Shevchenko)
- Use _visible attribute to replace create/remove_sysfs_files() in
devfreq (Pengjie Zhang)
- Add Tegra114 support to the activity monitor device in tegra30-devfreq
in preparation for upcoming EMC controller support (Svyatoslav Ryhel)
- Fix mistakes in cpupower man pages, add the boost and epp options to
the cpupower-frequency-info man page, and add the perf-bias option to
the cpupower-info man page (Roberto Ricci)
- Remove unnecessary extern declarations from getopt.h in arguments
parsing functions in cpufreq-set, cpuidle-info, cpuidle-set,
cpupower-info, and cpupower-set utilities (Kaushlendra Kumar)
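As an illustration of the policy_is_shared() item above, the following user-space sketch contrasts counting every set bit in a mask with looking for the second set bit and stopping there. The four-word mask layout and all demo_* names are assumptions made for this example; it is not the kernel's cpumask implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_WORDS 4
#define DEMO_BITS  (DEMO_WORDS * 64)

/* Weight approach: always walks every word of the mask. */
static unsigned int demo_weight(const uint64_t *mask)
{
	unsigned int w, n = 0;

	for (w = 0; w < DEMO_WORDS; w++) {
		uint64_t v = mask[w];

		while (v) {
			v &= v - 1;	/* clear the lowest set bit */
			n++;
		}
	}
	return n;
}

/*
 * Index of the n-th (0-based) set bit, or DEMO_BITS if there is none.
 * The walk ends as soon as that bit is found.
 */
static unsigned int demo_nth(const uint64_t *mask, unsigned int n)
{
	unsigned int bit;

	for (bit = 0; bit < DEMO_BITS; bit++) {
		if (mask[bit / 64] & (1ULL << (bit % 64))) {
			if (n == 0)
				return bit;
			n--;
		}
	}
	return DEMO_BITS;
}

/* A mask is "shared" when it holds more than one CPU. */
static bool demo_is_shared_weight(const uint64_t *mask)
{
	return demo_weight(mask) > 1;
}

static bool demo_is_shared_nth(const uint64_t *mask)
{
	return demo_nth(mask, 1) < DEMO_BITS;
}
```

Both predicates agree on every mask; the second one can return without visiting the rest of the mask once a second CPU has been seen.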
Thanks!
---------------
Abel Vesa (1):
dt-bindings: cpufreq: qcom-hw: document Eliza cpufreq hardware
Alberto Garcia (1):
PM: hibernate: return -ENODATA if the snapshot image is not loaded
Andy Shevchenko (1):
PM / devfreq: Remove unneeded casting for HZ_PER_KHZ
Artem Bityutskiy (1):
intel_idle: Add Panther Lake C-states table
Eric Biggers (1):
PM: hibernate: x86: Remove inclusion of crypto/hash.h
Fabio M. De Francesco (1):
cpufreq: intel_pstate: Allow repeated intel_pstate disable
Faruque Ansari (1):
cpufreq: Add QCS8300 to cpufreq-dt-platdev blocklist
Gautham R. Shenoy (13):
amd-pstate: Fix memory leak in amd_pstate_epp_cpu_init()
amd-pstate: Update cppc_req_cached in fast_switch case
amd-pstate: Make certain freq_attrs conditionally visible
x86/cpufeatures: Add AMD CPPC Performance Priority feature.
amd-pstate: Add support for CPPC_REQ2 and FLOOR_PERF
amd-pstate: Add sysfs support for floor_freq and floor_count
amd-pstate: Introduce a tracepoint trace_amd_pstate_cppc_req2()
amd-pstate-ut: Add module parameter to select testcases
amd-pstate-ut: Add a testcase to validate the visibility of
driver attributes
Documentation/amd-pstate: List amd_pstate_hw_prefcore sysfs file
Documentation/amd-pstate: List amd_pstate_prefcore_ranking sysfs file
Documentation/amd-pstate: Add documentation for
amd_pstate_floor_{freq,count}
MAINTAINERS: amd-pstate: Step down as maintainer, add Prateek as reviewer
Henry Tseng (1):
cpufreq: acpi-cpufreq: use DMI max speed when CPPC is unavailable
Huisong Li (1):
cpuidle: Simplify cpuidle_register_device() with guard()
Julian Braha (2):
cpufreq: clean up dead code in Kconfig
cpuidle: clean up dead dependencies on CPU_IDLE in Kconfig
K Prateek Nayak (2):
cpufreq/amd-pstate: Pass the policy to amd_pstate_update()
cpufreq: Pass the policy to cpufreq_driver->adjust_perf()
Kaushlendra Kumar (1):
cpupower: remove extern declarations in cmd functions
Kuppuswamy Sathyanarayanan (18):
powercap: intel_rapl: Add a symbol namespace for intel_rapl exports
powercap: intel_rapl: Cleanup coding style
powercap: intel_rapl: Remove unused TIME_WINDOW macros
powercap: intel_rapl: Simplify rapl_compute_time_window_atom()
powercap: intel_rapl: Use shifts for power-of-2 operations
powercap: intel_rapl: Use GENMASK() and BIT() macros
powercap: intel_rapl: Use unit conversion macros from units.h
powercap: intel_rapl: Allow interface drivers to configure rapl_defaults
powercap: intel_rapl: Move TPMI default settings into TPMI
interface driver
thermal: intel: int340x: processor: Move RAPL defaults to MMIO driver
powercap: intel_rapl: Remove unused AVERAGE_POWER primitive
powercap: intel_rapl: Move MSR default settings into MSR interface driver
powercap: intel_rapl: Remove unused macro definitions
powercap: intel_rapl: Move primitive info to header for interface drivers
powercap: intel_rapl: Move TPMI primitives to TPMI driver
thermal: intel: int340x: processor: Move MMIO primitives to MMIO driver
powercap: intel_rapl: Move MSR primitives to MSR driver
powercap: intel_rapl: Consolidate PL4 and PMU support flags into
rapl_defaults
Manivannan Sadhasivam (1):
OPP: debugfs: Use performance level if available to distinguish
between rates
Mario Limonciello (AMD) (7):
cpufreq/amd-pstate: Add POWER_SUPPLY select for dynamic EPP
cpufreq/amd-pstate: Cache the max frequency in cpudata
cpufreq/amd-pstate: Add dynamic energy performance preference
cpufreq/amd-pstate: add kernel command line to override dynamic epp
cpufreq/amd-pstate: Add support for platform profile class
cpufreq/amd-pstate: Add support for raw EPP writes
cpufreq/amd-pstate-ut: Add a unit test for raw EPP
Ninad Naik (1):
Documentation: amd-pstate: fix dead links in the reference section
Pengjie Zhang (2):
cpufreq: Add debug print for current frequency in
__cpufreq_driver_target()
PM / devfreq: use _visible attribute to replace
create/remove_sysfs_files()
Pierre Gondois (2):
cpufreq: Remove max_freq_req update for pre-existing policy
cpufreq: Add boost_freq_req QoS request
Rafael J. Wysocki (2):
cpuidle: governors: menu: Refine stopped tick handling
cpuidle: governors: teo: Rearrange stopped tick handling
Randy Dunlap (1):
powercap: correct kernel-doc function parameter names
Roberto Ricci (4):
cpupower-idle-info.1: fix short option names
cpupower-frequency-info.1: use the proper name of the --perf option
cpupower-frequency-info.1: document --boost and --epp options
cpupower-info.1: describe the --perf-bias option
Rosen Penev (1):
cpufreq: tegra194: remove COMPILE_TEST
Svyatoslav Ryhel (1):
PM / devfreq: tegra30-devfreq: add support for Tegra114
Thierry Reding (2):
dt-bindings: arm: nvidia: Document the Tegra238 CCPLEX cluster
cpufreq: tegra194: Rename Tegra239 to Tegra238
Thorsten Blum (1):
cpufreq: governor: Use sysfs_emit() in sysfs show functions
Viresh Kumar (3):
OPP: Move break out of scoped_guard in dev_pm_opp_xlate_required_opp()
cpufreq: Add MAINTAINERS entry for CPPC driver
cpufreq: Allocate QoS freq_req objects with policy
Yury Norov (1):
cpufreq: optimize policy_is_shared()
---------------
Documentation/admin-guide/kernel-parameters.txt | 7 +
Documentation/admin-guide/pm/amd-pstate.rst | 87 ++-
.../arm/tegra/nvidia,tegra-ccplex-cluster.yaml | 1 +
.../bindings/cpufreq/cpufreq-qcom-hw.yaml | 1 +
MAINTAINERS | 25 +-
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/msr-index.h | 5 +
arch/x86/kernel/cpu/scattered.c | 1 +
arch/x86/power/hibernate_64.c | 2 -
drivers/acpi/cppc_acpi.c | 3 +-
drivers/cpufreq/Kconfig | 5 +-
drivers/cpufreq/Kconfig.arm | 2 +-
drivers/cpufreq/Kconfig.x86 | 14 +
drivers/cpufreq/acpi-cpufreq.c | 31 +-
drivers/cpufreq/amd-pstate-trace.h | 35 ++
drivers/cpufreq/amd-pstate-ut.c | 279 ++++++++-
drivers/cpufreq/amd-pstate.c | 627 ++++++++++++++++++---
drivers/cpufreq/amd-pstate.h | 37 +-
drivers/cpufreq/cppc_cpufreq.c | 10 +-
drivers/cpufreq/cpufreq-dt-platdev.c | 1 +
drivers/cpufreq/cpufreq.c | 85 ++-
drivers/cpufreq/cpufreq_governor.h | 5 +-
drivers/cpufreq/intel_pstate.c | 6 +-
drivers/cpufreq/tegra194-cpufreq.c | 4 +-
drivers/cpuidle/Kconfig | 2 +-
drivers/cpuidle/Kconfig.mips | 2 +-
drivers/cpuidle/Kconfig.powerpc | 2 -
drivers/cpuidle/cpuidle.c | 12 +-
drivers/cpuidle/governors/gov.h | 5 +
drivers/cpuidle/governors/menu.c | 15 +-
drivers/cpuidle/governors/teo.c | 81 ++-
drivers/devfreq/devfreq.c | 108 ++--
drivers/devfreq/tegra30-devfreq.c | 17 +-
drivers/idle/intel_idle.c | 42 ++
drivers/opp/core.c | 2 +-
drivers/opp/debugfs.c | 20 +-
drivers/powercap/intel_rapl_common.c | 565 ++-----------------
drivers/powercap/intel_rapl_msr.c | 393 ++++++++++++-
drivers/powercap/intel_rapl_tpmi.c | 101 ++++
.../intel/int340x_thermal/processor_thermal_rapl.c | 81 +++
include/acpi/cppc_acpi.h | 1 +
include/linux/cpufreq.h | 11 +-
include/linux/intel_rapl.h | 52 +-
include/linux/powercap.h | 4 +-
include/linux/units.h | 3 +
kernel/power/user.c | 7 +-
kernel/sched/cpufreq_schedutil.c | 5 +-
rust/kernel/cpufreq.rs | 13 +-
tools/arch/x86/include/asm/cpufeatures.h | 2 +-
tools/power/cpupower/man/cpupower-frequency-info.1 | 8 +-
tools/power/cpupower/man/cpupower-idle-info.1 | 4 +-
tools/power/cpupower/man/cpupower-info.1 | 9 +-
tools/power/cpupower/utils/cpufreq-info.c | 2 -
tools/power/cpupower/utils/cpufreq-set.c | 2 -
tools/power/cpupower/utils/cpuidle-info.c | 2 -
tools/power/cpupower/utils/cpuidle-set.c | 2 -
tools/power/cpupower/utils/cpupower-info.c | 2 -
tools/power/cpupower/utils/cpupower-set.c | 2 -
58 files changed, 1951 insertions(+), 903 deletions(-)
* Re: [PATCH] cpufreq: CPPC: add autonomous mode boot parameter support
From: Pierre Gondois @ 2026-04-10 13:47 UTC (permalink / raw)
To: Sumit Gupta
Cc: linux-tegra, linux-kernel, linux-doc, zhenglifeng1, treding,
viresh.kumar, jonathanh, vsethi, ionela.voinescu, ksitaraman,
sanjayc, zhanjie9, corbet, mochs, skhan, bbasu, rdunlap, linux-pm,
mario.limonciello, rafael
In-Reply-To: <b8debb30-67a5-4d2b-8c08-8fd287f7258e@nvidia.com>
Hello Sumit,
On 4/6/26 20:08, Sumit Gupta wrote:
> Hi Pierre,
>
> Thank you for the comments.
> Sorry for late reply as I was on vacation.
>
No worries
>
> On 24/03/26 23:48, Pierre Gondois wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Hello Sumit,
>>
>> On 3/17/26 16:10, Sumit Gupta wrote:
>>> Add kernel boot parameter 'cppc_cpufreq.auto_sel_mode' to enable CPPC
>>> autonomous performance selection on all CPUs at system startup without
>>> requiring runtime sysfs manipulation. When autonomous mode is enabled,
>>> the hardware automatically adjusts CPU performance based on workload
>>> demands using Energy Performance Preference (EPP) hints.
>>>
>>> When auto_sel_mode=1:
>>> - Configure all CPUs for autonomous operation on first init
>>> - Set EPP to performance preference (0x0)
>>> - Use HW min/max when set; otherwise program from policy limits (caps)
>>> - Clamp desired_perf to bounds before enabling autonomous mode
>>> - Hardware controls frequency instead of the OS governor
>>>
>>> The boot parameter is applied only during first policy initialization.
>>> On hotplug, skip applying it so that the user's runtime sysfs
>>> configuration is preserved.
>>>
>>> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> (Documentation)
>>> Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
>>> ---
>>> Part 1 [1] of this series was applied for 7.1 and present in next.
>>> Sending this patch as reworked version of 'patch 11' from [2] based
>>> on next.
>>>
>>> [1]
>>> https://lore.kernel.org/lkml/20260206142658.72583-1-sumitg@nvidia.com/
>>> [2]
>>> https://lore.kernel.org/lkml/20251223121307.711773-1-sumitg@nvidia.com/
>>> ---
>>> .../admin-guide/kernel-parameters.txt | 13 +++
>>> drivers/cpufreq/cppc_cpufreq.c | 84
>>> +++++++++++++++++--
>>> 2 files changed, 92 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt
>>> b/Documentation/admin-guide/kernel-parameters.txt
>>> index fa6171b5fdd5..de4b4c89edfe 100644
>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>> @@ -1060,6 +1060,19 @@ Kernel parameters
>>> policy to use. This governor must be
>>> registered in the
>>> kernel before the cpufreq driver probes.
>>>
>>> + cppc_cpufreq.auto_sel_mode=
>>> + [CPU_FREQ] Enable ACPI CPPC autonomous
>>> performance
>>> + selection. When enabled, hardware
>>> automatically adjusts
>>> + CPU frequency on all CPUs based on workload
>>> demands.
>>> + In Autonomous mode, Energy Performance
>>> Preference (EPP)
>>> + hints guide hardware toward performance (0x0)
>>> or energy
>>> + efficiency (0xff).
>>> + Requires ACPI CPPC autonomous selection
>>> register support.
>>> + Format: <bool>
>>> + Default: 0 (disabled)
>>> + 0: use cpufreq governors
>>> + 1: enable if supported by hardware
>>> +
>>> cpu_init_udelay=N
>>> [X86,EARLY] Delay for N microsec between
>>> assert and de-assert
>>> of APIC INIT to start processors. This delay
>>> occurs
>>> diff --git a/drivers/cpufreq/cppc_cpufreq.c
>>> b/drivers/cpufreq/cppc_cpufreq.c
>>> index 5dfb109cf1f4..49c148b2a0a4 100644
>>> --- a/drivers/cpufreq/cppc_cpufreq.c
>>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>>> @@ -28,6 +28,9 @@
>>>
>>> static struct cpufreq_driver cppc_cpufreq_driver;
>>>
>>> +/* Autonomous Selection boot parameter */
>>> +static bool auto_sel_mode;
>>> +
>>> #ifdef CONFIG_ACPI_CPPC_CPUFREQ_FIE
>>> static enum {
>>> FIE_UNSET = -1,
>>> @@ -708,11 +711,74 @@ static int cppc_cpufreq_cpu_init(struct
>>> cpufreq_policy *policy)
>>> policy->cur = cppc_perf_to_khz(caps, caps->highest_perf);
>>> cpu_data->perf_ctrls.desired_perf = caps->highest_perf;
>>>
>>> - ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
>>> - if (ret) {
>>> - pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
>>> - caps->highest_perf, cpu, ret);
>>> - goto out;
>>> + /*
>>> + * Enable autonomous mode on first init if boot param is set.
>>> + * Check last_governor to detect first init and skip if auto_sel
>>> + * is already enabled.
>>> + */
>> If the goal is to set autosel only once at the driver init,
>> shouldn't this be done in cppc_cpufreq_init() ?
>> I understand that cpu_data doesn't exist yet in
>> cppc_cpufreq_init(), but this seems more appropriate to do
>> it there IMO.
>>
>> This means the cpudata should be updated accordingly
>> in this cppc_cpufreq_cpu_init() function.
>
> In an earlier version [1], the setup was in cppc_cpufreq_init() but
> was moved to cppc_cpufreq_cpu_init() to improve per-CPU error handling.
> Keeping the setup in cppc_cpufreq_init() helps to avoid the last_governor
> check. We can warn for a CPU failing to enable and continue so other
> CPUs keep autonomous mode.
> cppc_cpufreq_cpu_init() would then just check the auto_sel state
> from register and sync policy limits from min/max_perf registers when
> autonomous mode is active.
> Please let me know your thoughts.
FWIU the auto_sel_mode module parameter configures the default
auto_sel_mode when the driver is first loaded, so there should be
no need to check it again whenever cppc_cpufreq_cpu_init() is
called.
Maybe Ionela saw something we didn't see ?
Also just to be sure, should it still be possible to change
the auto_sel_mode through sysfs if the driver was
loaded with auto_sel_mode=1 ?
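To make the "set it once at driver init, warn and continue" idea concrete, here is a rough user-space sketch. Everything in it is hypothetical: demo_set_auto_sel() merely stands in for a per-CPU register write, and the failure pattern (CPU 2 lacks the optional register, CPU 3 fails) is invented for the example. It is not a proposal for the actual cppc_cpufreq_init() code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define DEMO_NR_CPUS 4

/* Per-CPU autonomous-selection state as seen by this sketch. */
static bool demo_auto_sel[DEMO_NR_CPUS];

/* Hypothetical stand-in for a per-CPU register write. */
static int demo_set_auto_sel(unsigned int cpu, bool enable)
{
	if (cpu == 2)
		return -EOPNOTSUPP;	/* optional register not present */
	if (cpu == 3)
		return -EIO;		/* write failed */

	demo_auto_sel[cpu] = enable;
	return 0;
}

/*
 * One-time setup: try every CPU, tolerate a missing optional register
 * silently, warn on real failures, and keep going either way.
 * Returns the number of CPUs that ended up in autonomous mode.
 */
static unsigned int demo_enable_auto_sel_all(void)
{
	unsigned int cpu, enabled = 0;

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		int ret = demo_set_auto_sel(cpu, true);

		if (ret == -EOPNOTSUPP)
			continue;
		if (ret) {
			fprintf(stderr, "auto_sel failed on CPU%u (%d)\n",
				cpu, ret);
			continue;
		}
		enabled++;
	}
	return enabled;
}
```

In this sketch a failing CPU does not undo the setting on the CPUs that succeeded, which is the behavior discussed above.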
>
> [1]
> https://lore.kernel.org/lkml/5593d364-ca37-41c5-b33f-f7e245d6d626@nvidia.com/
>
>
>>
>>> + if (auto_sel_mode && policy->last_governor[0] == '\0' &&
>>> + !cpu_data->perf_ctrls.auto_sel) {
>>> + /* Enable CPPC - optional register, some platforms
>>> need it */
>> The documentation of the CPPC Enable Register is subject to
>> interpretation, but IIUC the field should be set to use the CPPC
>> controls, so I assume this should be set in cppc_cpufreq_init()
>> instead ?
>
> Agree that the CPPC Enable is about using the CPPC control path
> in general and not only for autonomous selection.
> Will move cppc_set_enable() into cppc_cpufreq_init() or outside the
> autonomous mode block in cppc_cpufreq_cpu_init() as per conclusion
> of previous comment.
>
>>> + ret = cppc_set_enable(cpu, true);
>>> + if (ret && ret != -EOPNOTSUPP)
>>> + pr_warn("Failed to enable CPPC for CPU%d
>>> (%d)\n", cpu, ret);
>>> +
>>> + /*
>>> + * Prefer HW min/max_perf when set; otherwise program
>>> from
>>> + * policy limits derived earlier from caps.
>>> + * Clamp desired_perf to bounds and sync policy->cur.
>>> + */
>>> + if (!cpu_data->perf_ctrls.min_perf ||
>>> !cpu_data->perf_ctrls.max_perf)
>>
>> The function doesn't seem to exist.
>
> It is newly added in [2].
> Don't need to call it if we move the setup to cppc_cpufreq_init().
Ah ok right thanks.
>
> [2]
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=ea3db45ae476889a1ba0ab3617e6afdeeefbda3d
>
>
>
>>
>>> + cppc_cpufreq_update_perf_limits(cpu_data, policy);
>>> +
>>> + cpu_data->perf_ctrls.desired_perf =
>>> + clamp_t(u32, cpu_data->perf_ctrls.desired_perf,
>>> + cpu_data->perf_ctrls.min_perf,
>>> + cpu_data->perf_ctrls.max_perf);
>>> +
>>> + policy->cur = cppc_perf_to_khz(caps,
>>> + cpu_data->perf_ctrls.desired_perf);
>>> +
>>
>> Maybe this should also be done in cppc_cpufreq_init()
>> if the auto_sel_mode parameter is set ?
>
> Yes.
>
>>
>>> + /* EPP is optional - some platforms may not support it */
>>> + ret = cppc_set_epp(cpu, CPPC_EPP_PERFORMANCE_PREF);
>>> + if (ret && ret != -EOPNOTSUPP)
>>> + pr_warn("Failed to set EPP for CPU%d (%d)\n",
>>> cpu, ret);
>>> + else if (!ret)
>>> + cpu_data->perf_ctrls.energy_perf =
>>> CPPC_EPP_PERFORMANCE_PREF;
>>> +
>>> + ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
>>> + if (ret) {
>>> + pr_debug("Err setting perf for autonomous mode
>>> CPU:%d ret:%d\n",
>>> + cpu, ret);
>>> + goto out;
>>> + }
>>> +
>>> + ret = cppc_set_auto_sel(cpu, true);
>>> + if (ret && ret != -EOPNOTSUPP) {
>>> + pr_warn("Failed autonomous config for CPU%d
>>> (%d)\n",
>>> + cpu, ret);
>>> + goto out;
>>> + }
>>> + if (!ret)
>>> + cpu_data->perf_ctrls.auto_sel = true;
>>> + }
>>> +
>>> + if (cpu_data->perf_ctrls.auto_sel) {
>>
>> There is a patchset ongoing which tries to remove
>> setting policy->min/max from driver initialization.
>> Indeed, these values are only temporarily valid,
>> until the governor override them.
>> It is not sure yet the patch will be accepted though.
>>
>> https://lore.kernel.org/lkml/20260317101753.2284763-4-pierre.gondois@arm.com/
>>
>
>
> You are right that policy->min/max from .init() are temporary today
> as cpufreq_set_policy() overwrites them before the governor starts.
>
> On my test platform (highest == nominal, lowest_nonlinear == lowest),
> this had no visible effect because the BIOS bounds and cpuinfo range
> end up identical. But on platforms where they differ, the governor
> would widen the range to full cpuinfo limits.
>
> I think your patch [3] fixes this by giving these the right semantics as
> initial QoS requests. With it, cpufreq_set_policy() preserves the policy
> limits set from min/max_perf registers in .init(), which can either be
> BIOS values on first boot or last user configured values before hotplug.
>
> I will update the comment in v2 to reflect QoS seeding intent.
>
> I see that the first two patches of your series [3] are applied for 7.1.
> Do you plan to send the pending patch (3/4) from [3]?
>
I need to ping Viresh to check if this is still relevant.
> [3]
> https://lore.kernel.org/lkml/20260317101753.2284763-4-pierre.gondois@arm.com/
>
>
>>
>>
>>> + /* Sync policy limits from HW when autonomous mode is active */
>>> + policy->min = cppc_perf_to_khz(caps,
>>> + cpu_data->perf_ctrls.min_perf ?:
>>> + caps->lowest_nonlinear_perf);
>>> + policy->max = cppc_perf_to_khz(caps,
>>> + cpu_data->perf_ctrls.max_perf ?:
>>> + caps->nominal_perf);
>>> + } else {
>>> + /* Normal mode: governors control frequency */
>>> + ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
>>> + if (ret) {
>>> + pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
>>> + caps->highest_perf, cpu, ret);
>>> + goto out;
>>> + }
>>> }
>>>
>>> cppc_cpufreq_cpu_fie_init(policy);
>>> @@ -1038,10 +1104,18 @@ static int __init cppc_cpufreq_init(void)
>>>
>>> static void __exit cppc_cpufreq_exit(void)
>>> {
>>> + unsigned int cpu;
>>> +
>>> + for_each_present_cpu(cpu)
>>> + cppc_set_auto_sel(cpu, false);
>>
>> If the firmware has a default EPP value, it means that loading
>> and then unloading the driver will reset this default EPP value.
>> Maybe the initial EPP value and/or the auto_sel value should be
>> cached somewhere and restored on exit ?
>> I don't know if this is actually an issue; I am just flagging it.
>
> The auto_sel_mode boot path programs EPP to performance preference (0),
> not the firmware’s previous value. On unload we only call
> cppc_set_auto_sel(false); we do not restore EPP, min/max perf,
> or other CPPC fields to firmware defaults.
Yes, right, so loading/unloading the driver might change the
default EPP value.
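
For illustration, the caching idea mentioned above could be modeled in
userspace roughly as follows. This is only a sketch under stated
assumptions: struct fake_cppc_regs, struct saved_state, model_init() and
model_exit() are hypothetical names, not kernel APIs, and the array stands
in for the real per-CPU CPPC registers. Only the value 0 for the
performance preference comes from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4
#define EPP_PERFORMANCE_PREF 0	/* performance preference, as discussed above */

/* Hypothetical stand-in for the per-CPU CPPC registers (not a kernel struct). */
struct fake_cppc_regs {
	unsigned int epp;
	bool auto_sel;
};

/* Hypothetical snapshot of the values found when the driver was loaded. */
struct saved_state {
	unsigned int epp;
	bool auto_sel;
	bool valid;
};

static struct fake_cppc_regs regs[NR_CPUS];
static struct saved_state saved[NR_CPUS];

/* Load path: remember the firmware values before overwriting them. */
static void model_init(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	saved[cpu].epp = regs[cpu].epp;
	saved[cpu].auto_sel = regs[cpu].auto_sel;
	saved[cpu].valid = true;

	regs[cpu].epp = EPP_PERFORMANCE_PREF;
	regs[cpu].auto_sel = true;
}

/* Unload path: restore the snapshot instead of only clearing auto_sel. */
static void model_exit(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	if (!saved[cpu].valid)
		return;

	regs[cpu].epp = saved[cpu].epp;
	regs[cpu].auto_sel = saved[cpu].auto_sel;
	saved[cpu].valid = false;
}
```

With such a snapshot, a load followed by an unload would leave EPP and
auto_sel as the firmware set them. Whether the real driver should do this
is still the open question.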
>
> Thank you,
> Sumit Gupta
>
> ....
>
>
^ permalink raw reply