* [PATCH v7 0/4] KVM: arm64: PMU: Use multiple host PMUs
From: Akihiko Odaki @ 2026-04-18 8:14 UTC
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon, Kees Cook,
Gustavo A. R. Silva, Paolo Bonzini, Jonathan Corbet, Shuah Khan
Cc: linux-arm-kernel, kvmarm, linux-kernel, linux-hardening, devel,
kvm, linux-doc, linux-kselftest, Akihiko Odaki
On a heterogeneous arm64 system, KVM's PMU emulation is based on the
features of a single host PMU instance. When a vCPU is migrated to a
pCPU with an incompatible PMU, counters such as PMCCNTR_EL0 stop
incrementing.
Although this behavior is permitted by the architecture, Windows does
not handle it gracefully and may crash with a division-by-zero error.
The current workaround requires VMMs to pin vCPUs to a set of pCPUs
that share a compatible PMU. This is difficult to implement correctly in
QEMU/libvirt, where pinning occurs after vCPU initialization, and it
also restricts the guest to a subset of available pCPUs.
This series introduces the KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY
attribute. If it is set, PMUv3 is emulated without programmable event
counters, so KVM can run vCPUs on any physical CPU with a compatible
hardware PMU.
This allows Windows guests to run reliably on heterogeneous systems
without crashing, even without vCPU pinning, and enables VMMs to
schedule vCPUs across all available pCPUs, making full use of the host
hardware.
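As a rough illustration of what "without programmable event counters" leaves a guest with: the selftest in patch 4/4 exercises this mode only with PMCR_EL0.N = 0 (test_config(0, true)) and expects every larger N to be rejected. Assuming the architectural cycle-counter index of 31, the sketch below computes the usable-counter bitmask for a given N; with N = 0 only the PMCCNTR_EL0 bit remains. The helper name is hypothetical, this is not KVM code, and it models only the cycle counter as a fixed counter.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural index of the cycle counter (PMCCNTR_EL0) in the
 * PMCNTENSET_EL0/PMOVSSET_EL0 bit layouts. */
#define PMU_CYCLE_IDX 31

/*
 * Hypothetical helper: bitmask of counters a guest may use when
 * PMCR_EL0.N advertises @pmcr_n programmable event counters.
 * Only the cycle counter is modelled as a fixed counter here.
 */
static uint64_t valid_counters_mask(unsigned int pmcr_n)
{
	uint64_t mask = UINT64_C(1) << PMU_CYCLE_IDX;

	assert(pmcr_n <= PMU_CYCLE_IDX);	/* PMUv3 allows at most 31 */
	if (pmcr_n)
		mask |= (UINT64_C(1) << pmcr_n) - 1;	/* event counters 0..N-1 */
	return mask;
}
```

For example, valid_counters_mask(0) yields 0x80000000 (cycle counter only), while valid_counters_mask(6) additionally enables event counters 0 to 5.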
A QEMU patch that demonstrates the usage of the new attribute is
available at:
https://lore.kernel.org/qemu-devel/20260225-kvm-v2-1-b8d743db0f73@rsg.ci.i.u-tokyo.ac.jp/
("[PATCH RFC v2] target/arm/kvm: Choose PMU backend")
Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
---
Changes in v7:
- Fixed the vCPU run hang in test_fixed_counters_only().
- Link to v6: https://lore.kernel.org/r/20260413-hybrid-v6-0-e79d760f7f1b@rsg.ci.i.u-tokyo.ac.jp
Changes in v6:
- Removed WARN_ON_ONCE() in kvm_pmu_create_perf_event(). It can be
triggered in kvm_arch_vcpu_load() before it checks supported_cpus.
- Removed an extra lockdep assertion in kvm_arm_pmu_v3_get_attr().
- Fixed error messages in test_fixed_counters_only().
- Fixed the vCPU run in test_fixed_counters_only().
- Link to v5: https://lore.kernel.org/r/20260411-hybrid-v5-0-b043b4d9f49e@rsg.ci.i.u-tokyo.ac.jp
Changes in v5:
- Rebased.
- Fixed the order to clear KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY in
kvm_arm_pmu_v3_set_pmu().
- Fixed the setting of KVM_ARM_VCPU_PMU_V3_IRQ in
test_fixed_counters_only().
- Changed to WARN_ON_ONCE() when kvm_pmu_probe_armpmu() returns NULL in
kvm_pmu_create_perf_event(), which is no longer supposed to happen.
- Link to v4: https://lore.kernel.org/r/20260317-hybrid-v4-0-bd62bcd48644@rsg.ci.i.u-tokyo.ac.jp
Changes in v4:
- Extracted kvm_pmu_enabled_counter_mask() into a separate patch.
- Added patch "KVM: arm64: PMU: Protect the list of PMUs with RCU".
- Merged KVM_REQ_CREATE_PMU into KVM_REQ_RELOAD_PMU.
- Added a check to avoid unnecessary KVM_REQ_RELOAD_PMU requests.
- Dropped the change to avoid calling kvm_arm_set_default_pmu() when
KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY is not set.
- Link to v3: https://lore.kernel.org/r/20260225-hybrid-v3-0-46e8fe220880@rsg.ci.i.u-tokyo.ac.jp
Changes in v3:
- Renamed the attribute to KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY.
- Changed to request the creation of perf counters when loading vCPU.
- Link to v2: https://lore.kernel.org/r/20250806-hybrid-v2-0-0661aec3af8c@rsg.ci.i.u-tokyo.ac.jp
Changes in v2:
- Added the KVM_ARM_VCPU_PMU_V3_COMPOSITION attribute to opt in to the
feature.
- Added code to handle overflow.
- Link to v1: https://lore.kernel.org/r/20250319-hybrid-v1-1-4d1ada10e705@daynix.com
---
Akihiko Odaki (4):
KVM: arm64: PMU: Add kvm_pmu_enabled_counter_mask()
KVM: arm64: PMU: Protect the list of PMUs with RCU
KVM: arm64: PMU: Introduce FIXED_COUNTERS_ONLY
KVM: arm64: selftests: Test PMU_V3_FIXED_COUNTERS_ONLY
Documentation/virt/kvm/devices/vcpu.rst | 29 ++++
arch/arm64/include/asm/kvm_host.h | 2 +
arch/arm64/include/uapi/asm/kvm.h | 1 +
arch/arm64/kvm/arm.c | 1 +
arch/arm64/kvm/pmu-emul.c | 187 ++++++++++++++-------
include/kvm/arm_pmu.h | 2 +
.../selftests/kvm/arm64/vpmu_counter_access.c | 153 ++++++++++++++---
7 files changed, 292 insertions(+), 83 deletions(-)
---
base-commit: 94b4ae79ebb42a8a6f2124b4d4b033b15a98e4f9
change-id: 20250224-hybrid-01d5ff47edd2
Best regards,
--
Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
* [PATCH v7 2/4] KVM: arm64: PMU: Protect the list of PMUs with RCU
From: Akihiko Odaki @ 2026-04-18 8:14 UTC
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon, Kees Cook,
Gustavo A. R. Silva, Paolo Bonzini, Jonathan Corbet, Shuah Khan
Cc: linux-arm-kernel, kvmarm, linux-kernel, linux-hardening, devel,
kvm, linux-doc, linux-kselftest, Akihiko Odaki
In-Reply-To: <20260418-hybrid-v7-0-2bf39ad009bf@rsg.ci.i.u-tokyo.ac.jp>
Convert the list of PMUs to an RCU-protected list so that readers can
walk it without taking arm_pmus_lock and contending with each other.
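The pattern this relies on can be illustrated outside the kernel. The sketch below is a userspace analogue built on C11 atomics, not the kernel RCU API: writers are serialized by a mutex (an assumption of the sketch) and publish each new entry with a release store, which is the ordering list_add_tail_rcu() provides, while readers walk the list with acquire loads and take no lock. It assumes entries are only ever appended and never freed, which is what lets it skip grace periods entirely. The struct and function names are hypothetical stand-ins for arm_pmus, kvm_pmu_probe_armpmu() and kvm_supports_guest_pmuv3().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an arm_pmus entry. */
struct pmu_entry {
	int type;			/* like arm_pmu->pmu.type */
	unsigned long supported_cpus;	/* one bit per CPU */
	_Atomic(struct pmu_entry *) next;
};

static _Atomic(struct pmu_entry *) pmu_list = NULL;
static pthread_mutex_t pmu_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer side: serialized by the mutex, publishes with a release store. */
static bool pmu_list_add_tail(int type, unsigned long supported_cpus)
{
	struct pmu_entry *e = malloc(sizeof(*e));
	_Atomic(struct pmu_entry *) *link = &pmu_list;
	struct pmu_entry *cur;

	if (!e)
		return false;
	e->type = type;
	e->supported_cpus = supported_cpus;
	atomic_init(&e->next, NULL);

	pthread_mutex_lock(&pmu_list_lock);
	while ((cur = atomic_load_explicit(link, memory_order_relaxed)))
		link = &cur->next;
	/* Readers that observe the pointer also observe the fields set above. */
	atomic_store_explicit(link, e, memory_order_release);
	pthread_mutex_unlock(&pmu_list_lock);
	return true;
}

/* Reader side: no lock; acquire loads pair with the release store. */
static struct pmu_entry *pmu_list_find_cpu(unsigned int cpu)
{
	struct pmu_entry *e;

	assert(cpu < 8 * sizeof(unsigned long));
	for (e = atomic_load_explicit(&pmu_list, memory_order_acquire); e;
	     e = atomic_load_explicit(&e->next, memory_order_acquire))
		if (e->supported_cpus & (1UL << cpu))
			return e;
	return NULL;
}

/* Lockless emptiness check, analogous to the patched kvm_supports_guest_pmuv3(). */
static bool pmu_list_empty(void)
{
	return atomic_load_explicit(&pmu_list, memory_order_acquire) == NULL;
}
```

A reader asking for CPU 2 gets the first entry whose supported_cpus contains that CPU, in the same way the probe helper matches raw_smp_processor_id() against supported_cpus.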
Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
---
arch/arm64/kvm/pmu-emul.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index 59ec96e09321..ef5140bbfe28 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -7,9 +7,9 @@
#include <linux/cpu.h>
#include <linux/kvm.h>
#include <linux/kvm_host.h>
-#include <linux/list.h>
#include <linux/perf_event.h>
#include <linux/perf/arm_pmu.h>
+#include <linux/rculist.h>
#include <linux/uaccess.h>
#include <asm/kvm_emulate.h>
#include <kvm/arm_pmu.h>
@@ -26,7 +26,6 @@ static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc);
bool kvm_supports_guest_pmuv3(void)
{
- guard(mutex)(&arm_pmus_lock);
return !list_empty(&arm_pmus);
}
@@ -808,7 +807,7 @@ void kvm_host_pmu_init(struct arm_pmu *pmu)
return;
entry->arm_pmu = pmu;
- list_add_tail(&entry->entry, &arm_pmus);
+ list_add_tail_rcu(&entry->entry, &arm_pmus);
}
static struct arm_pmu *kvm_pmu_probe_armpmu(void)
@@ -817,7 +816,7 @@ static struct arm_pmu *kvm_pmu_probe_armpmu(void)
struct arm_pmu *pmu;
int cpu;
- guard(mutex)(&arm_pmus_lock);
+ guard(rcu)();
/*
* It is safe to use a stale cpu to iterate the list of PMUs so long as
@@ -837,7 +836,7 @@ static struct arm_pmu *kvm_pmu_probe_armpmu(void)
* carried here.
*/
cpu = raw_smp_processor_id();
- list_for_each_entry(entry, &arm_pmus, entry) {
+ list_for_each_entry_rcu(entry, &arm_pmus, entry) {
pmu = entry->arm_pmu;
if (cpumask_test_cpu(cpu, &pmu->supported_cpus))
@@ -1088,9 +1087,9 @@ static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
int ret = -ENXIO;
lockdep_assert_held(&kvm->arch.config_lock);
- mutex_lock(&arm_pmus_lock);
+ guard(rcu)();
- list_for_each_entry(entry, &arm_pmus, entry) {
+ list_for_each_entry_rcu(entry, &arm_pmus, entry) {
arm_pmu = entry->arm_pmu;
if (arm_pmu->pmu.type == pmu_id) {
if (kvm_vm_has_ran_once(kvm) ||
@@ -1106,7 +1105,6 @@ static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
}
}
- mutex_unlock(&arm_pmus_lock);
return ret;
}
--
2.53.0
* [PATCH v7 4/4] KVM: arm64: selftests: Test PMU_V3_FIXED_COUNTERS_ONLY
From: Akihiko Odaki @ 2026-04-18 8:14 UTC
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon, Kees Cook,
Gustavo A. R. Silva, Paolo Bonzini, Jonathan Corbet, Shuah Khan
Cc: linux-arm-kernel, kvmarm, linux-kernel, linux-hardening, devel,
kvm, linux-doc, linux-kselftest, Akihiko Odaki
In-Reply-To: <20260418-hybrid-v7-0-2bf39ad009bf@rsg.ci.i.u-tokyo.ac.jp>
Assert the following:
- KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY is unset at initialization.
- KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY can be set.
- Setting KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY for the first time
after setting an event filter results in EBUSY.
- KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY can be set again even if an
event filter has already been set.
- Setting KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY after running a vCPU
results in EBUSY.
- The existing test cases pass with
KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY set.
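The attribute rules listed above can be summarized as a small state model. This is a hypothetical sketch written only from the cases the selftest asserts; it is not how KVM implements the attribute, and where the test is silent (for example, re-setting the attribute after a vCPU has run) the model's answer is a guess.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the per-VM state the selftest probes; not KVM code. */
struct pmu_attr_model {
	bool vm_has_run;		/* some vCPU has entered the guest */
	bool filter_set;		/* KVM_ARM_VCPU_PMU_V3_FILTER was applied */
	bool fixed_counters_only;	/* FIXED_COUNTERS_ONLY currently set */
};

/* KVM_GET_DEVICE_ATTR outcome: -ENXIO while the attribute is unset. */
static int model_get_fixed_counters_only(const struct pmu_attr_model *m)
{
	return m->fixed_counters_only ? 0 : -ENXIO;
}

/* KVM_SET_DEVICE_ATTR outcome for the cases the selftest asserts. */
static int model_set_fixed_counters_only(struct pmu_attr_model *m)
{
	if (m->vm_has_run)
		return -EBUSY;
	/* A filter set first is only tolerated if the attribute was already on. */
	if (m->filter_set && !m->fixed_counters_only)
		return -EBUSY;
	m->fixed_counters_only = true;
	assert(model_get_fixed_counters_only(m) == 0);	/* reads back once set */
	return 0;
}

static void model_set_filter(struct pmu_attr_model *m)
{
	m->filter_set = true;
}
```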
Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
---
.../selftests/kvm/arm64/vpmu_counter_access.c | 153 +++++++++++++++++----
1 file changed, 127 insertions(+), 26 deletions(-)
diff --git a/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c b/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c
index ae36325c022f..0ed0a8513b03 100644
--- a/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c
+++ b/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c
@@ -403,12 +403,7 @@ static void create_vpmu_vm(void *guest_code)
{
struct kvm_vcpu_init init;
uint8_t pmuver, ec;
- uint64_t dfr0, irq = 23;
- struct kvm_device_attr irq_attr = {
- .group = KVM_ARM_VCPU_PMU_V3_CTRL,
- .attr = KVM_ARM_VCPU_PMU_V3_IRQ,
- .addr = (uint64_t)&irq,
- };
+ uint64_t dfr0;
/* The test creates the vpmu_vm multiple times. Ensure a clean state */
memset(&vpmu_vm, 0, sizeof(vpmu_vm));
@@ -434,8 +429,6 @@ static void create_vpmu_vm(void *guest_code)
TEST_ASSERT(pmuver != ID_AA64DFR0_EL1_PMUVer_IMP_DEF &&
pmuver >= ID_AA64DFR0_EL1_PMUVer_IMP,
"Unexpected PMUVER (0x%x) on the vCPU with PMUv3", pmuver);
-
- vcpu_ioctl(vpmu_vm.vcpu, KVM_SET_DEVICE_ATTR, &irq_attr);
}
static void destroy_vpmu_vm(void)
@@ -461,15 +454,30 @@ static void run_vcpu(struct kvm_vcpu *vcpu, uint64_t pmcr_n)
}
}
-static void test_create_vpmu_vm_with_nr_counters(unsigned int nr_counters, bool expect_fail)
+static void guest_code_done(void)
+{
+ GUEST_DONE();
+}
+
+static void test_create_vpmu_vm_with_nr_counters(unsigned int nr_counters,
+ bool fixed_counters_only,
+ bool expect_fail)
{
struct kvm_vcpu *vcpu;
unsigned int prev;
int ret;
+ uint64_t irq = 23;
create_vpmu_vm(guest_code);
vcpu = vpmu_vm.vcpu;
+ if (fixed_counters_only)
+ vcpu_device_attr_set(vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY, NULL);
+
+ vcpu_device_attr_set(vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_IRQ, &irq);
+
prev = get_pmcr_n(vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(SYS_PMCR_EL0)));
ret = __vcpu_device_attr_set(vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
@@ -489,15 +497,15 @@ static void test_create_vpmu_vm_with_nr_counters(unsigned int nr_counters, bool
* Create a guest with one vCPU, set the PMCR_EL0.N for the vCPU to @pmcr_n,
* and run the test.
*/
-static void run_access_test(uint64_t pmcr_n)
+static void run_access_test(uint64_t pmcr_n, bool fixed_counters_only)
{
uint64_t sp;
struct kvm_vcpu *vcpu;
struct kvm_vcpu_init init;
- pr_debug("Test with pmcr_n %lu\n", pmcr_n);
+ pr_debug("Test with pmcr_n %lu, fixed_counters_only %d\n", pmcr_n, fixed_counters_only);
- test_create_vpmu_vm_with_nr_counters(pmcr_n, false);
+ test_create_vpmu_vm_with_nr_counters(pmcr_n, fixed_counters_only, false);
vcpu = vpmu_vm.vcpu;
/* Save the initial sp to restore them later to run the guest again */
@@ -531,14 +539,14 @@ static struct pmreg_sets validity_check_reg_sets[] = {
* Create a VM, and check if KVM handles the userspace accesses of
* the PMU register sets in @validity_check_reg_sets[] correctly.
*/
-static void run_pmregs_validity_test(uint64_t pmcr_n)
+static void run_pmregs_validity_test(uint64_t pmcr_n, bool fixed_counters_only)
{
int i;
struct kvm_vcpu *vcpu;
uint64_t set_reg_id, clr_reg_id, reg_val;
uint64_t valid_counters_mask, max_counters_mask;
- test_create_vpmu_vm_with_nr_counters(pmcr_n, false);
+ test_create_vpmu_vm_with_nr_counters(pmcr_n, fixed_counters_only, false);
vcpu = vpmu_vm.vcpu;
valid_counters_mask = get_counters_mask(pmcr_n);
@@ -588,11 +596,11 @@ static void run_pmregs_validity_test(uint64_t pmcr_n)
* the vCPU to @pmcr_n, which is larger than the host value.
* The attempt should fail as @pmcr_n is too big to set for the vCPU.
*/
-static void run_error_test(uint64_t pmcr_n)
+static void run_error_test(uint64_t pmcr_n, bool fixed_counters_only)
{
pr_debug("Error test with pmcr_n %lu (larger than the host)\n", pmcr_n);
- test_create_vpmu_vm_with_nr_counters(pmcr_n, true);
+ test_create_vpmu_vm_with_nr_counters(pmcr_n, fixed_counters_only, true);
destroy_vpmu_vm();
}
@@ -622,22 +630,115 @@ static bool kvm_supports_nr_counters_attr(void)
return supported;
}
-int main(void)
+static void test_config(uint64_t pmcr_n, bool fixed_counters_only)
{
- uint64_t i, pmcr_n;
-
- TEST_REQUIRE(kvm_has_cap(KVM_CAP_ARM_PMU_V3));
- TEST_REQUIRE(kvm_supports_vgic_v3());
- TEST_REQUIRE(kvm_supports_nr_counters_attr());
+ uint64_t i;
- pmcr_n = get_pmcr_n_limit();
for (i = 0; i <= pmcr_n; i++) {
- run_access_test(i);
- run_pmregs_validity_test(i);
+ run_access_test(i, fixed_counters_only);
+ run_pmregs_validity_test(i, fixed_counters_only);
}
for (i = pmcr_n + 1; i < ARMV8_PMU_MAX_COUNTERS; i++)
- run_error_test(i);
+ run_error_test(i, fixed_counters_only);
+}
+
+static void test_fixed_counters_only(void)
+{
+ struct kvm_pmu_event_filter filter = { .nevents = 0 };
+ struct kvm_vm *vm;
+ struct kvm_vcpu *running_vcpu;
+ struct kvm_vcpu *stopped_vcpu;
+ struct kvm_vcpu_init init;
+ int ret;
+ uint64_t irq = 23;
+
+ create_vpmu_vm(guest_code);
+ ret = __vcpu_has_device_attr(vpmu_vm.vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY);
+ if (ret) {
+ TEST_ASSERT(ret == -1 && errno == ENXIO,
+ KVM_IOCTL_ERROR(KVM_HAS_DEVICE_ATTR, ret));
+ destroy_vpmu_vm();
+ return;
+ }
+
+ /* Assert that FIXED_COUNTERS_ONLY is unset at initialization. */
+ ret = __vcpu_device_attr_get(vpmu_vm.vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY, NULL);
+ TEST_ASSERT(ret == -1 && errno == ENXIO,
+ KVM_IOCTL_ERROR(KVM_GET_DEVICE_ATTR, ret));
+
+ /* Assert that setting FIXED_COUNTERS_ONLY succeeds. */
+ vcpu_device_attr_set(vpmu_vm.vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY, NULL);
+
+ /* Assert that getting FIXED_COUNTERS_ONLY succeeds. */
+ vcpu_device_attr_get(vpmu_vm.vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY, NULL);
+
+ /*
+ * Assert that setting FIXED_COUNTERS_ONLY again succeeds even if an
+ * event filter has already been set.
+ */
+ vcpu_device_attr_set(vpmu_vm.vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_FILTER, &filter);
+
+ vcpu_device_attr_set(vpmu_vm.vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY, NULL);
+
+ destroy_vpmu_vm();
+
+ create_vpmu_vm(guest_code);
+
+ /*
+ * Assert that setting FIXED_COUNTERS_ONLY results in EBUSY if an event
+ * filter has already been set while FIXED_COUNTERS_ONLY has not.
+ */
+ vcpu_device_attr_set(vpmu_vm.vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_FILTER, &filter);
+
+ ret = __vcpu_device_attr_set(vpmu_vm.vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY, NULL);
+ TEST_ASSERT(ret == -1 && errno == EBUSY,
+ KVM_IOCTL_ERROR(KVM_SET_DEVICE_ATTR, ret));
+
+ destroy_vpmu_vm();
+
+ /*
+ * Assert that setting FIXED_COUNTERS_ONLY after running a VCPU results
+ * in EBUSY.
+ */
+ vm = vm_create(2);
+ vm_ioctl(vm, KVM_ARM_PREFERRED_TARGET, &init);
+ init.features[0] |= (1 << KVM_ARM_VCPU_PMU_V3);
+ running_vcpu = aarch64_vcpu_add(vm, 0, &init, guest_code_done);
+ stopped_vcpu = aarch64_vcpu_add(vm, 1, &init, guest_code_done);
+ kvm_arch_vm_finalize_vcpus(vm);
+ vcpu_device_attr_set(running_vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_IRQ, &irq);
+ vcpu_device_attr_set(running_vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_INIT, NULL);
+ vcpu_run(running_vcpu);
+
+ ret = __vcpu_device_attr_set(stopped_vcpu, KVM_ARM_VCPU_PMU_V3_CTRL,
+ KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY, NULL);
+ TEST_ASSERT(ret == -1 && errno == EBUSY,
+ KVM_IOCTL_ERROR(KVM_SET_DEVICE_ATTR, ret));
+
+ kvm_vm_free(vm);
+
+ test_config(0, true);
+}
+
+int main(void)
+{
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_ARM_PMU_V3));
+ TEST_REQUIRE(kvm_supports_vgic_v3());
+ TEST_REQUIRE(kvm_supports_nr_counters_attr());
+
+ test_config(get_pmcr_n_limit(), false);
+ test_fixed_counters_only();
return 0;
}
--
2.53.0
* Re: [PATCH] docs/zh_CN: add module-signing Chinese translation
From: Dongliang Mu @ 2026-04-18 6:24 UTC
To: Yan Zhu, Alex Shi, alexs, si.yanteng, corbet
Cc: skhan, linux-doc, linux-kernel
In-Reply-To: <tencent_70D8F75417524BB9ACD33B9223F831030407@qq.com>
On 4/18/26 1:34 PM, Yan Zhu wrote:
>
>
> On 4/10/2026 7:21 PM, Alex Shi wrote:
>>
>>
>> On 2026/4/1 23:40, Yan Zhu wrote:
>>> Translate .../admin-guide/module-signing.rst into Chinese.
>>>
>>> Update the translation through commit 0ad9a71933e7
>>> ("modsign: Enable ML-DSA module signing")
>>>
>>> Signed-off-by: Yan Zhu<zhuyan2015@qq.com>
>>> ---
>>> .../zh_CN/admin-guide/module-signing.rst | 242 ++++++++++++++++++
>>> 1 file changed, 242 insertions(+)
>>> create mode 100644 Documentation/translations/zh_CN/admin-guide/module-signing.rst
>>>
>>> diff --git a/Documentation/translations/zh_CN/admin-guide/module-signing.rst b/Documentation/translations/zh_CN/admin-guide/module-signing.rst
>>> new file mode 100644
>>> index 000000000000..b8c209dd229d
>>> --- /dev/null
>>> +++ b/Documentation/translations/zh_CN/admin-guide/module-signing.rst
>>> @@ -0,0 +1,242 @@
>>> +.. SPDX-License-Identifier: GPL-2.0
>>> +.. include:: ../disclaimer-zh_CN.rst
>>> +
>>> +:Original: Documentation/admin-guide/module-signing.rst
>>> +:翻译:
>>> + 朱岩 Yan Zhu<zhuyan2015@qq.com>
>>> +
>>> +
>>> +==========================
>>> +内核模块签名机制
>>> +==========================
>>> +
>>> +.. 目录
>>> +..
>>> +.. - 概述
>>> +.. - 配置模块签名
>>> +.. - 生成签名密钥
>>> +.. - 内核中的公钥
>>> +.. - 模块手动签名
>>> +.. - 已签名模块和剥离
>>> +.. - 加载已签名模块
>>> +.. - 无效签名和未签名模块
>>> +.. - 管理/保护私钥
>>> +
>>> +
>>> +概述
>>> +====
>>> +
>>> +内核模块签名机制在安装过程中对模块进行加密签名,然后在加载模块时检查签名。
>>> +这通过禁止加载未签名的模块或使用无效密钥签名的模块来提高内核安全性。
>>> +模块签名通过使恶意模块更难加载到内核中来增加安全性。
>>> +模块签名检查在内核中完成,因此不需要受信任的用户空间位。
>>> +
>>> +此机制使用 X.509 ITU-T 标准证书对涉及的公钥进行编码。
>>> +签名本身不以任何工业标准类型编码。
>>> +内置机制目前仅支持 RSA、NIST P-384 ECDSA 和 NIST FIPS-204 ML-DSA 公钥签名标准(尽管它是可插拔的并允许使用其他标准)。
>>
>> This line is too long, and not align with other lines. you need to
>> follow the coding style in documents.
>>
>>
>>> +对于 RSA 和 ECDSA,可以使用的可能的哈希算法是大小为 256、384 和 512 的 SHA-2 和 SHA-3(算法由签名中的数据选择);
>>> +ML-DSA会自行进行哈希运算,但允许与SHA512哈希算法结合用于签名属性。
>>> +
>>> +配置模块签名
>>> +============
>>> +
>>> +通过进入内核配置的 :menuselection:`Enable Loadable Module Support` 菜单并打开以下选项来启用模块签名机制::
>>> +
>>> + CONFIG_MODULE_SIG "Module signature verification"
>>> +
>>> +这有多个可用选项:
>>> +
>>> + (1) :menuselection:`Require modules to be validly signed`
>>> + (``CONFIG_MODULE_SIG_FORCE``)
>>> +
>>> + 这指定了内核应如何处理其密钥未知或未签名的模块。
>>> +
>>> + 如果关闭(即"宽松模式"),则允许使用不可用密钥和未签名的模块,
>>> + 但内核将被标记为受污染,并且相关模块将被标记为受污染,显示字符'E'。
>>> +
>>> + 如果打开(即"限制模式"),只有具有有效签名且可由内核拥有的公钥验
>>> 证的模块才会被加载。
>>> + 所有其他模块将生成错误。
>>> +
>>> + 无论此处的设置如何,如果模块的签名块无法解析,它将被直接拒绝。
>>> +
>>> +
>>> + (2) :menuselection:`Automatically sign all modules`
>>> + (``CONFIG_MODULE_SIG_ALL``)
>>> +
>>> + 如果打开此选项,则在构建的 modules_install 阶段期间将自动签名模块。
>>> + 如果关闭,则必须使用以下命令手动签名模块::
>>> +
>>> + scripts/sign-file
>>> +
>>> +
>>> + (3) :menuselection:`Which hash algorithm should modules be signed with?`
>>> +
>>> + 这提供了安装阶段将用于签名模块的哈希算法选择:
>>> +
>>> + =============================== ==========================================
>>> + ``CONFIG_MODULE_SIG_SHA256`` :menuselection:`Sign modules with SHA-256`
>>> + ``CONFIG_MODULE_SIG_SHA384`` :menuselection:`Sign modules with SHA-384`
>>> + ``CONFIG_MODULE_SIG_SHA512`` :menuselection:`Sign modules with SHA-512`
>>> + ``CONFIG_MODULE_SIG_SHA3_256`` :menuselection:`Sign modules with SHA3-256`
>>> + ``CONFIG_MODULE_SIG_SHA3_384`` :menuselection:`Sign modules with SHA3-384`
>>> + ``CONFIG_MODULE_SIG_SHA3_512`` :menuselection:`Sign modules with SHA3-512`
>>> + =============================== ==========================================
>>
>> Got errors here:
>> Applying: docs/zh_CN: add module-signing Chinese translation
>> /home/alexshi/linuxdoc/.git/rebase-apply/patch:87: indent with spaces.
>> =============================== ==========================================
>> /home/alexshi/linuxdoc/.git/rebase-apply/patch:94: indent with spaces.
>> =============================== ==========================================
>> warning: 2 lines add whitespace errors.
>> Thanks
>> Alex
>
> Thanks for your review, I have fixed those issues in v3 patch.
The revised patch should be v2. Don't skip version numbers.
Dongliang Mu
>
>>> +
>>> + 此处选择的算法也将被构建到内核中(而不是作为模块),
>>> + 以便使用该算法签名的模块可以在不导致循环依赖的情况下检查其签名。
>>> +
>>> +
>>> + (4) :menuselection:`File name or PKCS#11 URI of module signing key`
>>> + (``CONFIG_MODULE_SIG_KEY``)
>>> +
>>> + 将此选项设置为除默认值 ``certs/signing_key.pem`` 之外的其他值将禁用签名密钥的自动生成,
>>> + 并允许使用您选择的密钥对内核模块进行签名。
>>> + 提供的字符串应标识包含私钥及其对应的 PEM 格式 X.509 证书的文件,
>>> + 或者在 OpenSSL ENGINE_pkcs11 功能正常的系统上,使用 RFC7512 定义的 PKCS#11 URI。
>>> + 在后一种情况下,PKCS#11 URI 应引用证书和私钥。
>>> +
>>> + 如果包含私钥的 PEM 文件已加密,或者 PKCS#11 令牌需要 PIN,
>>> + 可以通过 ``KBUILD_SIGN_PIN`` 变量在构建时提供。
>>> +
>>> +
>>> + (5) :menuselection:`Additional X.509 keys for default system keyring`
>>> + (``CONFIG_SYSTEM_TRUSTED_KEYS``)
>>> +
>>> + 此选项可设置为包含附加证书的 PEM 编码文件的文件名,
>>> + 这些证书将默认包含在系统密钥环中。
>>> +
>>> +请注意,启用模块签名会为内核构建过程添加对执行签名工具的 OpenSSL 开发包的依赖。
>>> +
>>> +
>>> +生成签名密钥
>>> +============
>>> +
>>> +生成和检查签名需要加密密钥对。私钥用于生成签名,相应的公钥用于检查签名。
>>> +私钥仅在构建期间需要,之后可以删除或安全存储。
>>> +公钥被构建到内核中,以便在加载模块时可以使用它来检查签名。
>>> +
>>> +在正常情况下,当 ``CONFIG_MODULE_SIG_KEY`` 保持默认值时,
>>> +如果文件中不存在密钥对,内核构建将使用 openssl 自动生成新的密钥对::
>>> +
>>> + certs/signing_key.pem
>>> +
>>> +在构建 vmlinux 期间(公钥需要构建到 vmlinux 中)使用参数::
>>> +
>>> + certs/x509.genkey
>>> +
>>> +文件(如果尚不存在也会生成)。
>>> +
>>> +可以在 RSA(``MODULE_SIG_KEY_TYPE_RSA``)、ECDSA(``MODULE_SIG_KEY_TYPE_ECDSA``)
>>> +和 ML-DSA(``MODULE_SIG_KEY_TYPE_MLDSA_*``)之间选择生成 RSA 4k、NIST P-384 密钥对或 ML-DSA 44、65 或 87 密钥对。
>>> +
>>> +强烈建议您提供自己的 x509.genkey 文件。
>>> +
>>> +最值得注意的是,在 x509.genkey 文件中,req_distinguished_name 部分应从默认值更改::
>>> +
>>> + [ req_distinguished_name ]
>>> + #O = Unspecified company
>>> + CN = Build time autogenerated kernel key
>>> + #emailAddress =unspecified.user@unspecified.company
>>> +
>>> +生成的 RSA 密钥大小也可以通过以下方式设置::
>>> +
>>> + [ req ]
>>> + default_bits = 4096
>>> +
>>> +也可以使用位于 Linux 内核源代码树根节点中的 x509.genkey 密钥生成配置文件和 openssl 命令手动生成公钥/私钥文件。
>>> +以下是生成公钥/私钥文件的示例::
>>> +
>>> + openssl req -new -nodes -utf8 -sha256 -days 36500 -batch -x509 \
>>> + -config x509.genkey -outform PEM -out kernel_key.pem \
>>> + -keyout kernel_key.pem
>>> +
>>> +然后可以将生成的 kernel_key.pem 文件的完整路径名指定在 ``CONFIG_MODULE_SIG_KEY`` 选项中,
>>> +并且将使用其中的证书和密钥而不是自动生成的密钥对。
>>> +
>>> +
>>> +内核中的公钥
>>> +============
>>> +
>>> +内核包含一个可由 root 查看的公钥环。它们在名为 ".builtin_trusted_keys" 的密钥环中,
>>> +可以通过以下方式查看::
>>> +
>>> + [root@deneb ~]# cat /proc/keys
>>> + ...
>>> + 223c7853 I------ 1 perm 1f030000 0 0 keyring .builtin_trusted_keys: 1
>>> + 302d2d52 I------ 1 perm 1f010000 0 0 asymmetri Fedora kernel signing key: d69a84e6bce3d216b979e9505b3e3ef9a7118079: X509.RSA a7118079 []
>>> +
>>> +除了专门为模块签名生成的公钥外,还可以在 ``CONFIG_SYSTEM_TRUSTED_KEYS`` 配置选项引用的 PEM 编码文件中提供其他受 信任的证书。
>>> +
>>> +此外,架构代码可以从硬件存储中获取公钥并将其添加(例如从 UEFI 密钥数据库)。
>>> +
>>> +最后,可以通过以下方式添加其他公钥::
>>> +
>>> + keyctl padd asymmetric "" [.builtin_trusted_keys-ID] <[key-file]
>>> +
>>> +例如::
>>> +
>>> + keyctl padd asymmetric "" 0x223c7853 <my_public_key.x509
>>> +
>>> +但是,请注意,内核只允许将由已驻留在 ``.builtin_trusted_keys`` 中的密钥有效签名的密钥添加到 ``.builtin_trusted_keys``。
>>> +
>>> +模块手动签名
>>> +============
>>> +
>>> +要手动对模块进行签名,请使用 Linux 内核源代码树中可用的 scripts/sign-file 工具。
>>> +该脚本需要 4 个参数:
>>> +
>>> + 1. 哈希算法(例如,sha256)
>>> + 2. 私钥文件名或 PKCS#11 URI
>>> + 3. 公钥文件名
>>> + 4. 要签名的内核模块
>>> +
>>> +以下是签名内核模块的示例::
>>> +
>>> + scripts/sign-file sha512 kernel-signkey.priv \
>>> + kernel-signkey.x509 module.ko
>>> +
>>> +使用的哈希算法不必与配置的算法匹配,但如果不同,
>>> +应确保哈希算法要么内置在内核中,要么可以在不需要自身的情况下加载。
>>> +
>>> +如果私钥需要密码或 PIN,可以在 $KBUILD_SIGN_PIN 环境变量中提供。
>>> +
>>> +
>>> +已签名模块和剥离
>>> +================
>>> +
>>> +已签名模块在末尾简单地附加了数字签名。模块文件末尾的字符串
>>> +``~Module signature appended~.`` 确认签名存在,但不能确认签名有效!
>>> +
>>> +已签名模块是脆弱的,因为签名在定义的ELF容器之外。
>>> +因此,一旦计算并附加签名,就不得剥离它们。
>>> +请注意,整个模块都是签名的有效载荷,包括签名时存在的任何和所有调试信息。
>>> +
>>> +
>>> +加载已签名模块
>>> +==============
>>> +
>>> +模块通过 insmod、modprobe、``init_module()`` 或 ``finit_module()`` 加载,
>>> +与未签名模块完全一样,因为在用户空间中不进行任何处理。
>>> +所有签名检查都在内核内完成。
>>> +
>>> +
>>> +无效签名和未签名模块
>>> +====================
>>> +
>>> +如果启用了 ``CONFIG_MODULE_SIG_FORCE`` 或在内核启动命令提供了 module.sig_enforce=1,
>>> +内核将仅加载具有有效签名且具有公钥的模块。
>>> +否则,它还将加载未签名的模块。
>>> +任何具有不匹配签名的模块将不被允许加载。
>>> +
>>> +任何具有不可解析签名的模块将被拒绝。
>>> +
>>> +
>>> +管理/保护私钥
>>> +==============
>>> +
>>> +由于私钥用于签名模块,病毒和恶意软件可以使用私钥签名模块并危害操作系统。
>>> +私钥必须被销毁或移动到安全位置,而不是保存在内核源代码树的根节点中。
>>> +
>>> +如果使用相同的私钥为多个内核配置签名模块,
>>> +必须确保模块版本信息足以防止将模块加载到不同的内核中。
>>> +要么设置 ``CONFIG_MODVERSIONS=y``,要么通过更改 ``EXTRAVERSION`` 或 ``CONFIG_LOCALVERSION`` 确保每个配置具有不同的内核发布字符串。
>>> --
>>> 2.43.0
>>>
>
* Re: [PATCH v3] docs/zh_CN: add module-signing Chinese translation
From: Dongliang Mu @ 2026-04-18 6:20 UTC
To: Yan Zhu, seakeel, alexs, si.yanteng, corbet
Cc: skhan, linux-doc, linux-kernel
In-Reply-To: <tencent_99B2EE128E02C6CC1120DE135D4A2DA5B309@qq.com>
On 4/18/26 12:45 PM, Yan Zhu wrote:
> Translate .../admin-guide/module-signing.rst into Chinese.
>
> Update the translation through commit 0ad9a71933e7
> ("modsign: Enable ML-DSA module signing")
>
> Signed-off-by: Yan Zhu <zhuyan2015@qq.com>
> ---
Hi Yan,
Please remember to add a changelog below the "---" line, for example:
v1->v2: XXX
Also, I can't find the v2 patch in my mailbox. Did I miss something?
Dongliang Mu
> .../zh_CN/admin-guide/module-signing.rst | 249 ++++++++++++++++++
> 1 file changed, 249 insertions(+)
> create mode 100644 Documentation/translations/zh_CN/admin-guide/module-signing.rst
>
> diff --git a/Documentation/translations/zh_CN/admin-guide/module-signing.rst b/Documentation/translations/zh_CN/admin-guide/module-signing.rst
> new file mode 100644
> index 000000000000..04b0f1cbafd5
> --- /dev/null
> +++ b/Documentation/translations/zh_CN/admin-guide/module-signing.rst
> @@ -0,0 +1,249 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. include:: ../disclaimer-zh_CN.rst
> +
> +:Original: Documentation/admin-guide/module-signing.rst
> +:翻译:
> + 朱岩 Yan Zhu <zhuyan2015@qq.com>
> +
> +
> +==========================
> +内核模块签名机制
> +==========================
> +
> +.. 目录
> +..
> +.. - 概述
> +.. - 配置模块签名
> +.. - 生成签名密钥
> +.. - 内核中的公钥
> +.. - 模块手动签名
> +.. - 已签名模块和剥离
> +.. - 加载已签名模块
> +.. - 无效签名和未签名模块
> +.. - 管理/保护私钥
> +
> +
> +概述
> +====
> +
> +内核模块签名机制在安装过程中对模块进行加密签名,然后在加载模块时检查签名。这
> +通过禁止加载未签名的模块或使用无效密钥签名的模块来提高内核安全性。模块签名通
> +过使恶意模块更难加载到内核中来增加安全性。模块签名检查在内核中完成,因此不需
> +要受信任的用户空间位。
> +
> +此机制使用 X.509 ITU-T 标准证书对涉及的公钥进行编码。签名本身不以任何工业标准
> +类型编码。内置机制目前仅支持 RSA、NIST P-384 ECDSA 和 NIST FIPS-204 ML-DSA
> +公钥签名标准(尽管它是可插拔的并允许使用其他标准)。对于 RSA 和 ECDSA,可以使
> +用的可能的哈希算法是大小为 256、384 和 512 的 SHA-2 和 SHA-3(算法由签名中的
> +数据选择);ML-DSA会自行进行哈希运算,但允许与SHA512哈希算法结合用于签名属性。
> +
> +配置模块签名
> +============
> +
> +通过进入内核配置的 :menuselection:`Enable Loadable Module Support` 菜单并打
> +开以下选项来启用模块签名机制::
> +
> + CONFIG_MODULE_SIG "Module signature verification"
> +
> +这有多个可用选项:
> +
> + (1) :menuselection:`Require modules to be validly signed`
> + (``CONFIG_MODULE_SIG_FORCE``)
> +
> + 这指定了内核应如何处理其密钥未知或未签名的模块。
> +
> + 如果关闭(即"宽松模式"),则允许使用不可用密钥和未签名的模块,但内核将被
> + 标记为受污染,并且相关模块将被标记为受污染,显示字符'E'。
> +
> + 如果打开(即"限制模式"),只有具有有效签名且可由内核拥有的公钥验证的模块
> + 才会被加载。所有其他模块将生成错误。
> +
> + 无论此处的设置如何,如果模块的签名块无法解析,它将被直接拒绝。
> +
> +
> + (2) :menuselection:`Automatically sign all modules`
> + (``CONFIG_MODULE_SIG_ALL``)
> +
> + 如果打开此选项,则在构建的 modules_install 阶段期间将自动签名模块。
> + 如果关闭,则必须使用以下命令手动签名模块::
> +
> + scripts/sign-file
> +
> +
> + (3) :menuselection:`Which hash algorithm should modules be signed with?`
> +
> + 这提供了安装阶段将用于签名模块的哈希算法选择:
> +
> + =============================== ==========================================
> + ``CONFIG_MODULE_SIG_SHA256`` :menuselection:`Sign modules with SHA-256`
> + ``CONFIG_MODULE_SIG_SHA384`` :menuselection:`Sign modules with SHA-384`
> + ``CONFIG_MODULE_SIG_SHA512`` :menuselection:`Sign modules with SHA-512`
> + ``CONFIG_MODULE_SIG_SHA3_256`` :menuselection:`Sign modules with SHA3-256`
> + ``CONFIG_MODULE_SIG_SHA3_384`` :menuselection:`Sign modules with SHA3-384`
> + ``CONFIG_MODULE_SIG_SHA3_512`` :menuselection:`Sign modules with SHA3-512`
> + =============================== ==========================================
> +
> + 此处选择的算法也将被构建到内核中(而不是作为模块),以便使用该算法签名的
> + 模块可以在不导致循环依赖的情况下检查其签名。
> +
> +
> + (4) :menuselection:`File name or PKCS#11 URI of module signing key`
> + (``CONFIG_MODULE_SIG_KEY``)
> +
> + 将此选项设置为除默认值 ``certs/signing_key.pem`` 之外的其他值将禁用签名
> + 密钥的自动生成,并允许使用您选择的密钥对内核模块进行签名。提供的字符串应
> + 标识包含私钥及其对应的 PEM 格式 X.509 证书的文件,或者在 OpenSSL
> + ENGINE_pkcs11 功能正常的系统上,使用 RFC7512 定义的 PKCS#11 URI。在后一
> + 种情况下,PKCS#11 URI 应引用证书和私钥。
> +
> + 如果包含私钥的 PEM 文件已加密,或者 PKCS#11 令牌需要 PIN,可以通过
> + ``KBUILD_SIGN_PIN`` 变量在构建时提供。
> +
> +
> + (5) :menuselection:`Additional X.509 keys for default system keyring`
> + (``CONFIG_SYSTEM_TRUSTED_KEYS``)
> +
> + 此选项可设置为包含附加证书的 PEM 编码文件的文件名,这些证书将默认包含在
> + 系统密钥环中。
> +
> +请注意,启用模块签名会为内核构建过程添加对执行签名工具的OpenSSL开发包的依赖。
> +
> +
> +生成签名密钥
> +============
> +
> +生成和检查签名需要加密密钥对。私钥用于生成签名,相应的公钥用于检查签名。私钥
> +仅在构建期间需要,之后可以删除或安全存储。公钥被构建到内核中,以便在加载模块
> +时可以使用它来检查签名。
> +
> +在正常情况下,当 ``CONFIG_MODULE_SIG_KEY`` 保持默认值时,如果文件中不存在密
> +钥对,内核构建将使用 openssl 自动生成新的密钥对::
> +
> + certs/signing_key.pem
> +
> +在构建 vmlinux 期间(公钥需要构建到 vmlinux 中)使用参数::
> +
> + certs/x509.genkey
> +
> +文件(如果尚不存在也会生成)。
> +
> +可以在 RSA(``MODULE_SIG_KEY_TYPE_RSA``)、
> +ECDSA(``MODULE_SIG_KEY_TYPE_ECDSA``)和
> +ML-DSA(``MODULE_SIG_KEY_TYPE_MLDSA_*``)之间选择生成 RSA 4k、NIST P-384
> +密钥对或 ML-DSA 44、65 或 87 密钥对。
> +
> +强烈建议您提供自己的 x509.genkey 文件。
> +
> +最值得注意的是,在 x509.genkey 文件中,req_distinguished_name 部分应从默认值
> +更改::
> +
> + [ req_distinguished_name ]
> + #O = Unspecified company
> + CN = Build time autogenerated kernel key
> + #emailAddress = unspecified.user@unspecified.company
> +
> +生成的 RSA 密钥大小也可以通过以下方式设置::
> +
> + [ req ]
> + default_bits = 4096
> +
> +也可以使用位于 Linux 内核源代码树根节点中的 x509.genkey 密钥生成配置文件和
> +openssl 命令手动生成公钥/私钥文件。以下是生成公钥/私钥文件的示例::
> +
> + openssl req -new -nodes -utf8 -sha256 -days 36500 -batch -x509 \
> + -config x509.genkey -outform PEM -out kernel_key.pem \
> + -keyout kernel_key.pem
> +
> +然后可以将生成的 kernel_key.pem 文件的完整路径名指定在
> +``CONFIG_MODULE_SIG_KEY``选项中,并且将使用其中的证书和密钥而不是自动生成的
> +密钥对。
> +
> +
> +内核中的公钥
> +============
> +
> +内核包含一个可由 root 查看的公钥环。它们在名为 ".builtin_trusted_keys" 的密
> +钥环中,可以通过以下方式查看::
> +
> + [root@deneb ~]# cat /proc/keys
> + ...
> + 223c7853 I------ 1 perm 1f030000 0 0 keyring .builtin_trusted_keys: 1
> + 302d2d52 I------ 1 perm 1f010000 0 0 asymmetri Fedora kernel signing key: d69a84e6bce3d216b979e9505b3e3ef9a7118079: X509.RSA a7118079 []
> +
> +除了专门为模块签名生成的公钥外,还可以在 ``CONFIG_SYSTEM_TRUSTED_KEYS`` 配置
> +选项引用的 PEM 编码文件中提供其他受信任的证书。
> +
> +此外,架构代码可以从硬件存储中获取公钥并将其添加(例如从 UEFI 密钥数据库)。
> +
> +最后,可以通过以下方式添加其他公钥::
> +
> + keyctl padd asymmetric "" [.builtin_trusted_keys-ID] <[key-file]
> +
> +例如::
> +
> + keyctl padd asymmetric "" 0x223c7853 <my_public_key.x509
> +
> +但是,请注意,内核只允许将由已驻留在 ``.builtin_trusted_keys`` 中的密钥有效
> +签名的密钥添加到 ``.builtin_trusted_keys``。
> +
> +模块手动签名
> +============
> +
> +要手动对模块进行签名,请使用 Linux 内核源代码树中可用的 scripts/sign-file 工
> +具。该脚本需要 4 个参数:
> +
> + 1. 哈希算法(例如,sha256)
> + 2. 私钥文件名或 PKCS#11 URI
> + 3. 公钥文件名
> + 4. 要签名的内核模块
> +
> +以下是签名内核模块的示例::
> +
> + scripts/sign-file sha512 kernel-signkey.priv \
> + kernel-signkey.x509 module.ko
> +
> +使用的哈希算法不必与配置的算法匹配,但如果不同,应确保哈希算法要么内置在内核
> +中,要么可以在不需要自身的情况下加载。
> +
> +如果私钥需要密码或 PIN,可以在 $KBUILD_SIGN_PIN 环境变量中提供。
> +
> +
> +已签名模块和剥离
> +================
> +
> +已签名模块在末尾简单地附加了数字签名。模块文件末尾的字符串
> +``~Module signature appended~.`` 确认签名存在,但不能确认签名有效!
> +
> +已签名模块是脆弱的,因为签名在定义的ELF容器之外。因此,一旦计算并附加签名,就
> +不得剥离它们。请注意,整个模块都是签名的有效载荷,包括签名时存在的任何和所有
> +调试信息。
> +
> +
> +加载已签名模块
> +==============
> +
> +模块通过 insmod、modprobe、``init_module()`` 或 ``finit_module()`` 加载,
> +与未签名模块完全一样,因为在用户空间中不进行任何处理。
> +所有签名检查都在内核内完成。
> +
> +
> +无效签名和未签名模块
> +====================
> +
> +如果启用了 ``CONFIG_MODULE_SIG_FORCE`` 或在内核启动命令提供了
> +module.sig_enforce=1,内核将仅加载具有有效签名且具有公钥的模块。否则,它还将
> +加载未签名的模块。任何具有不匹配签名的模块将不被允许加载。
> +
> +任何具有不可解析签名的模块将被拒绝。
> +
> +
> +管理/保护私钥
> +==============
> +
> +由于私钥用于签名模块,病毒和恶意软件可以使用私钥签名模块并危害操作系统。私钥
> +必须被销毁或移动到安全位置,而不是保存在内核源代码树的根节点中。
> +
> +如果使用相同的私钥为多个内核配置签名模块,必须确保模块版本信息足以防止将模块
> +加载到不同的内核中。要么设置 ``CONFIG_MODVERSIONS=y``,要么通过更改
> +``EXTRAVERSION`` 或 ``CONFIG_LOCALVERSION`` 确保每个配置具有不同的内核发布字
> +符串。
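The load rules quoted above (the section on invalid and unsigned modules, together with the permissive/restrictive behaviour of ``CONFIG_MODULE_SIG_FORCE`` described in the same document) reduce to a small decision table. The following is a minimal Python sketch that models only those documented rules. ``SigState`` and ``load_decision`` are hypothetical names. This is an illustration, not the kernel's implementation.

```python
from enum import Enum


class SigState(Enum):
    """Hypothetical classification of a module's signature (illustration only)."""
    VALID = "valid"              # parses and verifies against a known public key
    UNSIGNED = "unsigned"        # no signature appended at all
    UNKNOWN_KEY = "unknown_key"  # parses, but no matching public key is available
    MISMATCH = "mismatch"        # parses, key known, but verification fails
    UNPARSABLE = "unparsable"    # signature block cannot be parsed


def load_decision(state: SigState, enforce: bool) -> str:
    """Return 'load', 'load_tainted' or 'reject' per the documented rules.

    enforce models CONFIG_MODULE_SIG_FORCE=y or module.sig_enforce=1.
    """
    if state is SigState.VALID:
        return "load"
    if state in (SigState.MISMATCH, SigState.UNPARSABLE):
        # Rejected regardless of the enforcement setting.
        return "reject"
    # UNSIGNED or UNKNOWN_KEY: allowed only in permissive mode,
    # where the kernel is tainted (flag 'E').
    return "reject" if enforce else "load_tainted"
```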
* Re: [PATCH] docs: Add overview and SLUB allocator sections to slab documentation
From: Harry Yoo (Oracle) @ 2026-04-18 6:19 UTC
To: Nick Huang
Cc: Vlastimil Babka, Andrew Morton, David Hildenbrand,
Jonathan Corbet, Hao Li, Christoph Lameter, David Rientjes,
Roman Gushchin, Lorenzo Stoakes, Liam R . Howlett, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-mm, linux-doc,
linux-kernel
In-Reply-To: <20260418000635.17499-1-sef1548@gmail.com>
Hi, Nick.
On Sat, Apr 18, 2026 at 12:06:19AM +0000, Nick Huang wrote:
> - Add "Overview" section explaining the slab allocator's role and purpose
> - Document the three main slab allocator implementations (SLAB, SLUB, SLOB)
> - Highlight SLUB as the default allocator on modern systems
> - Add "SLUB Allocator" subsection with detailed information:
> - Explain SLUB's design goals and advantages over legacy SLAB
> - Document its focus on simplification and performance
> - Note support for both uniprocessor and SMP systems
>
> Signed-off-by: Nick Huang <sef1548@gmail.com>
> ---
In case this was assisted by AI or other tools, please disclose that
according to the process document:
https://docs.kernel.org/process/generated-content.html
https://docs.kernel.org/process/coding-assistants.html
I'm mentioning this because some people using tools to develop or
document the kernel are not aware that they need to disclose it, so
it's better to remind them.
Tools can help only if you, as a person, understand the design and
tradeoffs behind it. I don't think it would be useful to spend time on
improving the document if this is driven by AI tools rather than
by you.
Whether or not you're using AI, the fact that you're not following recent
changes to the slab allocator worries me a little, to be honest.
Well-written design documentation is hard to come by without
understanding the current design.
If you're still willing to improve the doc, please do more research.
> Documentation/mm/slab.rst | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
>
> diff --git a/Documentation/mm/slab.rst b/Documentation/mm/slab.rst
> index 2bcc58ada302..2d1d093afb7b 100644
> --- a/Documentation/mm/slab.rst
> +++ b/Documentation/mm/slab.rst
> @@ -4,6 +4,32 @@
> +SLUB Allocator
> +==============
> +
> +Overview
> +--------
> +
> +SLUB is a slab allocator designed to replace the legacy SLAB allocator
As Matthew mentioned, SLAB and SLOB have been removed.
Also, what we have now isn't really the traditional SLUB anymore.
> +(mm/slab.c). It addresses the complexity, scalability limitations, and
> +memory overhead of the SLAB implementation.
Stating "Compared to SLAB, SLUB addresses X, Y, and Z" is
worth adding only with a reason explaining why.
If you think SLUB addresses certain problems of SLAB, please explain
what the problems are, why they exist, and how SLUB addresses them.
Please make more solid arguments, with explanations to support them.
> +The primary goal of SLUB is to simplify slab allocation while improving
I don't think traditional SLUB was particularly simple compared
to SLAB, to be honest. It was probably simple at the beginning,
but more and more complexity was added later.
> +performance on both uniprocessor (UP) and symmetric multiprocessing (SMP)
> +systems.
SLAB also supported UP and had features to provide scalability on
SMP. Why do you think SLUB particularly improved performance on
them?
> +
> Functions and structures
> ========================
--
Cheers,
Harry / Hyeonggon
* Re: [PATCH] docs: Add overview and SLUB allocator sections to slab documentation
From: Nick Huang @ 2026-04-18 6:12 UTC
To: Matthew Wilcox
Cc: Vlastimil Babka, Harry Yoo, Andrew Morton, David Hildenbrand,
Jonathan Corbet, Hao Li, Christoph Lameter, David Rientjes,
Roman Gushchin, Lorenzo Stoakes, Liam R . Howlett, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-mm, linux-doc,
linux-kernel
In-Reply-To: <CABZAGREDHwsXMm65_WxEn=v-zTv7_eVqQzQeCRqU2Gyc0aTETQ@mail.gmail.com>
Nick Huang <sef1548@gmail.com> wrote on Sat, Apr 18, 2026 at 1:27 PM:
>
> Matthew Wilcox <willy@infradead.org> wrote on Sat, Apr 18, 2026 at 1:04 PM:
> >
> > On Sat, Apr 18, 2026 at 12:06:19AM +0000, Nick Huang wrote:
> > > - Add "Overview" section explaining the slab allocator's role and purpose
> > > - Document the three main slab allocator implementations (SLAB, SLUB, SLOB)
Hi Matthew Wilcox
I will remove this sentence in the next version:
“Document the three main slab allocator implementations (SLAB, SLUB, SLOB).”
I’m not entirely sure I fully understand your point. If I’ve missed
anything, please let me know what needs to be changed. Thank you.
> > Umm.
> >
> > commit 6630e950d532
> > Author: Vlastimil Babka <vbabka@kernel.org>
> > Date: Tue Feb 28 15:38:07 2023 +0100
> >
> > mm/slob: remove slob.c
> >
> > commit 16a1d968358a
> > Author: Vlastimil Babka <vbabka@kernel.org>
> > Date: Mon Oct 2 20:43:43 2023 +0200
> >
> > mm/slab: remove mm/slab.c and slab_def.h
> >
> > Care to revise?
> >
> Hi Matthew Wilcox
>
> Thanks for pointing this out. You are absolutely right—I overlooked
> the fact that SLAB and SLOB have been removed from the kernel.
> I will remove those sections and ensure the documentation focuses on
> SLUB for the v2 submission. Thanks for the correction.
>
> > > - Highlight SLUB as the default allocator on modern systems
> > > - Add "SLUB Allocator" subsection with detailed information:
> > > - Explain SLUB's design goals and advantages over legacy SLAB
> > > - Document its focus on simplification and performance
> > > - Note support for both uniprocessor and SMP systems
> > >
> > > Signed-off-by: Nick Huang <sef1548@gmail.com>
> > > ---
> > > Documentation/mm/slab.rst | 26 ++++++++++++++++++++++++++
> > > 1 file changed, 26 insertions(+)
> > >
> > > diff --git a/Documentation/mm/slab.rst b/Documentation/mm/slab.rst
> > > index 2bcc58ada302..2d1d093afb7b 100644
> > > --- a/Documentation/mm/slab.rst
> > > +++ b/Documentation/mm/slab.rst
> > > @@ -4,6 +4,32 @@
> > > Slab Allocation
> > > ===============
> > >
> > > +Overview
> > > +========
> > > +
> > > +The slab allocator is responsible for efficient allocation and reuse of
> > > +small kernel objects. It reduces internal fragmentation and improves
> > > +performance by caching frequently used objects.
> > > +
> > > +The Linux kernel provides multiple slab allocator implementations,
> > > +including SLAB, SLUB, and SLOB. Among these, SLUB is the default
> > > +allocator on most modern systems.
> > > +
> > > +SLUB Allocator
> > > +==============
> > > +
> > > +Overview
> > > +--------
> > > +
> > > +SLUB is a slab allocator designed to replace the legacy SLAB allocator
> > > +(mm/slab.c). It addresses the complexity, scalability limitations, and
> > > +memory overhead of the SLAB implementation.
> > > +
> > > +The primary goal of SLUB is to simplify slab allocation while improving
> > > +performance on both uniprocessor (UP) and symmetric multiprocessing (SMP)
> > > +systems.
> > > +
> > > +
> > > Functions and structures
> > > ========================
> > >
> > > --
> > > 2.43.0
> > >
> > >
>
> --
> Regards,
> Nick Huang
--
Regards,
Nick Huang
* Re: [PATCH] docs/zh_CN: add module-signing Chinese translation
From: Yan Zhu @ 2026-04-18 5:34 UTC
To: Alex Shi, alexs, si.yanteng, corbet; +Cc: dzm91, skhan, linux-doc, linux-kernel
In-Reply-To: <bb119ab9-a805-46c0-91ab-1f45f7a506c8@gmail.com>
On 4/10/2026 7:21 PM, Alex Shi wrote:
>
>
> On 2026/4/1 23:40, Yan Zhu wrote:
>> Translate .../admin-guide/module-signing.rst into Chinese.
>>
>> Update the translation through commit 0ad9a71933e7
>> ("modsign: Enable ML-DSA module signing")
>>
>> Signed-off-by: Yan Zhu<zhuyan2015@qq.com>
>> ---
>> .../zh_CN/admin-guide/module-signing.rst | 242 ++++++++++++++++++
>> 1 file changed, 242 insertions(+)
>> create mode 100644 Documentation/translations/zh_CN/admin-guide/module-signing.rst
>>
>> diff --git a/Documentation/translations/zh_CN/admin-guide/module-signing.rst b/Documentation/translations/zh_CN/admin-guide/module-signing.rst
>> new file mode 100644
>> index 000000000000..b8c209dd229d
>> --- /dev/null
>> +++ b/Documentation/translations/zh_CN/admin-guide/module-signing.rst
>> @@ -0,0 +1,242 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +.. include:: ../disclaimer-zh_CN.rst
>> +
>> +:Original: Documentation/admin-guide/module-signing.rst
>> +:翻译:
>> + 朱岩 Yan Zhu<zhuyan2015@qq.com>
>> +
>> +
>> +==========================
>> +内核模块签名机制
>> +==========================
>> +
>> +.. 目录
>> +..
>> +.. - 概述
>> +.. - 配置模块签名
>> +.. - 生成签名密钥
>> +.. - 内核中的公钥
>> +.. - 模块手动签名
>> +.. - 已签名模块和剥离
>> +.. - 加载已签名模块
>> +.. - 无效签名和未签名模块
>> +.. - 管理/保护私钥
>> +
>> +
>> +概述
>> +====
>> +
>> +内核模块签名机制在安装过程中对模块进行加密签名,然后在加载模块时检查签名。
>> +这通过禁止加载未签名的模块或使用无效密钥签名的模块来提高内核安全性。
>> +模块签名通过使恶意模块更难加载到内核中来增加安全性。
>> +模块签名检查在内核中完成,因此不需要受信任的用户空间位。
>> +
>> +此机制使用 X.509 ITU-T 标准证书对涉及的公钥进行编码。
>> +签名本身不以任何工业标准类型编码。
>> +内置机制目前仅支持 RSA、NIST P-384 ECDSA 和 NIST FIPS-204 ML-DSA 公钥签名标准(尽管它是可插拔的并允许使用其他标准)。
>
> This line is too long and not aligned with the other lines. You need to
> follow the coding style for documents.
>
>
>> +对于 RSA 和 ECDSA,可以使用的可能的哈希算法是大小为 256、384 和 512 的 SHA-2 和 SHA-3(算法由签名中的数据选择);
>> +ML-DSA会自行进行哈希运算,但允许与SHA512哈希算法结合用于签名属性。
>> +
>> +配置模块签名
>> +============
>> +
>> +通过进入内核配置的 :menuselection:`Enable Loadable Module Support` 菜单并打开以下选项来启用模块签名机制::
>> +
>> + CONFIG_MODULE_SIG "Module signature verification"
>> +
>> +这有多个可用选项:
>> +
>> + (1) :menuselection:`Require modules to be validly signed`
>> + (``CONFIG_MODULE_SIG_FORCE``)
>> +
>> + 这指定了内核应如何处理其密钥未知或未签名的模块。
>> +
>> + 如果关闭(即"宽松模式"),则允许使用不可用密钥和未签名的模块,
>> + 但内核将被标记为受污染,并且相关模块将被标记为受污染,显示字符'E'。
>> +
>> + 如果打开(即"限制模式"),只有具有有效签名且可由内核拥有的公钥验证的模块才会被加载。
>> + 所有其他模块将生成错误。
>> +
>> + 无论此处的设置如何,如果模块的签名块无法解析,它将被直接拒绝。
>> +
>> +
>> + (2) :menuselection:`Automatically sign all modules`
>> + (``CONFIG_MODULE_SIG_ALL``)
>> +
>> + 如果打开此选项,则在构建的 modules_install 阶段期间将自动签名模块。
>> + 如果关闭,则必须使用以下命令手动签名模块::
>> +
>> + scripts/sign-file
>> +
>> +
>> + (3) :menuselection:`Which hash algorithm should modules be signed with?`
>> +
>> + 这提供了安装阶段将用于签名模块的哈希算法选择:
>> +
>> + =============================== ==========================================
>> + ``CONFIG_MODULE_SIG_SHA256`` :menuselection:`Sign modules with SHA-256`
>> + ``CONFIG_MODULE_SIG_SHA384`` :menuselection:`Sign modules with SHA-384`
>> + ``CONFIG_MODULE_SIG_SHA512`` :menuselection:`Sign modules with SHA-512`
>> + ``CONFIG_MODULE_SIG_SHA3_256`` :menuselection:`Sign modules with SHA3-256`
>> + ``CONFIG_MODULE_SIG_SHA3_384`` :menuselection:`Sign modules with SHA3-384`
>> + ``CONFIG_MODULE_SIG_SHA3_512`` :menuselection:`Sign modules with SHA3-512`
>> + =============================== ==========================================
>
> Got errors here:
> Applying: docs/zh_CN: add module-signing Chinese translation
> /home/alexshi/linuxdoc/.git/rebase-apply/patch:87: indent with spaces.
> =============================== ==========================================
> /home/alexshi/linuxdoc/.git/rebase-apply/patch:94: indent with spaces.
> =============================== ==========================================
> warning: 2 lines add whitespace errors.
>
> Thanks
> Alex
Thanks for your review. I have fixed those issues in the v3 patch.
>> +
>> + 此处选择的算法也将被构建到内核中(而不是作为模块),
>> + 以便使用该算法签名的模块可以在不导致循环依赖的情况下检查其签名。
>> +
>> +
>> + (4) :menuselection:`File name or PKCS#11 URI of module signing key`
>> + (``CONFIG_MODULE_SIG_KEY``)
>> +
>> + 将此选项设置为除默认值 ``certs/signing_key.pem`` 之外的其他值将禁用签名密钥的自动生成,
>> + 并允许使用您选择的密钥对内核模块进行签名。
>> + 提供的字符串应标识包含私钥及其对应的 PEM 格式 X.509 证书的文件,
>> + 或者在 OpenSSL ENGINE_pkcs11 功能正常的系统上,使用 RFC7512 定义的 PKCS#11 URI。
>> + 在后一种情况下,PKCS#11 URI 应引用证书和私钥。
>> +
>> + 如果包含私钥的 PEM 文件已加密,或者 PKCS#11 令牌需要 PIN,
>> + 可以通过 ``KBUILD_SIGN_PIN`` 变量在构建时提供。
>> +
>> +
>> + (5) :menuselection:`Additional X.509 keys for default system keyring`
>> + (``CONFIG_SYSTEM_TRUSTED_KEYS``)
>> +
>> + 此选项可设置为包含附加证书的 PEM 编码文件的文件名,
>> + 这些证书将默认包含在系统密钥环中。
>> +
>> +请注意,启用模块签名会为内核构建过程添加对执行签名工具的 OpenSSL 开发包的依赖。
>> +
>> +
>> +生成签名密钥
>> +============
>> +
>> +生成和检查签名需要加密密钥对。私钥用于生成签名,相应的公钥用于检查签名。
>> +私钥仅在构建期间需要,之后可以删除或安全存储。
>> +公钥被构建到内核中,以便在加载模块时可以使用它来检查签名。
>> +
>> +在正常情况下,当 ``CONFIG_MODULE_SIG_KEY`` 保持默认值时,
>> +如果文件中不存在密钥对,内核构建将使用 openssl 自动生成新的密钥对::
>> +
>> + certs/signing_key.pem
>> +
>> +在构建 vmlinux 期间(公钥需要构建到 vmlinux 中)使用参数::
>> +
>> + certs/x509.genkey
>> +
>> +文件(如果尚不存在也会生成)。
>> +
>> +可以在 RSA(``MODULE_SIG_KEY_TYPE_RSA``)、ECDSA(``MODULE_SIG_KEY_TYPE_ECDSA``)
>> +和 ML-DSA(``MODULE_SIG_KEY_TYPE_MLDSA_*``)之间选择生成 RSA 4k、NIST P-384 密钥对或 ML-DSA 44、65 或 87 密钥对。
>> +
>> +强烈建议您提供自己的 x509.genkey 文件。
>> +
>> +最值得注意的是,在 x509.genkey 文件中,req_distinguished_name 部分应从默认值更改::
>> +
>> + [ req_distinguished_name ]
>> + #O = Unspecified company
>> + CN = Build time autogenerated kernel key
>> + #emailAddress = unspecified.user@unspecified.company
>> +
>> +生成的 RSA 密钥大小也可以通过以下方式设置::
>> +
>> + [ req ]
>> + default_bits = 4096
>> +
>> +也可以使用位于 Linux 内核源代码树根节点中的 x509.genkey 密钥生成配置文件和 openssl 命令手动生成公钥/私钥文件。
>> +以下是生成公钥/私钥文件的示例::
>> +
>> + openssl req -new -nodes -utf8 -sha256 -days 36500 -batch -x509 \
>> + -config x509.genkey -outform PEM -out kernel_key.pem \
>> + -keyout kernel_key.pem
>> +
>> +然后可以将生成的 kernel_key.pem 文件的完整路径名指定在 ``CONFIG_MODULE_SIG_KEY`` 选项中,
>> +并且将使用其中的证书和密钥而不是自动生成的密钥对。
>> +
>> +
>> +内核中的公钥
>> +============
>> +
>> +内核包含一个可由 root 查看的公钥环。它们在名为 ".builtin_trusted_keys" 的密钥环中,
>> +可以通过以下方式查看::
>> +
>> + [root@deneb ~]# cat /proc/keys
>> + ...
>> + 223c7853 I------ 1 perm 1f030000 0 0 keyring .builtin_trusted_keys: 1
>> + 302d2d52 I------ 1 perm 1f010000 0 0 asymmetri Fedora kernel signing key: d69a84e6bce3d216b979e9505b3e3ef9a7118079: X509.RSA a7118079 []
>> +
>> +除了专门为模块签名生成的公钥外,还可以在 ``CONFIG_SYSTEM_TRUSTED_KEYS`` 配置选项引用的 PEM 编码文件中提供其他受信任的证书。
>> +
>> +此外,架构代码可以从硬件存储中获取公钥并将其添加(例如从 UEFI 密钥数据库)。
>> +
>> +最后,可以通过以下方式添加其他公钥::
>> +
>> + keyctl padd asymmetric "" [.builtin_trusted_keys-ID] <[key-file]
>> +
>> +例如::
>> +
>> + keyctl padd asymmetric "" 0x223c7853 <my_public_key.x509
>> +
>> +但是,请注意,内核只允许将由已驻留在 ``.builtin_trusted_keys`` 中的密钥有效签名的密钥添加到 ``.builtin_trusted_keys``。
>> +
>> +模块手动签名
>> +============
>> +
>> +要手动对模块进行签名,请使用 Linux 内核源代码树中可用的 scripts/sign-file 工具。
>> +该脚本需要 4 个参数:
>> +
>> + 1. 哈希算法(例如,sha256)
>> + 2. 私钥文件名或 PKCS#11 URI
>> + 3. 公钥文件名
>> + 4. 要签名的内核模块
>> +
>> +以下是签名内核模块的示例::
>> +
>> + scripts/sign-file sha512 kernel-signkey.priv \
>> + kernel-signkey.x509 module.ko
>> +
>> +使用的哈希算法不必与配置的算法匹配,但如果不同,
>> +应确保哈希算法要么内置在内核中,要么可以在不需要自身的情况下加载。
>> +
>> +如果私钥需要密码或 PIN,可以在 $KBUILD_SIGN_PIN 环境变量中提供。
>> +
>> +
>> +已签名模块和剥离
>> +================
>> +
>> +已签名模块在末尾简单地附加了数字签名。模块文件末尾的字符串
>> +``~Module signature appended~.`` 确认签名存在,但不能确认签名有效!
>> +
>> +已签名模块是脆弱的,因为签名在定义的ELF容器之外。
>> +因此,一旦计算并附加签名,就不得剥离它们。
>> +请注意,整个模块都是签名的有效载荷,包括签名时存在的任何和所有调试信息。
>> +
>> +
>> +加载已签名模块
>> +==============
>> +
>> +模块通过 insmod、modprobe、``init_module()`` 或 ``finit_module()`` 加载,
>> +与未签名模块完全一样,因为在用户空间中不进行任何处理。
>> +所有签名检查都在内核内完成。
>> +
>> +
>> +无效签名和未签名模块
>> +====================
>> +
>> +如果启用了 ``CONFIG_MODULE_SIG_FORCE`` 或在内核启动命令提供了 module.sig_enforce=1,
>> +内核将仅加载具有有效签名且具有公钥的模块。
>> +否则,它还将加载未签名的模块。
>> +任何具有不匹配签名的模块将不被允许加载。
>> +
>> +任何具有不可解析签名的模块将被拒绝。
>> +
>> +
>> +管理/保护私钥
>> +==============
>> +
>> +由于私钥用于签名模块,病毒和恶意软件可以使用私钥签名模块并危害操作系统。
>> +私钥必须被销毁或移动到安全位置,而不是保存在内核源代码树的根节点中。
>> +
>> +如果使用相同的私钥为多个内核配置签名模块,
>> +必须确保模块版本信息足以防止将模块加载到不同的内核中。
>> +要么设置 ``CONFIG_MODVERSIONS=y``,要么通过更改 ``EXTRAVERSION`` 或 ``CONFIG_LOCALVERSION`` 确保每个配置具有不同的内核发布字符串。
>> --
>> 2.43.0
>>
--
Yan Zhu
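For RSA and ECDSA signatures, the hash option in the quoted table selects the digest algorithm applied to the module contents. ML-DSA does its own hashing, so this does not apply to it. Below is a minimal Python sketch of that mapping using ``hashlib``. The ``SIG_HASH_BY_CONFIG`` table and the ``module_digest()`` helper are hypothetical. They only compute a digest and do not produce a signature.

```python
import hashlib

# Hypothetical mapping from the Kconfig hash choices to hashlib names.
SIG_HASH_BY_CONFIG = {
    "CONFIG_MODULE_SIG_SHA256": "sha256",
    "CONFIG_MODULE_SIG_SHA384": "sha384",
    "CONFIG_MODULE_SIG_SHA512": "sha512",
    "CONFIG_MODULE_SIG_SHA3_256": "sha3_256",
    "CONFIG_MODULE_SIG_SHA3_384": "sha3_384",
    "CONFIG_MODULE_SIG_SHA3_512": "sha3_512",
}


def module_digest(module_bytes: bytes, config_option: str) -> str:
    """Hex digest of the module contents using the configured algorithm."""
    try:
        algo = SIG_HASH_BY_CONFIG[config_option]
    except KeyError:
        raise ValueError(f"unknown hash option: {config_option}") from None
    return hashlib.new(algo, module_bytes).hexdigest()
```

Because each digest length is fixed by the algorithm, a signature made with one option cannot be checked with another. This is one reason the selected algorithm must be available in the kernel when the module is loaded.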
* Re: [PATCH] docs: Add overview and SLUB allocator sections to slab documentation
From: Nick Huang @ 2026-04-18 5:27 UTC
To: Matthew Wilcox
Cc: Vlastimil Babka, Harry Yoo, Andrew Morton, David Hildenbrand,
Jonathan Corbet, Hao Li, Christoph Lameter, David Rientjes,
Roman Gushchin, Lorenzo Stoakes, Liam R . Howlett, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-mm, linux-doc,
linux-kernel
In-Reply-To: <aeMQ36jFnCKmCSyA@casper.infradead.org>
Matthew Wilcox <willy@infradead.org> wrote on Sat, Apr 18, 2026 at 1:04 PM:
>
> On Sat, Apr 18, 2026 at 12:06:19AM +0000, Nick Huang wrote:
> > - Add "Overview" section explaining the slab allocator's role and purpose
> > - Document the three main slab allocator implementations (SLAB, SLUB, SLOB)
>
> Umm.
>
> commit 6630e950d532
> Author: Vlastimil Babka <vbabka@kernel.org>
> Date: Tue Feb 28 15:38:07 2023 +0100
>
> mm/slob: remove slob.c
>
> commit 16a1d968358a
> Author: Vlastimil Babka <vbabka@kernel.org>
> Date: Mon Oct 2 20:43:43 2023 +0200
>
> mm/slab: remove mm/slab.c and slab_def.h
>
> Care to revise?
>
Hi Matthew Wilcox
Thanks for pointing this out. You are absolutely right—I overlooked
the fact that SLAB and SLOB have been removed from the kernel.
I will remove those sections and ensure the documentation focuses on
SLUB for the v2 submission. Thanks for the correction.
> > - Highlight SLUB as the default allocator on modern systems
> > - Add "SLUB Allocator" subsection with detailed information:
> > - Explain SLUB's design goals and advantages over legacy SLAB
> > - Document its focus on simplification and performance
> > - Note support for both uniprocessor and SMP systems
> >
> > Signed-off-by: Nick Huang <sef1548@gmail.com>
> > ---
> > Documentation/mm/slab.rst | 26 ++++++++++++++++++++++++++
> > 1 file changed, 26 insertions(+)
> >
> > diff --git a/Documentation/mm/slab.rst b/Documentation/mm/slab.rst
> > index 2bcc58ada302..2d1d093afb7b 100644
> > --- a/Documentation/mm/slab.rst
> > +++ b/Documentation/mm/slab.rst
> > @@ -4,6 +4,32 @@
> > Slab Allocation
> > ===============
> >
> > +Overview
> > +========
> > +
> > +The slab allocator is responsible for efficient allocation and reuse of
> > +small kernel objects. It reduces internal fragmentation and improves
> > +performance by caching frequently used objects.
> > +
> > +The Linux kernel provides multiple slab allocator implementations,
> > +including SLAB, SLUB, and SLOB. Among these, SLUB is the default
> > +allocator on most modern systems.
> > +
> > +SLUB Allocator
> > +==============
> > +
> > +Overview
> > +--------
> > +
> > +SLUB is a slab allocator designed to replace the legacy SLAB allocator
> > +(mm/slab.c). It addresses the complexity, scalability limitations, and
> > +memory overhead of the SLAB implementation.
> > +
> > +The primary goal of SLUB is to simplify slab allocation while improving
> > +performance on both uniprocessor (UP) and symmetric multiprocessing (SMP)
> > +systems.
> > +
> > +
> > Functions and structures
> > ========================
> >
> > --
> > 2.43.0
> >
> >
--
Regards,
Nick Huang
* Re: [PATCH] docs: Add overview and SLUB allocator sections to slab documentation
From: Matthew Wilcox @ 2026-04-18 5:04 UTC
To: Nick Huang
Cc: Vlastimil Babka, Harry Yoo, Andrew Morton, David Hildenbrand,
Jonathan Corbet, Hao Li, Christoph Lameter, David Rientjes,
Roman Gushchin, Lorenzo Stoakes, Liam R . Howlett, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-mm, linux-doc,
linux-kernel
In-Reply-To: <20260418000635.17499-1-sef1548@gmail.com>
On Sat, Apr 18, 2026 at 12:06:19AM +0000, Nick Huang wrote:
> - Add "Overview" section explaining the slab allocator's role and purpose
> - Document the three main slab allocator implementations (SLAB, SLUB, SLOB)
Umm.
commit 6630e950d532
Author: Vlastimil Babka <vbabka@kernel.org>
Date: Tue Feb 28 15:38:07 2023 +0100
mm/slob: remove slob.c
commit 16a1d968358a
Author: Vlastimil Babka <vbabka@kernel.org>
Date: Mon Oct 2 20:43:43 2023 +0200
mm/slab: remove mm/slab.c and slab_def.h
Care to revise?
> - Highlight SLUB as the default allocator on modern systems
> - Add "SLUB Allocator" subsection with detailed information:
> - Explain SLUB's design goals and advantages over legacy SLAB
> - Document its focus on simplification and performance
> - Note support for both uniprocessor and SMP systems
>
> Signed-off-by: Nick Huang <sef1548@gmail.com>
> ---
> Documentation/mm/slab.rst | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
>
> diff --git a/Documentation/mm/slab.rst b/Documentation/mm/slab.rst
> index 2bcc58ada302..2d1d093afb7b 100644
> --- a/Documentation/mm/slab.rst
> +++ b/Documentation/mm/slab.rst
> @@ -4,6 +4,32 @@
> Slab Allocation
> ===============
>
> +Overview
> +========
> +
> +The slab allocator is responsible for efficient allocation and reuse of
> +small kernel objects. It reduces internal fragmentation and improves
> +performance by caching frequently used objects.
> +
> +The Linux kernel provides multiple slab allocator implementations,
> +including SLAB, SLUB, and SLOB. Among these, SLUB is the default
> +allocator on most modern systems.
> +
> +SLUB Allocator
> +==============
> +
> +Overview
> +--------
> +
> +SLUB is a slab allocator designed to replace the legacy SLAB allocator
> +(mm/slab.c). It addresses the complexity, scalability limitations, and
> +memory overhead of the SLAB implementation.
> +
> +The primary goal of SLUB is to simplify slab allocation while improving
> +performance on both uniprocessor (UP) and symmetric multiprocessing (SMP)
> +systems.
> +
> +
> Functions and structures
> ========================
>
> --
> 2.43.0
>
>
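The patch's overview paragraph describes the slab allocator as caching and reusing small, same-sized objects. The toy Python object cache below illustrates only that general idea. Freed objects go onto a free list and are handed out again before any new batch is created. It is loosely analogous to ``kmem_cache_alloc()``/``kmem_cache_free()``, but ``ObjectCache`` is hypothetical and is not a model of SLUB's actual design.

```python
class ObjectCache:
    """Toy cache of same-kind objects, reusing freed ones first."""

    def __init__(self, factory, objs_per_slab=4):
        self.factory = factory              # creates one new object
        self.objs_per_slab = objs_per_slab  # objects created per refill
        self.free_list = []                 # LIFO list of ready-to-use objects
        self.slabs_allocated = 0

    def _grow(self):
        # Stand-in for obtaining a new page: pre-create a batch of objects.
        self.free_list.extend(self.factory() for _ in range(self.objs_per_slab))
        self.slabs_allocated += 1

    def alloc(self):
        if not self.free_list:
            self._grow()
        return self.free_list.pop()

    def free(self, obj):
        # The object is kept for reuse instead of being discarded.
        self.free_list.append(obj)
```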
* [PATCH v3] docs/zh_CN: add module-signing Chinese translation
From: Yan Zhu @ 2026-04-18 4:45 UTC
To: seakeel, alexs, si.yanteng, corbet
Cc: dzm91, skhan, linux-doc, linux-kernel, zhuyan2015
Translate .../admin-guide/module-signing.rst into Chinese.
Update the translation through commit 0ad9a71933e7
("modsign: Enable ML-DSA module signing")
Signed-off-by: Yan Zhu <zhuyan2015@qq.com>
---
.../zh_CN/admin-guide/module-signing.rst | 249 ++++++++++++++++++
1 file changed, 249 insertions(+)
create mode 100644 Documentation/translations/zh_CN/admin-guide/module-signing.rst
diff --git a/Documentation/translations/zh_CN/admin-guide/module-signing.rst b/Documentation/translations/zh_CN/admin-guide/module-signing.rst
new file mode 100644
index 000000000000..04b0f1cbafd5
--- /dev/null
+++ b/Documentation/translations/zh_CN/admin-guide/module-signing.rst
@@ -0,0 +1,249 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/admin-guide/module-signing.rst
+:翻译:
+ 朱岩 Yan Zhu <zhuyan2015@qq.com>
+
+
+==========================
+内核模块签名机制
+==========================
+
+.. 目录
+..
+.. - 概述
+.. - 配置模块签名
+.. - 生成签名密钥
+.. - 内核中的公钥
+.. - 模块手动签名
+.. - 已签名模块和剥离
+.. - 加载已签名模块
+.. - 无效签名和未签名模块
+.. - 管理/保护私钥
+
+
+概述
+====
+
+内核模块签名机制在安装过程中对模块进行加密签名,然后在加载模块时检查签名。这
+通过禁止加载未签名的模块或使用无效密钥签名的模块来提高内核安全性。模块签名通
+过使恶意模块更难加载到内核中来增加安全性。模块签名检查在内核中完成,因此不需
+要受信任的用户空间位。
+
+此机制使用 X.509 ITU-T 标准证书对涉及的公钥进行编码。签名本身不以任何工业标准
+类型编码。内置机制目前仅支持 RSA、NIST P-384 ECDSA 和 NIST FIPS-204 ML-DSA
+公钥签名标准(尽管它是可插拔的并允许使用其他标准)。对于 RSA 和 ECDSA,可以使
+用的可能的哈希算法是大小为 256、384 和 512 的 SHA-2 和 SHA-3(算法由签名中的
+数据选择);ML-DSA会自行进行哈希运算,但允许与SHA512哈希算法结合用于签名属性。
+
+配置模块签名
+============
+
+通过进入内核配置的 :menuselection:`Enable Loadable Module Support` 菜单并打
+开以下选项来启用模块签名机制::
+
+ CONFIG_MODULE_SIG "Module signature verification"
+
+这有多个可用选项:
+
+ (1) :menuselection:`Require modules to be validly signed`
+ (``CONFIG_MODULE_SIG_FORCE``)
+
+ 这指定了内核应如何处理其密钥未知或未签名的模块。
+
+ 如果关闭(即"宽松模式"),则允许使用不可用密钥和未签名的模块,但内核将被
+ 标记为受污染,并且相关模块将被标记为受污染,显示字符'E'。
+
+ 如果打开(即"限制模式"),只有具有有效签名且可由内核拥有的公钥验证的模块
+ 才会被加载。所有其他模块将生成错误。
+
+ 无论此处的设置如何,如果模块的签名块无法解析,它将被直接拒绝。
+
+
+ (2) :menuselection:`Automatically sign all modules`
+ (``CONFIG_MODULE_SIG_ALL``)
+
+ 如果打开此选项,则在构建的 modules_install 阶段期间将自动签名模块。
+ 如果关闭,则必须使用以下命令手动签名模块::
+
+ scripts/sign-file
+
+
+ (3) :menuselection:`Which hash algorithm should modules be signed with?`
+
+ 这提供了安装阶段将用于签名模块的哈希算法选择:
+
+ =============================== ==========================================
+ ``CONFIG_MODULE_SIG_SHA256`` :menuselection:`Sign modules with SHA-256`
+ ``CONFIG_MODULE_SIG_SHA384`` :menuselection:`Sign modules with SHA-384`
+ ``CONFIG_MODULE_SIG_SHA512`` :menuselection:`Sign modules with SHA-512`
+ ``CONFIG_MODULE_SIG_SHA3_256`` :menuselection:`Sign modules with SHA3-256`
+ ``CONFIG_MODULE_SIG_SHA3_384`` :menuselection:`Sign modules with SHA3-384`
+ ``CONFIG_MODULE_SIG_SHA3_512`` :menuselection:`Sign modules with SHA3-512`
+ =============================== ==========================================
+
+ 此处选择的算法也将被构建到内核中(而不是作为模块),以便使用该算法签名的
+ 模块可以在不导致循环依赖的情况下检查其签名。
+
+
+ (4) :menuselection:`File name or PKCS#11 URI of module signing key`
+ (``CONFIG_MODULE_SIG_KEY``)
+
+ 将此选项设置为除默认值 ``certs/signing_key.pem`` 之外的其他值将禁用签名
+ 密钥的自动生成,并允许使用您选择的密钥对内核模块进行签名。提供的字符串应
+ 标识包含私钥及其对应的 PEM 格式 X.509 证书的文件,或者在 OpenSSL
+ ENGINE_pkcs11 功能正常的系统上,使用 RFC7512 定义的 PKCS#11 URI。在后一
+ 种情况下,PKCS#11 URI 应引用证书和私钥。
+
+ 如果包含私钥的 PEM 文件已加密,或者 PKCS#11 令牌需要 PIN,可以通过
+ ``KBUILD_SIGN_PIN`` 变量在构建时提供。
+
+
+ (5) :menuselection:`Additional X.509 keys for default system keyring`
+ (``CONFIG_SYSTEM_TRUSTED_KEYS``)
+
+ 此选项可设置为包含附加证书的 PEM 编码文件的文件名,这些证书将默认包含在
+ 系统密钥环中。
+
+请注意,启用模块签名会为内核构建过程添加对执行签名工具的OpenSSL开发包的依赖。
+
+
+生成签名密钥
+============
+
+生成和检查签名需要加密密钥对。私钥用于生成签名,相应的公钥用于检查签名。私钥
+仅在构建期间需要,之后可以删除或安全存储。公钥被构建到内核中,以便在加载模块
+时可以使用它来检查签名。
+
+在正常情况下,当 ``CONFIG_MODULE_SIG_KEY`` 保持默认值时,如果文件中不存在密
+钥对,内核构建将使用 openssl 自动生成新的密钥对::
+
+ certs/signing_key.pem
+
+在构建 vmlinux 期间(公钥需要构建到 vmlinux 中)使用参数::
+
+ certs/x509.genkey
+
+文件(如果尚不存在也会生成)。
+
+可以在 RSA(``MODULE_SIG_KEY_TYPE_RSA``)、
+ECDSA(``MODULE_SIG_KEY_TYPE_ECDSA``)和
+ML-DSA(``MODULE_SIG_KEY_TYPE_MLDSA_*``)之间选择生成 RSA 4k、NIST P-384
+密钥对或 ML-DSA 44、65 或 87 密钥对。
+
+强烈建议您提供自己的 x509.genkey 文件。
+
+最值得注意的是,在 x509.genkey 文件中,req_distinguished_name 部分应从默认值
+更改::
+
+ [ req_distinguished_name ]
+ #O = Unspecified company
+ CN = Build time autogenerated kernel key
+ #emailAddress = unspecified.user@unspecified.company
+
+生成的 RSA 密钥大小也可以通过以下方式设置::
+
+ [ req ]
+ default_bits = 4096
+
+也可以使用位于 Linux 内核源代码树根节点中的 x509.genkey 密钥生成配置文件和
+openssl 命令手动生成公钥/私钥文件。以下是生成公钥/私钥文件的示例::
+
+ openssl req -new -nodes -utf8 -sha256 -days 36500 -batch -x509 \
+ -config x509.genkey -outform PEM -out kernel_key.pem \
+ -keyout kernel_key.pem
+
+然后可以将生成的 kernel_key.pem 文件的完整路径名指定在
+``CONFIG_MODULE_SIG_KEY`` 选项中,并且将使用其中的证书和密钥而不是自动生成的
+密钥对。
+
+
+内核中的公钥
+============
+
+内核包含一个可由 root 查看的公钥环。它们在名为 ".builtin_trusted_keys" 的密
+钥环中,可以通过以下方式查看::
+
+ [root@deneb ~]# cat /proc/keys
+ ...
+ 223c7853 I------ 1 perm 1f030000 0 0 keyring .builtin_trusted_keys: 1
+ 302d2d52 I------ 1 perm 1f010000 0 0 asymmetri Fedora kernel signing key: d69a84e6bce3d216b979e9505b3e3ef9a7118079: X509.RSA a7118079 []
+
+除了专门为模块签名生成的公钥外,还可以在 ``CONFIG_SYSTEM_TRUSTED_KEYS`` 配置
+选项引用的 PEM 编码文件中提供其他受信任的证书。
+
+此外,架构代码可以从硬件存储中获取公钥并将其添加(例如从 UEFI 密钥数据库)。
+
+最后,可以通过以下方式添加其他公钥::
+
+ keyctl padd asymmetric "" [.builtin_trusted_keys-ID] <[key-file]
+
+例如::
+
+ keyctl padd asymmetric "" 0x223c7853 <my_public_key.x509
+
+但是,请注意,内核只允许将由已驻留在 ``.builtin_trusted_keys`` 中的密钥有效
+签名的密钥添加到 ``.builtin_trusted_keys``。
+
+模块手动签名
+============
+
+要手动对模块进行签名,请使用 Linux 内核源代码树中可用的 scripts/sign-file 工
+具。该脚本需要 4 个参数:
+
+ 1. 哈希算法(例如,sha256)
+ 2. 私钥文件名或 PKCS#11 URI
+ 3. 公钥文件名
+ 4. 要签名的内核模块
+
+以下是签名内核模块的示例::
+
+ scripts/sign-file sha512 kernel-signkey.priv \
+ kernel-signkey.x509 module.ko
+
+使用的哈希算法不必与配置的算法匹配,但如果不同,应确保哈希算法要么内置在内核
+中,要么可以在不需要自身的情况下加载。
+
+如果私钥需要密码或 PIN,可以在 $KBUILD_SIGN_PIN 环境变量中提供。
+
+
+已签名模块和剥离
+================
+
+已签名模块在末尾简单地附加了数字签名。模块文件末尾的字符串
+``~Module signature appended~.`` 确认签名存在,但不能确认签名有效!
+
+已签名模块是脆弱的,因为签名在定义的ELF容器之外。因此,一旦计算并附加签名,就
+不得剥离它们。请注意,整个模块都是签名的有效载荷,包括签名时存在的任何和所有
+调试信息。
+
+
+加载已签名模块
+==============
+
+模块通过 insmod、modprobe、``init_module()`` 或 ``finit_module()`` 加载,
+与未签名模块完全一样,因为在用户空间中不进行任何处理。
+所有签名检查都在内核内完成。
+
+
+无效签名和未签名模块
+====================
+
+如果启用了 ``CONFIG_MODULE_SIG_FORCE`` 或在内核启动命令提供了
+module.sig_enforce=1,内核将仅加载具有有效签名且具有公钥的模块。否则,它还将
+加载未签名的模块。任何具有不匹配签名的模块将不被允许加载。
+
+任何具有不可解析签名的模块将被拒绝。
+
+
+管理/保护私钥
+==============
+
+由于私钥用于签名模块,病毒和恶意软件可以使用私钥签名模块并危害操作系统。私钥
+必须被销毁或移动到安全位置,而不是保存在内核源代码树的根节点中。
+
+如果使用相同的私钥为多个内核配置签名模块,必须确保模块版本信息足以防止将模块
+加载到不同的内核中。要么设置 ``CONFIG_MODVERSIONS=y``,要么通过更改
+``EXTRAVERSION`` 或 ``CONFIG_LOCALVERSION`` 确保每个配置具有不同的内核发布字
+符串。
--
2.43.0
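The section on signed modules and stripping says that the signature is simply appended to the module file, and that the marker string proves only presence, not validity. The Python sketch below shows how a tool might locate that trailer. It assumes the layout used by the kernel's ``MODULE_SIG_STRING`` and ``struct module_signature``: the module body, then the signature, then a 12-byte big-endian info record ending in the signature length, then the marker ``"~Module signature appended~\n"``. ``split_signed_module()`` is a hypothetical helper, and it performs no cryptographic verification.

```python
import struct

# Assumed trailer layout: module || signature || 12-byte info || MAGIC.
MAGIC = b"~Module signature appended~\n"
# algo, hash, id_type, signer_len, key_id_len, 3 pad bytes, sig_len (big-endian)
SIG_INFO = struct.Struct(">BBBBB3sI")


def split_signed_module(data: bytes):
    """Return (module_body, signature), or None if no marker is present.

    Only the trailer structure is checked; the signature is NOT verified.
    """
    if not data.endswith(MAGIC):
        return None
    end = len(data) - len(MAGIC)
    if end < SIG_INFO.size:
        raise ValueError("truncated signature info")
    *_, sig_len = SIG_INFO.unpack(data[end - SIG_INFO.size:end])
    sig_start = end - SIG_INFO.size - sig_len
    if sig_start < 0:
        raise ValueError("signature length exceeds file size")
    return data[:sig_start], data[sig_start:end - SIG_INFO.size]
```

Because the trailer sits outside the ELF container, a tool that rewrites the ELF file (such as strip) may drop these bytes or shift them away from the end of the file. In either case the marker check fails.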
* Re: [PATCH v4 0/3] mm/memory-failure: add panic option for unrecoverable pages
From: Jiaqi Yan @ 2026-04-18 0:18 UTC
To: Breno Leitao
Cc: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel, linux-doc, kernel-team
In-Reply-To: <aeHy3-vQTQYJlGw5@gmail.com>
On Fri, Apr 17, 2026 at 2:11 AM Breno Leitao <leitao@debian.org> wrote:
>
> On Thu, Apr 16, 2026 at 09:26:08AM -0700, Jiaqi Yan wrote:
>
> > So we will always get the same stack trace below, right?
> >
> > panic+0xb4/0xc0
> > action_result+0x278/0x340
> > memory_failure+0x152b/0x1c80
> >
> > IIUC, this stack trace itself doesn't provide any useful information
> > about the memory error, right? What exactly can we use from the stack
> > trace? It is just a side-effect that we failed immediately.
>
> We can use it to correlate problems across a fleet of machines. Let me
> share how crash dump analysis works in large datacenters.
>
> There are thousands of crashes a day (to stay at the low end of the estimate), and
> different services try to correlate and categorize them into a few
> buckets, something like:
>
> 1. New crash — needs investigation
> 2. Known issue — fix is being rolled out
> 3. Hardware problem — do not spend engineering time on it
>
> When a machine crashes at a random code path like d_lookup() 67 seconds
> after the memory error, the automated triage classifies it as a kernel
> bug in VFS/dcache and assigns it to the filesystem team for
> investigation. Engineers spend time chasing a bug that doesn't exist in
> software — it's a hardware problem.
>
> With the immediate panic at memory_failure(), the stack trace is always
> recognizable and can be automatically classified as category 3 (hardware
> problem). The static stack trace is the feature, not a limitation: it
> gives triage automation a stable signature to match on.
>
> The value isn't in what the stack trace and the panic() tell a human reading
> one crash; it's in what they tell automated systems processing thousands of
> them.
Yeah, in this setting, a crash dump with a fixed signature totally makes sense.
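The triage argument above relies on matching a fixed top-of-stack signature. The toy Python sketch below shows that bucketing. It assumes traces arrive as ``func+off/size`` lines like the ones quoted earlier. The bucket names, ``classify()``, and the ``known_issues`` table are hypothetical and are not taken from any real triage system.

```python
# Top frames of the immediate memory_failure() panic quoted in this thread.
HW_SIGNATURE = ("panic", "action_result", "memory_failure")


def top_frames(trace, n=3):
    """Return the function names (offsets stripped) of the first n frames."""
    names = []
    for line in trace.strip().splitlines():
        name = line.strip().split("+", 1)[0]
        if name:
            names.append(name)
        if len(names) == n:
            break
    return tuple(names)


def classify(trace, known_issues):
    """Bucket a crash as a hardware problem, a known issue, or a new crash.

    known_issues maps a tuple of top frames to a short description
    (a hypothetical stand-in for a fleet's bug database).
    """
    frames = top_frames(trace)
    if frames == HW_SIGNATURE:
        return "hardware problem"
    if frames in known_issues:
        return "known issue: " + known_issues[frames]
    return "new crash"
```

A later consumption-time crash, for example in ``d_lookup()``, has no such stable signature. It lands in the "new crash" bucket, which is the misclassification described above.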
>
> > You can still correlate failure with "Memory failure: 0x1: unhandlable
> > page" and keep running until the actual fatal poison consumption takes
> > down the system. Drawback is that these will be cascading events that
> > can be "noisy". What I see is the choice between failing fast versus
> > failing safe.
>
> Correlating the "unhandlable page" log with a later crash is
> theoretically possible but breaks down in practice at scale:
>
> - The crash may happen seconds, minutes, or hours later — or never, if
> the page isn't accessed again before a reboot.
>
> - The crash happens on a different CPU, different task, different context;
> there's no breadcrumb linking it back to the memory error.
>
> - Automated triage systems work on stack traces and panic strings, not
> by correlating dmesg lines across time with later crashes.
>
> - The later crash looks completely different depending on the
> architecture. On arm64, you get a "synchronous external abort". On
> x86, it's a machine check exception. On some platforms, it might be a
> generic page fault or a BUG_ON in a subsystem that found inconsistent
> data. There is no single signature to match — every architecture and
> every consumption path produces a different crash, making automated
> correlation essentially impossible.
>
> - Worse, the crash may never happen at all. If the corrupted memory is
> read but the corruption doesn't trigger a fault — say, a flipped bit
> in a permission field, a size, a pointer that still maps to valid
> memory, or a data buffer — the result is silent data corruption with
> no crash to correlate against. The system continues operating on wrong
> data with no indication anything went wrong.
>
> Also, I wouldn't call continuing with known-corrupted kernel memory
> "failing safe" — it's the opposite. The kernel has no mechanism to
> fence off a poisoned slab page or page table from future access.
> Continuing is failing unsafely with a delayed, unpredictable
> consequence.
>
>
> > > Isn't the clean approach way better than the random one?
> >
> > I don't fully agree. In the past upstream has enhanced many kernel mm
> > services (e.g. khugepaged, page migration, dump_user_range()) to
> > recover from memory error in order to improve system availability,
> > given these services or tools can fail safe. Seeing many crashes
> > pointing to a certain in-kernel service at consumption time helped us
> > decide what services we should enhance, and which service we should
> > prioritize. Of course not all kernel code can be recovered from memory
> > error, but that doesn't mean knowing what kernel code often caused
> > crash isn't useful.
>
>
> That's a fair point — consumption-time crashes have historically been
> useful for identifying which kernel services to harden. But I'd argue
> this patch doesn't prevent that analysis, it complements it.
>
> The sysctl defaults to off. Operators who want to observe where poison
> is consumed — to prioritize which services to enhance — can leave it
> disabled and get exactly the behavior they have today.
>
> But for operators running large fleets where the priority is fast
> diagnosis and machine replacement rather than kernel hardening research,
> the immediate panic is what they need. They already know the memory is
> bad, they don't need the kernel to keep running to find out which
> subsystem hits it first.
>
> Also, the services you mention — khugepaged, page migration,
> dump_user_range() — were enhanced to handle errors in user pages,
> where recovery is possible (kill the process, fail the migration). The
> pages this patch panics on — reserved pages, unknown page types — are
> kernel memory where _no_ recovery mechanism exists or is likely to exist.
Maybe, but I won't be surprised if one day someone comes up with some idea.
> There's no service to enhance for those; the only options are crash now
> or crash later, given a crucial memory page got lost.
>
> > Anyway, I only have a second opinion on the usefulness of a static
> > stack trace. This fail-fast option is good to have. Thanks!
>
> Thanks for the review! Just to make sure I understand your position correctly —
> are you saying you'd like changes to the patch, or is this more of a general
> observation about the tradeoff?
No change needed. I just hoped to get more clarification from you on
the usefulness of the stack trace, and I do get it now. Thanks!
>
> --breno
* [PATCH] docs: Add overview and SLUB allocator sections to slab documentation
From: Nick Huang @ 2026-04-18 0:06 UTC
To: Vlastimil Babka, Harry Yoo, Andrew Morton, David Hildenbrand,
Jonathan Corbet
Cc: Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
Lorenzo Stoakes, Liam R . Howlett, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-mm, linux-doc,
linux-kernel, Nick Huang
- Add "Overview" section explaining the slab allocator's role and purpose
- Document the three main slab allocator implementations (SLAB, SLUB, SLOB)
- Highlight SLUB as the default allocator on modern systems
- Add "SLUB Allocator" subsection with detailed information:
  - Explain SLUB's design goals and advantages over legacy SLAB
  - Document its focus on simplification and performance
  - Note support for both uniprocessor and SMP systems
Signed-off-by: Nick Huang <sef1548@gmail.com>
---
Documentation/mm/slab.rst | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/Documentation/mm/slab.rst b/Documentation/mm/slab.rst
index 2bcc58ada302..2d1d093afb7b 100644
--- a/Documentation/mm/slab.rst
+++ b/Documentation/mm/slab.rst
@@ -4,6 +4,32 @@
Slab Allocation
===============
+Overview
+========
+
+The slab allocator is responsible for efficient allocation and reuse of
+small kernel objects. It reduces internal fragmentation and improves
+performance by caching frequently used objects.
+
+The Linux kernel provides multiple slab allocator implementations,
+including SLAB, SLUB, and SLOB. Among these, SLUB is the default
+allocator on most modern systems.
+
+SLUB Allocator
+==============
+
+Overview
+--------
+
+SLUB is a slab allocator designed to replace the legacy SLAB allocator
+(mm/slab.c). It addresses the complexity, scalability limitations, and
+memory overhead of the SLAB implementation.
+
+The primary goal of SLUB is to simplify slab allocation while improving
+performance on both uniprocessor (UP) and symmetric multiprocessing (SMP)
+systems.
+
+
Functions and structures
========================
--
2.43.0
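As a rough illustration of the object caching the proposed Overview describes, here is a minimal userspace C sketch of a fixed-size object cache. Objects are carved out of one contiguous buffer, and freed objects go onto a free list for reuse instead of being returned to the system. This is a simplified model under those assumptions, not SLUB's implementation, which adds per-CPU slabs, partial lists and much more. The `obj_cache` type and its functions are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size object cache: one backing buffer ("slab")
 * split into equal objects, with a free list threaded through them. */
struct obj_cache {
	void *slab;		/* backing memory for all objects */
	void *free_list;	/* first free object, or NULL */
	size_t obj_size;
	size_t nr_objs;
};

static int obj_cache_init(struct obj_cache *c, size_t obj_size, size_t nr_objs)
{
	/* A free object stores the next-free pointer in its first bytes. */
	if (obj_size < sizeof(void *))
		obj_size = sizeof(void *);
	/* Round up so every object stays pointer-aligned. */
	obj_size = (obj_size + sizeof(void *) - 1) & ~(sizeof(void *) - 1);

	c->slab = malloc(obj_size * nr_objs);
	if (!c->slab)
		return -1;
	c->obj_size = obj_size;
	c->nr_objs = nr_objs;
	c->free_list = NULL;

	/* Push objects in reverse so the first allocation returns slot 0. */
	for (size_t i = nr_objs; i > 0; i--) {
		void *obj = (char *)c->slab + (i - 1) * obj_size;
		*(void **)obj = c->free_list;
		c->free_list = obj;
	}
	return 0;
}

static void *obj_cache_alloc(struct obj_cache *c)
{
	void *obj = c->free_list;

	if (obj)
		c->free_list = *(void **)obj;	/* pop the head */
	return obj;				/* NULL once the slab is exhausted */
}

static void obj_cache_free(struct obj_cache *c, void *obj)
{
	/* Push back for reuse instead of returning memory to the system. */
	*(void **)obj = c->free_list;
	c->free_list = obj;
}

static void obj_cache_destroy(struct obj_cache *c)
{
	free(c->slab);
	c->slab = c->free_list = NULL;
}
```

Reusing a just-freed object avoids a round trip to the page allocator, and keeping same-sized objects together is what limits fragmentation.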
* Re: [PATCH v5 2/3] ima: trim N IMA event log records
From: steven chen @ 2026-04-17 21:26 UTC
To: Roberto Sassu, linux-integrity
Cc: zohar, roberto.sassu, dmitry.kasatkin, eric.snowberg, corbet,
serge, paul, jmorris, linux-security-module, anirudhve,
gregorylumen, nramas, sushring, linux-doc, steven chen
In-Reply-To: <b0b65c5a2d407301905dc4232eee4b16030920c8.camel@huaweicloud.com>
On 4/7/2026 9:19 AM, Roberto Sassu wrote:
> On Wed, 2026-04-01 at 10:29 -0700, steven chen wrote:
>> Trim N entries of the IMA event logs. Do not clean the hash table.
> The very first change of this patch is the kernel option
> ima_flush_htable that I introduced for my use case.
>
> At the bottom of this patch you actually check the ima_flush_htable
> boolean, and delete the measurements entries without disconnecting them
> from the hash table, so the digest lookup is done on freed memory.
>
> Next, you duplicated my changes regarding the measurements list
> counter. But instead of removing the old counter from the hash table,
> you keep incrementing both, but use the new one.
>
> In ima_log_trim_open(), you use again my duplicated code to manage
> exclusive write/concurrent read scheme for the measurement interfaces.
> However, for read, if the process does not have CAP_SYS_ADMIN it falls
> back calling _ima_measurements_open(). Not sure it was intended.
Hi Roberto,
I acknowledged in my cover letter that these come from you. Please
let me know the best way to credit your contribution and I will update
it in my next version.
I will address all the issues you mentioned above in the next version.
> And, in ima_log_trim_release(), you check again CAP_SYS_ADMIN which is
> redundant, you would not reach this code if the same requirements were
> not met at open time. You also return an error on close().
Will fix in the next version.
Thanks,
> In ima_log_trim_write(), you do manual string to number conversion for
> your first number and use kstrtoul() for the second.
>
> The measurements lists and the associated counter are atomically
> updated in ima_add_digest_entry(), but not atomically accessed in
> ima_delete_event_log(). Also, the measurements list is traversed
> without _rcu variant or lock.
Will fix in the next version.
Thanks
>
> While this trimming scheme aims at minimizing the kernel space and user
> space delay, it also introduces the following problem. If two agents
> perform a TPM quote that includes a different number of entries, there
> is no guarantee that the one willing to trim fewer entries wins. Which
> means that, one agent could end up not seeing the most recent entries,
> as they were already trimmed by the other agent.
This should be acceptable: the second trim request will be rejected, and
the agent can still find all log entries in user space, provided all user
agents handle the log correctly.
There is also another way to handle it: the user agent can hold the list
by opening ima_trim_log with write permission for the whole read,
attestation, and trim period.
This way, the user agent for the "Trim N" method has a user space hold
time similar to the "staged" method but less kernel list lock time, and
the user agent requirements for the "Trim N" method are much simpler than
those for the "staged" method.
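The rejection argument above relies on the T:N check in ima_log_trim_write(): a request succeeds only if T equals the current trim count. Below is a minimal userspace C model of that optimistic check. It assumes a single in-memory counter and a list of `nr_entries` records. `struct trim_state` and `trim_request()` are hypothetical names, and both numbers are parsed with strtol() rather than with the patch's mix of manual parsing and kstrtoul().

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical model of the state behind ima_trim_log. */
struct trim_state {
	long trimcount;		/* T: total entries trimmed so far */
	long nr_entries;	/* entries still on the measurement list */
};

/* Parse "T:N" and apply it; returns N on success or a negative errno. */
static long trim_request(struct trim_state *s, const char *req)
{
	char *end;
	long t, n;

	errno = 0;
	t = strtol(req, &end, 10);
	if (errno || end == req || *end != ':')
		return -EINVAL;

	req = end + 1;
	n = strtol(req, &end, 10);
	/* Allow one trailing newline, as written by echo. */
	if (errno || end == req || (*end != '\0' && *end != '\n') || n < 0)
		return -EINVAL;

	/* Stale T: another agent trimmed since this one read the count. */
	if (t != s->trimcount)
		return -EINVAL;

	/* Mirrors ima_delete_event_log() refusing to trim past the end. */
	if (n > s->nr_entries)
		return -ENOENT;

	s->nr_entries -= n;
	s->trimcount += n;
	return n;
}
```

If two agents both read T=0, the first request "0:10" succeeds and moves T to 10, so the second agent's "0:5" is rejected. The model only shows that rejection; it says nothing about whether the losing agent had already collected the entries the winner trimmed, which is the concern raised above.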
>
> My solution is not affected by this problem, since there will be only
> one process collecting all the measurements in user space and exposing
> them to the agents.
Please see above response.
Thanks,
Steven
>
> Also, I didn't understand why T and ima_measure_users have to be
> preserved on soft reboots. Especially ima_measure_users reflects the
> state of open files for a particular kernel, but on soft reboot a new
> kernel is booted.
>
> I personally will not endorse a solution based on the ima_trim_log
> interface. I could accept trimming N even more efficiently than we
> currently do with a lockless walk to determine the cutting position in
> ima_queue_stage(), so that we don't need to splice back entries to the
> measurement list. This would be a replacement of patch 11 in my patch
> set, but this would be as far as I would like to go.
>
> Roberto
>
>> The values saved in the hash table were already used.
>>
>> Provide a userspace interface, ima_trim_log:
>> When read, this interface returns the total number T of entries
>> trimmed since system boot.
>> When written, it takes two numbers, T:N, asking the kernel to
>> trim N entries of the IMA event log.
>>
>> Not cleaning the hash table reduces the time the kernel holds the
>> measurement list lock.
>>
>> When the kernel gets a log trim request T:N:
>> - Compare T with the total trimmed count
>> - If equal, trim N entries and change T to T+N
>> - Otherwise, return an error
>>
>> Signed-off-by: steven chen <chenste@linux.microsoft.com>
>> ---
>> .../admin-guide/kernel-parameters.txt | 4 +
>> security/integrity/ima/ima.h | 4 +-
>> security/integrity/ima/ima_fs.c | 198 +++++++++++++++++-
>> security/integrity/ima/ima_kexec.c | 2 +-
>> security/integrity/ima/ima_queue.c | 96 +++++++++
>> 5 files changed, 296 insertions(+), 8 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index e92c0056e4e0..cd1a1d0bf0e2 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -2197,6 +2197,10 @@
>> Use the canonical format for the binary runtime
>> measurements, instead of host native format.
>>
>> + ima_flush_htable [IMA]
>> + Flush the measurement list hash table when trim all
>> + or a part of it for deletion.
>> +
>> ima_hash= [IMA]
>> Format: { md5 | sha1 | rmd160 | sha256 | sha384
>> | sha512 | ... }
>> diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
>> index e3d71d8d56e3..5cbee3a295a0 100644
>> --- a/security/integrity/ima/ima.h
>> +++ b/security/integrity/ima/ima.h
>> @@ -243,11 +243,13 @@ void ima_post_key_create_or_update(struct key *keyring, struct key *key,
>> const void *payload, size_t plen,
>> unsigned long flags, bool create);
>> #endif
>> -
>> +extern atomic_long_t ima_number_entries;
>> #ifdef CONFIG_IMA_KEXEC
>> void ima_measure_kexec_event(const char *event_name);
>> +long ima_delete_event_log(long req_val);
>> #else
>> static inline void ima_measure_kexec_event(const char *event_name) {}
>> +static inline long ima_delete_event_log(long req_val) { return 0; }
>> #endif
>>
>> /*
>> diff --git a/security/integrity/ima/ima_fs.c b/security/integrity/ima/ima_fs.c
>> index 87045b09f120..8e26e0f34311 100644
>> --- a/security/integrity/ima/ima_fs.c
>> +++ b/security/integrity/ima/ima_fs.c
>> @@ -21,6 +21,9 @@
>> #include <linux/rcupdate.h>
>> #include <linux/parser.h>
>> #include <linux/vmalloc.h>
>> +#include <linux/ktime.h>
>> +#include <linux/timekeeping.h>
>> +#include <linux/ima.h>
>>
>> #include "ima.h"
>>
>> @@ -38,6 +41,17 @@ __setup("ima_canonical_fmt", default_canonical_fmt_setup);
>>
>> static int valid_policy = 1;
>>
>> +#define IMA_LOG_TRIM_REQ_NUM_LENGTH 15
>> +#define IMA_LOG_TRIM_REQ_TOTAL_LENGTH 32
>> +atomic_long_t ima_number_entries = ATOMIC_LONG_INIT(0);
>> +static long trimcount;
>> +/* mutex protects atomicity of trimming measurement list
>> + * and also protects atomicity the measurement list read
>> + * write operation.
>> + */
>> +static DEFINE_MUTEX(ima_measure_lock);
>> +static long ima_measure_users;
>> +
>> static ssize_t ima_show_htable_value(char __user *buf, size_t count,
>> loff_t *ppos, atomic_long_t *val)
>> {
>> @@ -64,8 +78,7 @@ static ssize_t ima_show_measurements_count(struct file *filp,
>> char __user *buf,
>> size_t count, loff_t *ppos)
>> {
>> - return ima_show_htable_value(buf, count, ppos, &ima_htable.len);
>> -
>> + return ima_show_htable_value(buf, count, ppos, &ima_number_entries);
>> }
>>
>> static const struct file_operations ima_measurements_count_ops = {
>> @@ -202,16 +215,77 @@ static const struct seq_operations ima_measurments_seqops = {
>> .show = ima_measurements_show
>> };
>>
>> +/*
>> + * _ima_measurements_open - open the IMA measurements file
>> + * @inode: inode of the file being opened
>> + * @file: file being opened
>> + * @seq_ops: sequence operations for the file
>> + *
>> + * Returns 0 on success, or negative error code.
>> + * Implements mutual exclusion between readers and writer
>> + * of the measurements file. Multiple readers are allowed,
>> + * but writer get exclusive access only no other readers/writers.
>> + * Readers is not allowed when there is a writer.
>> + */
>> +static int _ima_measurements_open(struct inode *inode, struct file *file,
>> + const struct seq_operations *seq_ops)
>> +{
>> + bool write = !!(file->f_mode & FMODE_WRITE);
>> + int ret;
>> +
>> + if (write && !capable(CAP_SYS_ADMIN))
>> + return -EPERM;
>> +
>> + mutex_lock(&ima_measure_lock);
>> + if ((write && ima_measure_users != 0) ||
>> + (!write && ima_measure_users < 0)) {
>> + mutex_unlock(&ima_measure_lock);
>> + return -EBUSY;
>> + }
>> +
>> + ret = seq_open(file, seq_ops);
>> + if (ret < 0) {
>> + mutex_unlock(&ima_measure_lock);
>> + return ret;
>> + }
>> +
>> + if (write)
>> + ima_measure_users--;
>> + else
>> + ima_measure_users++;
>> +
>> + mutex_unlock(&ima_measure_lock);
>> + return ret;
>> +}
>> +
>> static int ima_measurements_open(struct inode *inode, struct file *file)
>> {
>> - return seq_open(file, &ima_measurments_seqops);
>> + return _ima_measurements_open(inode, file, &ima_measurments_seqops);
>> +}
>> +
>> +static int ima_measurements_release(struct inode *inode, struct file *file)
>> +{
>> + bool write = !!(file->f_mode & FMODE_WRITE);
>> + int ret;
>> +
>> + mutex_lock(&ima_measure_lock);
>> + ret = seq_release(inode, file);
>> + if (!ret) {
>> + if (!write)
>> + ima_measure_users--;
>> + else
>> + ima_measure_users++;
>> + }
>> +
>> + mutex_unlock(&ima_measure_lock);
>> + return ret;
>> }
>>
>> static const struct file_operations ima_measurements_ops = {
>> .open = ima_measurements_open,
>> .read = seq_read,
>> .llseek = seq_lseek,
>> - .release = seq_release,
>> + .release = ima_measurements_release,
>> };
>>
>> void ima_print_digest(struct seq_file *m, u8 *digest, u32 size)
>> @@ -279,14 +353,114 @@ static const struct seq_operations ima_ascii_measurements_seqops = {
>>
>> static int ima_ascii_measurements_open(struct inode *inode, struct file *file)
>> {
>> - return seq_open(file, &ima_ascii_measurements_seqops);
>> + return _ima_measurements_open(inode, file, &ima_ascii_measurements_seqops);
>> }
>>
>> static const struct file_operations ima_ascii_measurements_ops = {
>> .open = ima_ascii_measurements_open,
>> .read = seq_read,
>> .llseek = seq_lseek,
>> - .release = seq_release,
>> + .release = ima_measurements_release,
>> +};
>> +
>> +static int ima_log_trim_open(struct inode *inode, struct file *file)
>> +{
>> + bool write = !!(file->f_mode & FMODE_WRITE);
>> +
>> + if (!write && capable(CAP_SYS_ADMIN))
>> + return 0;
>> + else if (!capable(CAP_SYS_ADMIN))
>> + return -EPERM;
>> +
>> + return _ima_measurements_open(inode, file, &ima_measurments_seqops);
>> +}
>> +
>> +static ssize_t ima_log_trim_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
>> +{
>> + char tmpbuf[IMA_LOG_TRIM_REQ_NUM_LENGTH];
>> + ssize_t len;
>> +
>> + len = scnprintf(tmpbuf, sizeof(tmpbuf), "%li\n", trimcount);
>> + return simple_read_from_buffer(buf, size, ppos, tmpbuf, len);
>> +}
>> +
>> +static ssize_t ima_log_trim_write(struct file *file,
>> + const char __user *buf, size_t datalen, loff_t *ppos)
>> +{
>> + char tmpbuf[IMA_LOG_TRIM_REQ_TOTAL_LENGTH];
>> + char *p = tmpbuf;
>> + long count, ret, val = 0, max = LONG_MAX;
>> +
>> + if (*ppos > 0 || datalen > IMA_LOG_TRIM_REQ_TOTAL_LENGTH || datalen < 2) {
>> + ret = -EINVAL;
>> + goto out;
>> + }
>> +
>> + if (copy_from_user(tmpbuf, buf, datalen) != 0) {
>> + ret = -EFAULT;
>> + goto out;
>> + }
>> +
>> + p = tmpbuf;
>> +
>> + while (*p && *p != ':') {
>> + if (!isdigit((unsigned char)*p))
>> + return -EINVAL;
>> +
>> + /* digit value */
>> + int d = *p - '0';
>> +
>> + /* overflow check: val * 10 + d > max -> (val > (max - d) / 10) */
>> + if (val > (max - d) / 10)
>> + return -ERANGE;
>> +
>> + val = val * 10 + d;
>> + p++;
>> + }
>> +
>> + if (*p != ':')
>> + return -EINVAL;
>> +
>> + /* verify trim count matches */
>> + if (val != trimcount)
>> + return -EINVAL;
>> +
>> + p++; /* skip ':' */
>> + ret = kstrtoul(p, 0, &count);
>> +
>> + if (ret < 0)
>> + goto out;
>> +
>> + ret = ima_delete_event_log(count);
>> +
>> + if (ret < 0)
>> + goto out;
>> +
>> + trimcount += ret;
>> +
>> + ret = datalen;
>> +out:
>> + return ret;
>> +}
>> +
>> +static int ima_log_trim_release(struct inode *inode, struct file *file)
>> +{
>> + bool write = !!(file->f_mode & FMODE_WRITE);
>> +
>> + if (!write && capable(CAP_SYS_ADMIN))
>> + return 0;
>> + else if (!capable(CAP_SYS_ADMIN))
>> + return -EPERM;
>> +
>> + return ima_measurements_release(inode, file);
>> +}
>> +
>> +static const struct file_operations ima_log_trim_ops = {
>> + .open = ima_log_trim_open,
>> + .read = ima_log_trim_read,
>> + .write = ima_log_trim_write,
>> + .llseek = generic_file_llseek,
>> + .release = ima_log_trim_release
>> };
>>
>> static ssize_t ima_read_policy(char *path)
>> @@ -528,6 +702,18 @@ int __init ima_fs_init(void)
>> goto out;
>> }
>>
>> + if (IS_ENABLED(CONFIG_IMA_LOG_TRIMMING)) {
>> + dentry = securityfs_create_file("ima_trim_log",
>> + S_IRUSR | S_IRGRP | S_IWUSR | S_IWGRP,
>> + ima_dir, NULL, &ima_log_trim_ops);
>> + if (IS_ERR(dentry)) {
>> + ret = PTR_ERR(dentry);
>> + goto out;
>> + }
>> + }
>> +
>> + trimcount = 0;
>> +
>> dentry = securityfs_create_file("runtime_measurements_count",
>> S_IRUSR | S_IRGRP, ima_dir, NULL,
>> &ima_measurements_count_ops);
>> diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c
>> index 7362f68f2d8b..bee997683e03 100644
>> --- a/security/integrity/ima/ima_kexec.c
>> +++ b/security/integrity/ima/ima_kexec.c
>> @@ -41,7 +41,7 @@ void ima_measure_kexec_event(const char *event_name)
>> int n;
>>
>> buf_size = ima_get_binary_runtime_size();
>> - len = atomic_long_read(&ima_htable.len);
>> + len = atomic_long_read(&ima_number_entries);
>>
>> n = scnprintf(ima_kexec_event, IMA_KEXEC_EVENT_LEN,
>> "kexec_segment_size=%lu;ima_binary_runtime_size=%lu;"
>> diff --git a/security/integrity/ima/ima_queue.c b/security/integrity/ima/ima_queue.c
>> index 590637e81ad1..07225e19b9b5 100644
>> --- a/security/integrity/ima/ima_queue.c
>> +++ b/security/integrity/ima/ima_queue.c
>> @@ -22,6 +22,14 @@
>>
>> #define AUDIT_CAUSE_LEN_MAX 32
>>
>> +bool ima_flush_htable;
>> +static int __init ima_flush_htable_setup(char *str)
>> +{
>> + ima_flush_htable = true;
>> + return 1;
>> +}
>> +__setup("ima_flush_htable", ima_flush_htable_setup);
>> +
>> /* pre-allocated array of tpm_digest structures to extend a PCR */
>> static struct tpm_digest *digests;
>>
>> @@ -114,6 +122,7 @@ static int ima_add_digest_entry(struct ima_template_entry *entry,
>> list_add_tail_rcu(&qe->later, &ima_measurements);
>>
>> atomic_long_inc(&ima_htable.len);
>> + atomic_long_inc(&ima_number_entries);
>> if (update_htable) {
>> key = ima_hash_key(entry->digests[ima_hash_algo_idx].digest);
>> hlist_add_head_rcu(&qe->hnext, &ima_htable.queue[key]);
>> @@ -220,6 +229,93 @@ int ima_add_template_entry(struct ima_template_entry *entry, int violation,
>> return result;
>> }
>>
>> +/**
>> + * ima_delete_event_log - delete IMA event entry
>> + * @num_records: number of records to delete
>> + *
>> + * delete num_records entries off the measurement list.
>> + * Returns num_records, or negative error code.
>> + */
>> +long ima_delete_event_log(long num_records)
>> +{
>> + long len, cur = num_records, tmp_len = 0;
>> + struct ima_queue_entry *qe, *qe_tmp;
>> + LIST_HEAD(ima_measurements_to_delete);
>> + struct list_head *list_ptr;
>> +
>> + if (!IS_ENABLED(CONFIG_IMA_LOG_TRIMMING))
>> + return -EOPNOTSUPP;
>> +
>> + if (num_records <= 0)
>> + return num_records;
>> +
>> + list_ptr = &ima_measurements;
>> +
>> + len = atomic_long_read(&ima_number_entries);
>> +
>> + if (num_records <= len) {
>> + list_for_each_entry(qe, list_ptr, later) {
>> + if (cur > 0) {
>> + tmp_len += get_binary_runtime_size(qe->entry);
>> + --cur;
>> + }
>> + if (cur == 0) {
>> + qe_tmp = qe;
>> + break;
>> + }
>> + }
>> + }
>> + else {
>> + return -ENOENT;
>> + }
>> +
>> +
>> + mutex_lock(&ima_extend_list_mutex);
>> + len = atomic_long_read(&ima_number_entries);
>> +
>> + if (num_records == len) {
>> + list_replace(&ima_measurements, &ima_measurements_to_delete);
>> + INIT_LIST_HEAD(&ima_measurements);
>> + atomic_long_set(&ima_number_entries, 0);
>> + list_ptr = &ima_measurements_to_delete;
>> + }
>> + else {
>> + __list_cut_position(&ima_measurements_to_delete, &ima_measurements,
>> + &qe_tmp->later);
>> + atomic_long_sub(num_records, &ima_number_entries);
>> + if (IS_ENABLED(CONFIG_IMA_KEXEC))
>> + binary_runtime_size -= tmp_len;
>> + }
>> +
>> + mutex_unlock(&ima_extend_list_mutex);
>> +
>> + if (ima_flush_htable)
>> + synchronize_rcu();
>> +
>> + list_for_each_entry_safe(qe, qe_tmp, &ima_measurements_to_delete, later) {
>> + /*
>> + * Ok because after list delete qe is only accessed by
>> + * ima_lookup_digest_entry().
>> + */
>> + for (int i = 0; i < qe->entry->template_desc->num_fields; i++) {
>> + kfree(qe->entry->template_data[i].data);
>> + qe->entry->template_data[i].data = NULL;
>> + qe->entry->template_data[i].len = 0;
>> + }
>> +
>> + list_del(&qe->later);
>> +
>> + /* No leak if !ima_flush_htable, referenced by ima_htable. */
>> + if (ima_flush_htable) {
>> + kfree(qe->entry->digests);
>> + kfree(qe->entry);
>> + kfree(qe);
>> + }
>> + }
>> +
>> + return num_records;
>> +}
>> +
>> int ima_restore_measurement_entry(struct ima_template_entry *entry)
>> {
>> int result = 0;
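The exclusive-writer/concurrent-reader scheme in _ima_measurements_open() and ima_measurements_release() above boils down to a signed user count: positive with N readers, -1 with one writer, and 0 when the file is unused. Here is a single-threaded C model of that logic. `measure_open()` and `measure_release()` are hypothetical names, the CAP_SYS_ADMIN check and seq_file handling are omitted, and in the patch every update happens under ima_measure_lock.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Signed user count, modeled on ima_measure_users:
 * > 0: that many readers, -1: one writer, 0: file not open.
 */
static long measure_users;

static int measure_open(bool write)
{
	/* A writer needs the file to itself; readers are refused while a writer holds it. */
	if ((write && measure_users != 0) || (!write && measure_users < 0))
		return -EBUSY;

	if (write)
		measure_users--;
	else
		measure_users++;
	return 0;
}

static void measure_release(bool write)
{
	/* Undo exactly what measure_open() did for this access mode. */
	if (write)
		measure_users++;
	else
		measure_users--;
}
```

Encoding both modes in one counter keeps the open-time check to a single comparison per mode.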
* Re: [PATCH v4 2/8] dt-bindings: arm: Add zx297520v3 board binding
From: Rob Herring (Arm) @ 2026-04-17 21:08 UTC
To: Stefan Dösinger
Cc: linux-kernel, Conor Dooley, Jonathan Corbet, Alexandre Belloni,
Greg Kroah-Hartman, linux-doc, devicetree, Drew Fustini,
Linus Walleij, Jiri Slaby, Russell King, soc, Arnd Bergmann,
Krzysztof Kozlowski, Krzysztof Kozlowski, linux-arm-kernel,
linux-serial, Shuah Khan
In-Reply-To: <20260416-send-v4-2-e19d02b944ec@gmail.com>
On Thu, 16 Apr 2026 23:19:10 +0300, Stefan Dösinger wrote:
> Add a compatible for boards based on the ZTE zx297520v3 SoC.
>
> Signed-off-by: Stefan Dösinger <stefandoesinger@gmail.com>
>
> ---
>
> The list of devices is the devices I have access to for testing. There
> are many more devices based on this board and it is not always easy to
> identify them. Often they are sold without any branding ("4G home
> router") or with mobile carrier branding.
> ---
> Documentation/devicetree/bindings/arm/zte.yaml | 25 +++++++++++++++++++++++++
> MAINTAINERS | 1 +
> 2 files changed, 26 insertions(+)
>
My bot found errors running 'make dt_binding_check' on your patch:
yamllint warnings/errors:
./Documentation/devicetree/bindings/arm/zte.yaml:19:13: [warning] wrong indentation: expected 14 but found 12 (indentation)
dtschema/dtc warnings/errors:
doc reference errors (make refcheckdocs):
See https://patchwork.kernel.org/project/devicetree/patch/20260416-send-v4-2-e19d02b944ec@gmail.com
The base for the series is generally the latest rc1. A different dependency
should be noted in *this* patch.
If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:
pip3 install dtschema --upgrade
Please check and re-submit after running the above command yourself. Note
that DT_SCHEMA_FILES can be set to your schema file to speed up checking
your schema. However, it must be unset to test all examples with your schema.
* Re: [PATCH V10 00/10] famfs: port into fuse
From: Joanne Koong @ 2026-04-17 19:35 UTC
To: Christoph Hellwig
Cc: Miklos Szeredi, John Groves, Bernd Schubert, John Groves,
Dan Williams, Bernd Schubert, Alison Schofield, John Groves,
Jonathan Corbet, Shuah Khan, Vishal Verma, Dave Jiang,
Matthew Wilcox, Jan Kara, Alexander Viro, David Hildenbrand,
Christian Brauner, Darrick J . Wong, Randy Dunlap, Jeff Layton,
Amir Goldstein, Jonathan Cameron, Stefan Hajnoczi, Josef Bacik,
Bagas Sanjaya, Chen Linxuan, James Morse, Fuad Tabba,
Sean Christopherson, Shivank Garg, Ackerley Tng, Gregory Price,
Aravind Ramesh, Ajay Joshi, venkataravis@micron.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <aeHpjpNN4TliZOyp@infradead.org>
On Fri, Apr 17, 2026 at 1:04 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> This is the first mail without annoying and pointless full quotes,
> so chiming in here. Sorry if I missed something important in all the
> noise.
>
> On Tue, Apr 14, 2026 at 03:19:36PM +0200, Miklos Szeredi wrote:
> > On Fri, 10 Apr 2026 at 21:44, Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > > Overall, my intention with bringing this up is just to make sure we're
> > > at least aware of this alternative before anything is merged and
> > > permanent. If Miklos and you think we should land this series, then
> > > I'm on board with that.
> >
> > TBH, I'd prefer not to add the famfs specific mapping interface if not
> > absolutely necessary.
>
> Yes, fuse needing support for a specific file system sounds like a
> design mistake.
>
> >This was the main sticking point originally,
> > but there seemed to be no better alternative.
> >
> > However with the bpf approach this would be gone, which is great.
>
> So what is this bpf magic actually trying to solve?
It is trying to avoid having famfs-specific implementation details
hardcoded permanently into fuse's uapi and kernel code. I really like
your suggestion of adding generic stride/offset multi-device support
to fs/iomap. That is a much better solution than bpf.
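The arithmetic behind a generic stride/offset mapping over multiple devices is small. A file is split into fixed-size chunks placed round-robin across devices. This is a minimal C sketch assuming a uniform chunk size and the same base offset on every device. `struct stripe_layout` and `stripe_map()` are hypothetical and are not part of the fs/iomap API.

```c
#include <stdint.h>

/* Hypothetical striped layout: chunks of chunk_size bytes, round-robin
 * across nr_devs devices, each device's data starting at base_offset. */
struct stripe_layout {
	uint64_t chunk_size;	/* stride unit in bytes, non-zero */
	uint32_t nr_devs;	/* number of devices, non-zero */
	uint64_t base_offset;	/* where file data starts on each device */
};

struct stripe_loc {
	uint32_t dev;		/* device index */
	uint64_t dev_offset;	/* byte offset on that device */
	uint64_t len;		/* bytes contiguous on dev from dev_offset */
};

static struct stripe_loc stripe_map(const struct stripe_layout *l,
				    uint64_t file_offset)
{
	uint64_t chunk = file_offset / l->chunk_size;	/* global chunk index */
	uint64_t within = file_offset % l->chunk_size;	/* offset inside chunk */
	struct stripe_loc loc;

	loc.dev = (uint32_t)(chunk % l->nr_devs);
	/* Each full pass over all devices advances every device by one chunk. */
	loc.dev_offset = l->base_offset +
			 (chunk / l->nr_devs) * l->chunk_size + within;
	/* The mapping is contiguous only up to the end of the current chunk. */
	loc.len = l->chunk_size - within;
	return loc;
}
```

The returned `len` is what a filesystem-agnostic mapper would report as the extent length, so a caller iterates chunk by chunk without knowing anything famfs-specific.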
Thanks,
Joanne
>
* [PATCH] Documentation: hwmon: fix link to ideapad-laptop.c file
From: Ninad Naik @ 2026-04-17 19:14 UTC
To: sergiomelas, linux, corbet, skhan
Cc: linux-hwmon, linux-doc, linux-kernel, me, linux-kernel-mentees,
Ninad Naik
The ideapad-laptop.c file now lives in the drivers/platform/x86/lenovo/
directory. Update the GitHub link to point to the correct path.
Signed-off-by: Ninad Naik <ninadnaik07@gmail.com>
---
Documentation/hwmon/yogafan.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/hwmon/yogafan.rst b/Documentation/hwmon/yogafan.rst
index c553a381f772..68761947a1a8 100644
--- a/Documentation/hwmon/yogafan.rst
+++ b/Documentation/hwmon/yogafan.rst
@@ -135,4 +135,4 @@ References
4. **Lenovo IdeaPad Laptop Driver:** Reference for DMI-based hardware
feature gating in Lenovo laptops.
- https://github.com/torvalds/linux/blob/master/drivers/platform/x86/ideapad-laptop.c
+ https://github.com/torvalds/linux/blob/master/drivers/platform/x86/lenovo/ideapad-laptop.c
--
2.53.0
* [PATCH] docs: kselftest: Document the FORCE_TARGETS build variable
From: Ricardo B. Marlière @ 2026-04-17 17:36 UTC
To: Shuah Khan, Jonathan Corbet
Cc: Shuah Khan, linux-kselftest, workflows, linux-doc, linux-kernel,
Ricardo B. Marlière
FORCE_TARGETS has been part of the kselftest build system for
some time but is absent from the developer documentation. Without
an entry here, users relying on kselftest in CI pipelines would
have to read the selftests Makefile directly to discover the
option.
A build that exits zero despite some targets failing can mask
real breakage and mislead automated systems into reporting
success. Add a dedicated section so that CI authors can easily
find and adopt FORCE_TARGETS=1 to turn such silent partial
failures into hard errors.
Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
---
Documentation/dev-tools/kselftest.rst | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst
index 18c2da67fae4..d7bfe320338c 100644
--- a/Documentation/dev-tools/kselftest.rst
+++ b/Documentation/dev-tools/kselftest.rst
@@ -126,6 +126,18 @@ dedicated skiplist::
See the top-level tools/testing/selftests/Makefile for the list of all
possible targets.
+Requiring all targets to build successfully
+===========================================
+
+By default, the build succeeds as long as at least one target builds
+without error. Set ``FORCE_TARGETS=1`` to instead require every target to
+build successfully; make will abort as soon as any target fails::
+
+ $ make -C tools/testing/selftests FORCE_TARGETS=1
+
+This applies to both the ``all`` and ``install`` targets and is useful in
+CI environments where a silent partial build would be misleading.
+
Running the full range hotplug selftests
========================================
---
base-commit: 83ef26f911432d9c98b6d8b6ed0709a8b79cd834
change-id: 20260417-selftests-docs-fdf4e922ad20
Best regards,
--
Ricardo B. Marlière <rbm@suse.com>
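The difference between the two modes documented in this patch can be modeled in a few lines of C. The sketch takes a list of per-target exit statuses and mirrors only the documented behavior: by default the overall build fails only when every target fails, while with FORCE_TARGETS=1 the first failure aborts the build. It is an illustration, not the selftests Makefile itself, and `overall_status()` is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: statuses[i] is the exit status of building
 * target i (0 on success). Returns the overall exit status.
 */
static int overall_status(const int *statuses, size_t n, bool force_targets)
{
	bool any_ok = false;

	for (size_t i = 0; i < n; i++) {
		if (statuses[i] != 0) {
			if (force_targets)
				return statuses[i];	/* abort on the first failure */
			continue;			/* default: keep building */
		}
		any_ok = true;
	}
	/* Default mode: succeed if at least one target built. */
	return any_ok ? 0 : 1;
}
```

With one failing target out of three, the default mode still reports success, which is the silent partial failure the new section warns CI authors about.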
* Re: [PATCH v4 7/8] ARM: dts: Declare UART1 on zx297520v3 boards
From: Stefan Dösinger @ 2026-04-17 17:24 UTC
To: Arnd Bergmann
Cc: Jonathan Corbet, Shuah Khan, Russell King, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Krzysztof Kozlowski,
Alexandre Belloni, Linus Walleij, Drew Fustini,
Greg Kroah-Hartman, Jiri Slaby, linux-doc, linux-kernel,
linux-arm-kernel, devicetree, soc, linux-serial
In-Reply-To: <0d80dcbe-cb46-45e5-821a-de5299d6a663@app.fastmail.com>
Hi Arnd,
Thanks for your comments.
> Am 17.04.2026 um 11:59 schrieb Arnd Bergmann <arnd@arndb.de>:
>
> On Thu, Apr 16, 2026, at 22:19, Stefan Dösinger wrote:
>>
>> The reason why I add the serial1=uart1 alias is to keep console=ttyAMA1
>> stable regardless of the other enabled UARTs. UART0, as the name
>> implies, has a lower MMIO address, but uart1 is the one that usually has
>> the boot output and console.
>
> I'm not sure I'm following here. You generally want to either make
> sure the alias matches whatever number is printed on the product
> if there are multiple numbered ports, or you just use 'serial0'
> as the only alias if there is only one port.
Not all boards have their UART pins labeled, but on those that do, the pins connected to the UART at 0x01408000 are named UART1RX/UART1TX. Most boards have only one UART, though. I have seen a picture of only one board that has both UART0 and UART1, and I have not been able to test that board myself yet.
My original reason is developer convenience. If I have
	uart0: serial@131000 {
		reg = <0x00131000 0x1000>;
		...
		status = "disabled";
	};
	uart1: serial@1408000 {
		reg = <0x01408000 0x1000>;
		...
		status = "okay";
	};
	cmdline="... console=ttyAMA{0/1} ..."
then changing uart0's status between "disabled" and "okay" (e.g. to experiment with uart0 and pinctrl) requires changing the command line to match. I found that pretty annoying, and the aliases seemed like the best way to avoid it.
Either way, I am open to whatever you prefer. I can keep the current naming for the reasons stated above, I can name serial@1408000 "uart0" and leave the others without an alias, or I can drop the alias altogether.
> Either way, the alias should go into the board specific file, not
> the general SoC file, as a board might be using a different
> set of UARTs.
That works for me; I'll move them. The aliases will most likely be the same for all boards based on this chipset, meaning duplicated code, but matching the alias to the board labels makes sense to me.
> Since you know the addresses of the other uart instances, I would
> suggest you add all of them at the same time.
Will do.
I'll hold off for a bit before I resend the patches to see if some other comments come up.
Cheers,
Stefan
* Re: [PATCH v3 1/2] dt-bindings: hwmon: pmbus: add max20830
From: Conor Dooley @ 2026-04-17 16:24 UTC
To: Alexis Czezar Torreno
Cc: Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Jonathan Corbet, Shuah Khan, linux-hwmon, devicetree,
linux-kernel, linux-doc
In-Reply-To: <20260417-dev_max20830-v3-1-0cb8d56067aa@analog.com>
On Fri, Apr 17, 2026 at 04:27:13PM +0800, Alexis Czezar Torreno wrote:
> Add device tree documentation for MAX20830 step-down DC-DC switching
> regulator with PMBus interface.
>
> Signed-off-by: Alexis Czezar Torreno <alexisczezar.torreno@analog.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
pw-bot: not-applicable
In the future, please relax a bit with new revisions, particularly
during the merge window when nothing is gonna get applied and there's no
urgency.
* Re: [PATCH V10 00/10] famfs: port into fuse
From: Darrick J. Wong @ 2026-04-17 15:58 UTC
To: Christoph Hellwig
Cc: Joanne Koong, Dan Williams, Gregory Price, John Groves,
Miklos Szeredi, Bernd Schubert, John Groves, Dan J Williams,
Bernd Schubert, Alison Schofield, John Groves, Jonathan Corbet,
Shuah Khan, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
Alexander Viro, David Hildenbrand, Christian Brauner,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Josef Bacik, Bagas Sanjaya, Chen Linxuan,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Aravind Ramesh, Ajay Joshi, venkataravis@micron.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-fsdevel@vger.kernel.org
In-Reply-To: <aeHsh3swBp2IZ4cX@infradead.org>
On Fri, Apr 17, 2026 at 01:17:11AM -0700, Christoph Hellwig wrote:
> On Thu, Apr 16, 2026 at 10:40:31PM -0700, Darrick J. Wong wrote:
> > > > ...the memory interleaving is a rather interesting quality of famfs.
> > > > There's no good way to express a formulaic meta-mapping in traditional
> > > > iomap parlance, and famfs needs that to interleave across memory
> > > > controllers/dimm boxen/whatever. Throwing individual iomaps at the
> > > > kernel is a very inefficient way to do that. So I don't think there's a
> > > > good reason to get rid of GET_FMAP at this time...
> > >
> > > So could we make the interleaving part generic then? Striped /
> > > interleaved layouts are used elsewhere (eg RAID-0, md-stripe, etc.) -
> > > could we add a generic interleave descriptor to the uapi and use that
> > > for what famfs needs?
> >
> > I doubt it. md-raid presents a unified LBA address space, which means
> > that the filesystem doesn't have to know anything about whatever
> > translations might happen underneath it.
>
> Unless that translation happens in the file system. It does for btrfs
> right now, and it does for pNFS blocklayout. The former is using iomap
> for direct I/O (and has old code and vague plans for using it for
> buffered I/O maybe eventually), the latter does not currently but would
> benefit a lot, although wiring it through the NFS code will be painful.
Not to mention a huge layering violation unless you're doing xraid. ;)
That said, the fuse-iomap patches have been waiting for a review since
October, and I'd really prefer to get the base enablement of iomap
merged before we start asking about things that existing fuse servers
and iomap client filesystems don't do, like in-filesystem raid.
> > Most filesystems that implement striping themselves don't restrict
> > themselves to monotonically increasing LBA ranges rotored across each
> > device like md-raid0 does.
>
> Mappings can be more flexible, but they usually would not be in a single
> iomap iteration.
>
> > But for whatever reason, pmem/dax don't have remapping layers like
> > md/dm so filesystems have to do that on their own if the hardware
> > doesn't do it for them.
>
> DM actually supports DAX. I don't think that's a very good way as it
> adds a lot of overhead for little gain for striping.
Aha, it has long been my suspicion that looping through mapping layers
is a real performance pit for memory-based file stores. Thanks for
saying that explicitly.
--D
* Re: [PATCH 0/3] rust: add Kconfig.test
From: Gary Guo @ 2026-04-17 14:11 UTC
To: Yury Norov, Miguel Ojeda, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Jonathan Corbet, Shuah Khan,
Lorenzo Stoakes, Vlastimil Babka, Liam R. Howlett,
Uladzislau Rezki, Burak Emir, Yury Norov, Brendan Higgins,
David Gow, Rae Moar, Will Deacon, Peter Zijlstra, Mark Rutland,
Nathan Chancellor, Kees Cook, Nicolas Schier,
Thomas Weißschuh, Thomas Gleixner, Douglas Anderson,
Shakeel Butt, Christian Brauner, Randy Dunlap, Tamir Duberstein,
rust-for-linux, linux-doc, linux-kernel, linux-kselftest,
kunit-dev
In-Reply-To: <20260417031531.315281-1-ynorov@nvidia.com>
On Fri Apr 17, 2026 at 4:15 AM BST, Yury Norov wrote:
> There are 6 individual Rust KUnit tests. All the tests are compiled
> unconditionally now, which adds ~200 kB to the kernel image on my
> x86_64 build. As Rust matures, this bloating will inevitably grow.
>
> Add Kconfig.test, which provides a RUST_KUNIT_TESTS menu, and all
> individual tests under it.
>
> Yury Norov (3):
> rust: tests: drop 'use crate' in bitmap and atomic KUnit tests
> rust: testing: add Kconfig for KUnit test
> Documentation: rust: testing: add Kconfig guidance
Acked-by: Gary Guo <gary@garyguo.net>
>
> Documentation/rust/testing.rst | 5 ++-
> init/Kconfig | 2 +
> rust/kernel/Kconfig.test | 76 ++++++++++++++++++++++++++++
> rust/kernel/alloc/allocator.rs | 1 +
> rust/kernel/alloc/kvec.rs | 1 +
> rust/kernel/bitmap.rs | 5 +--
> rust/kernel/kunit.rs | 1 +
> rust/kernel/str.rs | 1 +
> rust/kernel/sync/atomic/predefine.rs | 5 +--
> 9 files changed, 79 insertions(+), 7 deletions(-)
> create mode 100644 rust/kernel/Kconfig.test
* Re: [PATCH 0/2] Improve the crypto library documentation
From: Ard Biesheuvel @ 2026-04-17 13:47 UTC
To: Eric Biggers, linux-crypto
Cc: linux-kernel, Jason A . Donenfeld, Herbert Xu, linux-doc,
Jonathan Corbet, Mauro Carvalho Chehab, Randy Dunlap
In-Reply-To: <20260417065529.64925-1-ebiggers@kernel.org>
On Fri, 17 Apr 2026, at 08:55, Eric Biggers wrote:
> While the crypto library already has a lot of kerneldoc, it's not being
> included in the HTML or PDF documentation. Update Documentation/crypto/
> to include it, and also add a high-level overview of the library.
>
> I'd like to take this series via the libcrypto tree for 7.1.
>
> Eric Biggers (2):
> docs: kdoc: Expand 'at_least' when creating parameter list
> lib/crypto: docs: Add rst documentation to Documentation/crypto/
>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
I think this hits the spot wrt scope and level of detail.
* Re: [PATCH V10 00/10] famfs: port into fuse
From: Gregory Price @ 2026-04-17 13:30 UTC
To: Christoph Hellwig
Cc: Darrick J. Wong, Dan Williams, Joanne Koong, John Groves,
Miklos Szeredi, Bernd Schubert, John Groves, Dan J Williams,
Bernd Schubert, Alison Schofield, John Groves, Jonathan Corbet,
Shuah Khan, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
Alexander Viro, David Hildenbrand, Christian Brauner,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Josef Bacik, Bagas Sanjaya, Chen Linxuan,
James Morse, Fuad Tabba, Sean Christopherson, Shivank Garg,
Ackerley Tng, Aravind Ramesh, Ajay Joshi, venkataravis@micron.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-fsdevel@vger.kernel.org
In-Reply-To: <aeHrsIGagdmZJ1Fw@infradead.org>
On Fri, Apr 17, 2026 at 01:13:36AM -0700, Christoph Hellwig wrote:
> On Thu, Apr 16, 2026 at 03:43:31PM -0700, Darrick J. Wong wrote:
>
> > ...however the strongest case (IMO) would be if (having merged famfs) we
> > then merge fuse-iomap after famfs. Then we extend the existing
> > fuse-iomap-bpf prototype to allow per-mount and per-inode iomap bpf ops.
> > That enables us to analyze thoroughly the performance characteristics of:
>
> Don't go there. I think that you two coming up with two
> interfaces for roughly the same thing is a pretty clear indicator
> that this needs to be fully hashed out as a single interface first,
> and any kind of preliminary merging is just going to create problems.
>
We're not sure how deep this rathole goes, and John's work has been
in the rathole for a few years now. Hence the desire to hedge.
But it's obvious no decisions will be made before LSFMM - so we can
take a breath and chew on it. Maybe we'll get there in person.
~Gregory
* Re: [RFC, PATCH 00/12] userfaultfd: working set tracking for VM guest memory
From: Kiryl Shutsemau @ 2026-04-17 12:26 UTC
To: David Hildenbrand (Arm)
Cc: Andrew Morton, Peter Xu, Lorenzo Stoakes, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Liam R . Howlett, Zi Yan,
Jonathan Corbet, Shuah Khan, Sean Christopherson, Paolo Bonzini,
linux-mm, linux-kernel, linux-doc, linux-kselftest, kvm
In-Reply-To: <4c635703-3d8d-4cfa-bb98-7f6f5fcbe547@kernel.org>
On Fri, Apr 17, 2026 at 01:43:36PM +0200, David Hildenbrand (Arm) wrote:
> On 4/16/26 22:25, Kiryl Shutsemau wrote:
> > On Thu, Apr 16, 2026 at 08:32:19PM +0200, David Hildenbrand (Arm) wrote:
> >> On 4/16/26 15:49, Kiryl Shutsemau wrote:
> >>>
> >>> Here is an updated version:
> >>>
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/log/?h=uffd/rfc-v2
> >>>
> >>> will post after -rc1 is tagged.
> >>>
> >>> I like it more. It got substantially cleaner.
> >>
> >> I don't have time to look into the details just yet, but my thinking was
> >> that
> >>
> >> a) It would avoid the zap+refault
> >
> > Yep.
> >
> >> b) We could reuse the uffd-wp PTE bit + marker to indicate/remember the
> >> protection, making it co-exist with NUMA hinting naturally.
> >>
> >> b) obviously means that we cannot use uffd-wp and uffd-rwp at the same
> >> time in the same uffd area. I guess that should be acceptable for the
> >> use cases you have in mind?
> >
> > I took a different path: I still use PROT_NONE PTEs, so it cannot
> > co-exist with NUMA balancing [fully], but WP + RWP should be fine. I
> > need to add a test for this.
> >
> > I didn't give up on NUMA balancing completely. task_numa_fault() is
> > called on RWP fault. So it should help scheduler decisions somewhat.
> >
> > I think an RWP user might want to use WP too.
> >
> > Do you see this trade-off as reasonable?
>
> One reason why the PTE bit was added for the WP case was to distinguish
> it from other write faults.
>
> I assume without a dedicated PTE bit your design will always suffer from
> false positive notifications.
>
> Leaving NUMA-balancing aside, a simple
> mprotect(PROT_NONE)+mprotect(PROT_READ) would already be problematic to
> distinguish both cases.
Hm. I didn't consider this case (I'm missing some uffd lore). Will rework to
reuse existing PTE bit.
Thanks for the feedback!
--
Kiryl Shutsemau / Kirill A. Shutemov