* Re: [PATCH] cpufreq: CPPC: add autonomous mode boot parameter support
From: Pierre Gondois @ 2026-04-10 13:47 UTC
To: Sumit Gupta
Cc: linux-tegra, linux-kernel, linux-doc, zhenglifeng1, treding,
viresh.kumar, jonathanh, vsethi, ionela.voinescu, ksitaraman,
sanjayc, zhanjie9, corbet, mochs, skhan, bbasu, rdunlap, linux-pm,
mario.limonciello, rafael
In-Reply-To: <b8debb30-67a5-4d2b-8c08-8fd287f7258e@nvidia.com>
Hello Sumit,
On 4/6/26 20:08, Sumit Gupta wrote:
> Hi Pierre,
>
> Thank you for the comments.
> Sorry for late reply as I was on vacation.
>
No worries
>
> On 24/03/26 23:48, Pierre Gondois wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Hello Sumit,
>>
>> On 3/17/26 16:10, Sumit Gupta wrote:
>>> Add kernel boot parameter 'cppc_cpufreq.auto_sel_mode' to enable CPPC
>>> autonomous performance selection on all CPUs at system startup without
>>> requiring runtime sysfs manipulation. When autonomous mode is enabled,
>>> the hardware automatically adjusts CPU performance based on workload
>>> demands using Energy Performance Preference (EPP) hints.
>>>
>>> When auto_sel_mode=1:
>>> - Configure all CPUs for autonomous operation on first init
>>> - Set EPP to performance preference (0x0)
>>> - Use HW min/max when set; otherwise program from policy limits (caps)
>>> - Clamp desired_perf to bounds before enabling autonomous mode
>>> - Hardware controls frequency instead of the OS governor
>>>
>>> The boot parameter is applied only during first policy initialization.
>>> On hotplug, skip applying it so that the user's runtime sysfs
>>> configuration is preserved.
>>>
>>> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> (Documentation)
>>> Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
>>> ---
>>> Part 1 [1] of this series was applied for 7.1 and present in next.
>>> Sending this patch as reworked version of 'patch 11' from [2] based
>>> on next.
>>>
>>> [1]
>>> https://lore.kernel.org/lkml/20260206142658.72583-1-sumitg@nvidia.com/
>>> [2]
>>> https://lore.kernel.org/lkml/20251223121307.711773-1-sumitg@nvidia.com/
>>> ---
>>> .../admin-guide/kernel-parameters.txt | 13 +++
>>> drivers/cpufreq/cppc_cpufreq.c | 84
>>> +++++++++++++++++--
>>> 2 files changed, 92 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt
>>> b/Documentation/admin-guide/kernel-parameters.txt
>>> index fa6171b5fdd5..de4b4c89edfe 100644
>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>> @@ -1060,6 +1060,19 @@ Kernel parameters
>>> policy to use. This governor must be
>>> registered in the
>>> kernel before the cpufreq driver probes.
>>>
>>> + cppc_cpufreq.auto_sel_mode=
>>> + [CPU_FREQ] Enable ACPI CPPC autonomous
>>> performance
>>> + selection. When enabled, hardware
>>> automatically adjusts
>>> + CPU frequency on all CPUs based on workload
>>> demands.
>>> + In Autonomous mode, Energy Performance
>>> Preference (EPP)
>>> + hints guide hardware toward performance (0x0)
>>> or energy
>>> + efficiency (0xff).
>>> + Requires ACPI CPPC autonomous selection
>>> register support.
>>> + Format: <bool>
>>> + Default: 0 (disabled)
>>> + 0: use cpufreq governors
>>> + 1: enable if supported by hardware
>>> +
>>> cpu_init_udelay=N
>>> [X86,EARLY] Delay for N microsec between
>>> assert and de-assert
>>> of APIC INIT to start processors. This delay
>>> occurs
>>> diff --git a/drivers/cpufreq/cppc_cpufreq.c
>>> b/drivers/cpufreq/cppc_cpufreq.c
>>> index 5dfb109cf1f4..49c148b2a0a4 100644
>>> --- a/drivers/cpufreq/cppc_cpufreq.c
>>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>>> @@ -28,6 +28,9 @@
>>>
>>> static struct cpufreq_driver cppc_cpufreq_driver;
>>>
>>> +/* Autonomous Selection boot parameter */
>>> +static bool auto_sel_mode;
>>> +
>>> #ifdef CONFIG_ACPI_CPPC_CPUFREQ_FIE
>>> static enum {
>>> FIE_UNSET = -1,
>>> @@ -708,11 +711,74 @@ static int cppc_cpufreq_cpu_init(struct
>>> cpufreq_policy *policy)
>>> policy->cur = cppc_perf_to_khz(caps, caps->highest_perf);
>>> cpu_data->perf_ctrls.desired_perf = caps->highest_perf;
>>>
>>> - ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
>>> - if (ret) {
>>> - pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
>>> - caps->highest_perf, cpu, ret);
>>> - goto out;
>>> + /*
>>> + * Enable autonomous mode on first init if boot param is set.
>>> + * Check last_governor to detect first init and skip if auto_sel
>>> + * is already enabled.
>>> + */
>> If the goal is to set autosel only once at the driver init,
>> shouldn't this be done in cppc_cpufreq_init() ?
>> I understand that cpu_data doesn't exist yet in
>> cppc_cpufreq_init(), but this seems more appropriate to do
>> it there IMO.
>>
>> This means the cpudata should be updated accordingly
>> in this cppc_cpufreq_cpu_init() function.
>
> In an earlier version [1], the setup was in cppc_cpufreq_init() but
> was moved to cppc_cpufreq_cpu_init() to improve per-CPU error handling.
> Keeping the setup in cppc_cpufreq_init() helps to avoid the last_governor
> check. We can warn for a CPU failing to enable and continue so other
> CPUs keep autonomous mode.
> cppc_cpufreq_cpu_init() would then just check the auto_sel state
> from register and sync policy limits from min/max_perf registers when
> autonomous mode is active.
> Please let me know your thoughts.
FWIU the auto_sel_mode module parameter configures the default
auto_sel_mode when the driver is first loaded, so there should be
no need to check it again whenever cppc_cpufreq_cpu_init() is
called. Maybe Ionela saw something we didn't see?
Also, just to be sure: should it still be possible to change
auto_sel_mode through sysfs if the driver was loaded with
auto_sel_mode=1?
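
To make the split under discussion concrete, here is a minimal user-space
sketch, not code from the patch. It assumes the parameter is applied once
in the driver init path, warning and continuing when a CPU fails, while
the per-policy init only reads the resulting state back. All names
(mock_set_auto_sel(), driver_init(), policy_init_is_autonomous(),
hw_auto_sel[]) are hypothetical stand-ins, and the failure on CPU 2 is
contrived for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4

/* Hypothetical stand-in for the boot parameter. */
static bool auto_sel_mode = true;

/* Hypothetical per-CPU register state; not the kernel's cppc_cpudata. */
static bool hw_auto_sel[NR_CPUS];

/* Mock register write; CPU 2 is made to fail for illustration only. */
static int mock_set_auto_sel(unsigned int cpu, bool enable)
{
	if (cpu == 2)
		return -5; /* pretend the write failed */
	hw_auto_sel[cpu] = enable;
	return 0;
}

/* Driver-load path: apply the parameter once, warn and continue on failure. */
static unsigned int driver_init(void)
{
	unsigned int cpu, enabled = 0;

	if (!auto_sel_mode)
		return 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		int ret = mock_set_auto_sel(cpu, true);

		if (ret) {
			fprintf(stderr, "CPU%u: failed to enable auto_sel (%d)\n",
				cpu, ret);
			continue;
		}
		enabled++;
	}
	return enabled;
}

/* Per-policy path: only read the state back, never re-apply the parameter. */
static bool policy_init_is_autonomous(unsigned int cpu)
{
	return hw_auto_sel[cpu];
}
```

With this shape, a later runtime change to the per-CPU state would not be
overridden by the next policy init, since that path never consults the
parameter again.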
>
> [1]
> https://lore.kernel.org/lkml/5593d364-ca37-41c5-b33f-f7e245d6d626@nvidia.com/
>
>
>>
>>> + if (auto_sel_mode && policy->last_governor[0] == '\0' &&
>>> + !cpu_data->perf_ctrls.auto_sel) {
>>> + /* Enable CPPC - optional register, some platforms
>>> need it */
>> The documentation of the CPPC Enable Register is subject to
>> interpretation, but IIUC the field should be set to use the CPPC
>> controls, so I assume this should be set in cppc_cpufreq_init()
>> instead ?
>
> Agree that the CPPC Enable is about using the CPPC control path
> in general and not only for autonomous selection.
> Will move cppc_set_enable() into cppc_cpufreq_init() or outside the
> autonomous mode block in cppc_cpufreq_cpu_init() as per conclusion
> of previous comment.
>
>>> + ret = cppc_set_enable(cpu, true);
>>> + if (ret && ret != -EOPNOTSUPP)
>>> + pr_warn("Failed to enable CPPC for CPU%d
>>> (%d)\n", cpu, ret);
>>> +
>>> + /*
>>> + * Prefer HW min/max_perf when set; otherwise program
>>> from
>>> + * policy limits derived earlier from caps.
>>> + * Clamp desired_perf to bounds and sync policy->cur.
>>> + */
>>> + if (!cpu_data->perf_ctrls.min_perf ||
>>> !cpu_data->perf_ctrls.max_perf)
>>
>> The function doesn't seem to exist.
>
> It is newly added in [2].
> Don't need to call it if we move the setup to cppc_cpufreq_init().
Ah ok right thanks.
>
> [2]
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=ea3db45ae476889a1ba0ab3617e6afdeeefbda3d
>
>
>
>>
>>> + cppc_cpufreq_update_perf_limits(cpu_data, policy);
>>> +
>>> + cpu_data->perf_ctrls.desired_perf =
>>> + clamp_t(u32, cpu_data->perf_ctrls.desired_perf,
>>> + cpu_data->perf_ctrls.min_perf,
>>> + cpu_data->perf_ctrls.max_perf);
>>> +
>>> + policy->cur = cppc_perf_to_khz(caps,
>>> + cpu_data->perf_ctrls.desired_perf);
>>> +
>>
>> Maybe this should also be done in cppc_cpufreq_init()
>> if the auto_sel_mode parameter is set ?
>
> Yes.
>
>>
>>> + /* EPP is optional - some platforms may not support it */
>>> + ret = cppc_set_epp(cpu, CPPC_EPP_PERFORMANCE_PREF);
>>> + if (ret && ret != -EOPNOTSUPP)
>>> + pr_warn("Failed to set EPP for CPU%d (%d)\n",
>>> cpu, ret);
>>> + else if (!ret)
>>> + cpu_data->perf_ctrls.energy_perf =
>>> CPPC_EPP_PERFORMANCE_PREF;
>>> +
>>> + ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
>>> + if (ret) {
>>> + pr_debug("Err setting perf for autonomous mode
>>> CPU:%d ret:%d\n",
>>> + cpu, ret);
>>> + goto out;
>>> + }
>>> +
>>> + ret = cppc_set_auto_sel(cpu, true);
>>> + if (ret && ret != -EOPNOTSUPP) {
>>> + pr_warn("Failed autonomous config for CPU%d
>>> (%d)\n",
>>> + cpu, ret);
>>> + goto out;
>>> + }
>>> + if (!ret)
>>> + cpu_data->perf_ctrls.auto_sel = true;
>>> + }
>>> +
>>> + if (cpu_data->perf_ctrls.auto_sel) {
>>
>> There is a patchset ongoing which tries to remove
>> setting policy->min/max from driver initialization.
>> Indeed, these values are only temporarily valid,
>> until the governor override them.
>> It is not sure yet the patch will be accepted though.
>>
>> https://lore.kernel.org/lkml/20260317101753.2284763-4-pierre.gondois@arm.com/
>>
>
>
> You are right that policy->min/max from .init() are temporary today
> as cpufreq_set_policy() overwrites them before the governor starts.
>
> On my test platform (highest == nominal, lowest_nonlinear == lowest),
> this had no visible effect because the BIOS bounds and cpuinfo range
> end up identical. But on platforms where they differ, the governor
> would widen the range to full cpuinfo limits.
>
> I think your patch [3] fixes this by giving these the right semantic as
> initial QoS requests. With it, cpufreq_set_policy() preserves the policy
> limits set from min/max_perf registers in .init(), which can either be
> BIOS values on first boot or last user configured values before hotplug.
>
> I will update the comment in v2 to reflect QoS seeding intent.
>
> I see that the first two patches of your series [3] is applied for 7.1.
> Do you plan to send the pending patch (3/4) from [3]?
>
I need to ping Viresh to check if this is still relevant.
> [3]
> https://lore.kernel.org/lkml/20260317101753.2284763-4-pierre.gondois@arm.com/
>
>
>>
>>
>>> + /* Sync policy limits from HW when autonomous mode is
>>> active */
>>> + policy->min = cppc_perf_to_khz(caps,
>>> + cpu_data->perf_ctrls.min_perf ?:
>>> + caps->lowest_nonlinear_perf);
>>> + policy->max = cppc_perf_to_khz(caps,
>>> + cpu_data->perf_ctrls.max_perf ?:
>>> + caps->nominal_perf);
>>> + } else {
>>> + /* Normal mode: governors control frequency */
>>> + ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
>>> + if (ret) {
>>> + pr_debug("Err setting perf value:%d on CPU:%d.
>>> ret:%d\n",
>>> + caps->highest_perf, cpu, ret);
>>> + goto out;
>>> + }
>>> }
>>>
>>> cppc_cpufreq_cpu_fie_init(policy);
>>> @@ -1038,10 +1104,18 @@ static int __init cppc_cpufreq_init(void)
>>>
>>> static void __exit cppc_cpufreq_exit(void)
>>> {
>>> + unsigned int cpu;
>>> +
>>> + for_each_present_cpu(cpu)
>>> + cppc_set_auto_sel(cpu, false);
>>
>> If the firmware has a default EPP value, it means that loading
>> and the unloading the driver will reset this default EPP value.
>> Maybe the initial EPP value and/or the auto_sel value should be
>> cached somewhere and restored on exit ?
>> I don't know if this is actually an issue, this is just to signal it.
>
> The auto_sel_mode boot path programs EPP to performance preference(0),
> not the firmware’s previous value. On unload we only call
> cppc_set_auto_sel(false); we do not restore EPP, min/max perf,
> or other CPPC fields to firmware defaults.
Yes right, so loading/unloading the driver might change the
default EPP value.
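
For reference, the caching idea mentioned above could look roughly like
the following user-space sketch. It is only an illustration under stated
assumptions: the register layout, the saved_state structure and the
mock_driver_load()/mock_driver_unload() helpers are all hypothetical and
do not exist in cppc_cpufreq.

```c
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical per-CPU hardware state standing in for CPPC registers. */
struct mock_cppc_regs {
	unsigned int epp;
	bool auto_sel;
};

/* Values saved at load time so that unload can put them back. */
struct saved_state {
	unsigned int epp;
	bool auto_sel;
	bool valid;
};

static struct mock_cppc_regs regs[NR_CPUS];
static struct saved_state saved[NR_CPUS];

/* Load path: remember the firmware values, then program autonomous mode. */
static void mock_driver_load(void)
{
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		saved[cpu].epp = regs[cpu].epp;
		saved[cpu].auto_sel = regs[cpu].auto_sel;
		saved[cpu].valid = true;

		regs[cpu].epp = 0x0;	/* performance preference */
		regs[cpu].auto_sel = true;
	}
}

/* Unload path: restore whatever was there before the driver loaded. */
static void mock_driver_unload(void)
{
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!saved[cpu].valid)
			continue;
		regs[cpu].epp = saved[cpu].epp;
		regs[cpu].auto_sel = saved[cpu].auto_sel;
		saved[cpu].valid = false;
	}
}
```

Whether restoring is actually desirable is still an open question; the
sketch only shows that the bookkeeping would be small.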
>
> Thank you,
> Sumit Gupta
>
> ....
>
>
* Re: [PATCH net-next v2 02/14] libie: add PCI device initialization helpers to libie
From: Larysa Zaremba @ 2026-04-10 13:32 UTC
To: Paolo Abeni
Cc: Tony Nguyen, davem, kuba, edumazet, andrew+netdev, netdev,
Phani R Burra, przemyslaw.kitszel, aleksander.lobakin,
sridhar.samudrala, anjali.singhai, michal.swiatkowski,
maciej.fijalkowski, emil.s.tantilov, madhu.chittim, joshua.a.hay,
jacob.e.keller, jayaprakash.shanmugam, jiri, horms, corbet,
richardcochran, linux-doc, bhelgaas, linux-pci, Bharath R,
Samuel Salin, Aleksandr Loktionov
In-Reply-To: <2e618260-2153-4c36-be61-d2329c9da13f@redhat.com>
On Thu, Apr 09, 2026 at 10:56:26AM +0200, Paolo Abeni wrote:
> On 4/3/26 9:49 PM, Tony Nguyen wrote:
> > + mr = libie_find_mmio_region(&mmio_info->mmio_list, offset, size,
> > + bar_idx);
> > + if (mr) {
> > + pci_warn(pdev,
> > + "Mapping of BAR%u (offset=%llu, size=%llu) intersecting region (offset=%llu, size=%llu) already exists\n",
> > + bar_idx, (unsigned long long)mr->offset,
> > + (unsigned long long)mr->size,
> > + (unsigned long long)offset, (unsigned long long)size);
> > + return mr->offset <= offset &&
> > + mr->offset + mr->size >= offset + size;
>
> Sashiko says:
>
> ---
> Does returning true here without creating a new tracking object leave
> the new mapping tied to the original mapping's lifetime?
> If the driver unmaps the original region, iounmap() is called and the
> tracking object is freed. Any cached virtual address pointers to the
> sub-region would then become a use-after-free, and subsequent queries
> for the sub-region would fail.
> ---
Current users map and unmap region groups in a 'map 1-2-3, unmap 3-2-1'
fashion, and this is not expected to change, so it should be fine as-is.
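
As a small illustration of the check in the quoted hunk, the containment
test can be written as a standalone helper. This is a user-space sketch
with a hypothetical mock_region type, not the libie structure, and it
ignores integer overflow of offset + size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracking entry; only the fields used by the check. */
struct mock_region {
	uint64_t offset;
	uint64_t size;
};

/* True if [offset, offset + size) lies entirely inside the tracked region. */
static bool region_contains(const struct mock_region *mr,
			    uint64_t offset, uint64_t size)
{
	return mr->offset <= offset &&
	       mr->offset + mr->size >= offset + size;
}
```

A request that is fully contained reuses the existing mapping, so it
stays valid only as long as the enclosing region is unmapped last, which
is what the reverse-order unmap pattern provides.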
>
> /P
>
* [GIT PULL] Chinese-docs changes for v7.1
From: Alex Shi @ 2026-04-10 13:29 UTC
To: Jonathan Corbet, linux-doc, open list, Yanteng Si, Dongliang Mu
The following changes since commit 1eab6493f525910aa7bc383a2a27b68916e3c616:
tracing: Documentation: Update histogram-design.rst for fn() handling
(2026-04-09 08:46:39 -0600)
are available in the Git repository at:
git@gitolite.kernel.org:pub/scm/linux/kernel/git/alexs/linux.git tags/Chinese-docs-7.1
for you to fetch changes up to 78405e7f42fa9127325c65aec9289187f67ac5ce:
docs/zh_CN: update rust/index.rst translation (2026-04-10 20:09:45 +0800)
----------------------------------------------------------------
Chinese translation docs for 7.1
This is the Chinese translation subtree for 7.1. It includes
the following changes:
- Add the rust docs translation
- Fix an inconsistent statement in dev-tools/testing-overview
- Sync process/2.Process.rst with the English version
The above patches were tested with 'make htmldocs'.
Signed-off-by: Alex Shi <alexs@kernel.org>
----------------------------------------------------------------
Ben Guo (4):
docs/zh_CN: update rust/arch-support.rst translation
docs/zh_CN: update rust/coding-guidelines.rst translation
docs/zh_CN: update rust/quick-start.rst translation
docs/zh_CN: update rust/index.rst translation
LIU Haoyang (1):
docs/zh_CN: fix an inconsistent statement in dev-tools/testing-overview
Song Hongyi (1):
docs/zh_CN: sync process/2.Process.rst with English version
Documentation/translations/zh_CN/dev-tools/testing-overview.rst |   2 +-
Documentation/translations/zh_CN/process/2.Process.rst          |  56 ++++---
Documentation/translations/zh_CN/rust/arch-support.rst          |   9 +-
Documentation/translations/zh_CN/rust/coding-guidelines.rst     | 262 +++++++++++++++++++++++++++++++--
Documentation/translations/zh_CN/rust/index.rst                 |  17 ---
Documentation/translations/zh_CN/rust/quick-start.rst           | 190 ++++++++++++++++++------
6 files changed, 427 insertions(+), 109 deletions(-)
* Re: [PATCH] docs: escape ** glob pattern in MAINTAINERS descriptions
From: Jonathan Corbet @ 2026-04-10 12:50 UTC
To: Randy Dunlap, Matteo Croce, Mauro Carvalho Chehab
Cc: linux-doc, linux-kernel, Matteo Croce
In-Reply-To: <72487cc4-b5fa-4a07-bcb5-a6ba479161e3@infradead.org>
Randy Dunlap <rdunlap@infradead.org> writes:
> Hi,
>
> On 4/9/26 3:31 PM, Matteo Croce wrote:
>> From: Matteo Croce <teknoraver@meta.com>
>>
>> Escape '**' in the MAINTAINERS descriptions section to prevent
>> reStructuredText from interpreting it as bold/strong inline markup,
>> which causes a warning when running 'make htmldocs'.
>>
>> Fixes: 420849332f9f ("get_maintainer: add ** glob pattern support")
>> Signed-off-by: Matteo Croce <teknoraver@meta.com>
>> ---
>> Documentation/sphinx/maintainers_include.py | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/sphinx/maintainers_include.py b/Documentation/sphinx/maintainers_include.py
>> index 519ad18685b2..54f34f47c9ee 100755
>> --- a/Documentation/sphinx/maintainers_include.py
>> +++ b/Documentation/sphinx/maintainers_include.py
>> @@ -89,7 +89,8 @@ class MaintainersInclude(Include):
>> output = None
>> if descriptions:
>> # Escape the escapes in preformatted text.
>> - output = "| %s" % (line.replace("\\", "\\\\"))
>> + output = "| %s" % (line.replace("\\", "\\\\")
>> + .replace("**", "\\**"))
>> # Look for and record field letter to field name mappings:
>> # R: Designated *reviewer*: FullName <address@domain>
>> m = re.search(r"\s(\S):\s", line)
>
> It's nice to eliminate one warning from 'make htmldocs', so this is good
> in that regard. However, there are still multiple problems (not Warnings)
> with '*' characters in the MAINTAINERS file:
I've mentioned this before but done nothing about it ... I really wonder
about the value of bringing in the MAINTAINERS file in the first place.
Do we think that anybody is reading it in the rendered docs?
jon
* Re: [PATCH v2 0/4] docs/zh_CN: update rust/ subsystem translations
From: Alex Shi @ 2026-04-10 12:11 UTC
To: Ben Guo, Alex Shi, Yanteng Si, Dongliang Mu, Jonathan Corbet
Cc: linux-doc, linux-kernel, rust-for-linux, hust-os-kernel-patches
In-Reply-To: <cover.1775786987.git.ben.guo@openatom.club>
Applied, Thanks!
On 2026/4/10 10:41, Ben Guo wrote:
> Update Chinese translations for the Rust subsystem documentation,
> syncing with the latest upstream changes.
>
> - arch-support.rst: add ARM (ARMv7) support, update RISC-V and UM notes
> - coding-guidelines.rst: add imports formatting, private item docs,
> C FFI types, and Lints sections
> - quick-start.rst: add distro-specific install instructions, update
> rustc/bindgen sections, remove cargo section
> - index.rst: remove experimental notice and genindex
>
> Changes in v2:
> - Add Reviewed-by from Gary Guo
>
> Ben Guo (4):
> docs/zh_CN: update rust/arch-support.rst translation
> docs/zh_CN: update rust/coding-guidelines.rst translation
> docs/zh_CN: update rust/quick-start.rst translation
> docs/zh_CN: update rust/index.rst translation
>
> .../translations/zh_CN/rust/arch-support.rst | 9 +-
> .../zh_CN/rust/coding-guidelines.rst | 262 +++++++++++++++++-
> .../translations/zh_CN/rust/index.rst | 17 --
> .../translations/zh_CN/rust/quick-start.rst | 190 ++++++++++---
> 4 files changed, 401 insertions(+), 77 deletions(-)
>
> -- 2.53.0
* Re: [PATCH 3/6] hugetlb: make hugetlb_fault_mutex_hash() take PAGE_SIZE index
From: Usama Arif @ 2026-04-10 11:24 UTC
To: Jane Chu
Cc: Usama Arif, akpm, david, muchun.song, osalvador, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, corbet, skhan, hughd,
baolin.wang, peterx, linux-mm, linux-doc, linux-kernel
In-Reply-To: <20260409234158.837786-4-jane.chu@oracle.com>
On Thu, 9 Apr 2026 17:41:54 -0600 Jane Chu <jane.chu@oracle.com> wrote:
> hugetlb_fault_mutex_hash() is used to serialize faults and page cache
> operations on the same hugetlb file offset. The helper currently expects
> its index argument in hugetlb page granularity, so callers have to
> open-code conversions from the PAGE_SIZE-based indices commonly used
> in the rest of MM helpers.
>
> Change hugetlb_fault_mutex_hash() to take a PAGE_SIZE-based index
> instead, and perform the hugetlb-granularity conversion inside the helper.
> Update all callers accordingly.
>
> This makes the helper interface consistent with filemap_get_folio(),
> and linear_page_index(), while preserving the same lock selection for
> a given hugetlb file offset.
>
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> ---
> fs/hugetlbfs/inode.c | 19 ++++++++++---------
> mm/hugetlb.c | 28 +++++++++++++++++++---------
> mm/memfd.c | 11 ++++++-----
> mm/userfaultfd.c | 7 +++----
> 4 files changed, 38 insertions(+), 27 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index cf79fb830377..e24e9bf54e14 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -575,7 +575,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
> struct address_space *mapping = &inode->i_data;
> const pgoff_t end = lend >> PAGE_SHIFT;
> struct folio_batch fbatch;
> - pgoff_t next, index;
> + pgoff_t next, idx;
> int i, freed = 0;
> bool truncate_op = (lend == LLONG_MAX);
>
> @@ -586,15 +586,15 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
> struct folio *folio = fbatch.folios[i];
> u32 hash = 0;
>
> - index = folio->index >> huge_page_order(h);
> - hash = hugetlb_fault_mutex_hash(mapping, index);
> + hash = hugetlb_fault_mutex_hash(mapping, folio->index);
> mutex_lock(&hugetlb_fault_mutex_table[hash]);
>
> /*
> * Remove folio that was part of folio_batch.
> */
> + idx = folio->index >> huge_page_order(h);
> remove_inode_single_folio(h, inode, mapping, folio,
> - index, truncate_op);
> + idx, truncate_op);
> freed++;
>
> mutex_unlock(&hugetlb_fault_mutex_table[hash]);
> @@ -734,7 +734,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
> struct mm_struct *mm = current->mm;
> loff_t hpage_size = huge_page_size(h);
> unsigned long hpage_shift = huge_page_shift(h);
> - pgoff_t start, index, end;
> + pgoff_t start, end, idx, index;
> int error;
> u32 hash;
>
> @@ -774,7 +774,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
> vm_flags_init(&pseudo_vma, VM_HUGETLB | VM_MAYSHARE | VM_SHARED);
> pseudo_vma.vm_file = file;
>
> - for (index = start; index < end; index++) {
> + for (idx = start; idx < end; idx++) {
> /*
> * This is supposed to be the vaddr where the page is being
> * faulted in, but we have no vaddr here.
> @@ -794,14 +794,15 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
> }
>
> /* addr is the offset within the file (zero based) */
> - addr = index * hpage_size;
> + addr = idx * hpage_size;
>
> /* mutex taken here, fault path and hole punch */
> + index = idx << huge_page_order(h);
> hash = hugetlb_fault_mutex_hash(mapping, index);
> mutex_lock(&hugetlb_fault_mutex_table[hash]);
>
> /* See if already present in mapping to avoid alloc/free */
> - folio = filemap_get_folio(mapping, index << huge_page_order(h));
> + folio = filemap_get_folio(mapping, index);
> if (!IS_ERR(folio)) {
> folio_put(folio);
> mutex_unlock(&hugetlb_fault_mutex_table[hash]);
> @@ -824,7 +825,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
> }
> folio_zero_user(folio, addr);
> __folio_mark_uptodate(folio);
> - error = hugetlb_add_to_page_cache(folio, mapping, index);
> + error = hugetlb_add_to_page_cache(folio, mapping, idx);
> if (unlikely(error)) {
> restore_reserve_on_error(h, &pseudo_vma, addr, folio);
> folio_put(folio);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 38b39eaf46cc..9d5ae1f87850 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5515,7 +5515,7 @@ static vm_fault_t hugetlb_wp(struct vm_fault *vmf)
> */
> if (cow_from_owner) {
> struct address_space *mapping = vma->vm_file->f_mapping;
> - pgoff_t idx;
> + pgoff_t index;
> u32 hash;
>
> folio_put(old_folio);
> @@ -5528,8 +5528,9 @@ static vm_fault_t hugetlb_wp(struct vm_fault *vmf)
> *
> * Reacquire both after unmap operation.
> */
> - idx = vma_hugecache_offset(h, vma, vmf->address);
> - hash = hugetlb_fault_mutex_hash(mapping, idx);
> + index = linear_page_index(vma, vmf->address);
> + hash = hugetlb_fault_mutex_hash(mapping, index);
> +
> hugetlb_vma_unlock_read(vma);
> mutex_unlock(&hugetlb_fault_mutex_table[hash]);
>
> @@ -5664,6 +5665,10 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_fault *vmf,
> unsigned long reason)
> {
> u32 hash;
> + pgoff_t index;
> +
> + index = linear_page_index((const struct vm_area_struct *)vmf, vmf->address);
This is supposed to be linear_page_index(vmf->vma, vmf->address), right?
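
To spell out why the cast is a problem, here is a simplified user-space
sketch. The mock_vma and mock_vm_fault types are hypothetical stand-ins
for the much larger kernel structures, and mock_linear_page_index() only
mirrors the arithmetic of linear_page_index() as I understand it.

```c
/* Simplified stand-ins; the real kernel structures have many more fields. */
struct mock_vma {
	unsigned long vm_start;
	unsigned long vm_pgoff;
};

struct mock_vm_fault {
	const struct mock_vma *vma;	/* pointer to the VMA, not the VMA itself */
	unsigned long address;
};

#define MOCK_PAGE_SHIFT 12

/* Page offset of address within the file backing the VMA. */
static unsigned long mock_linear_page_index(const struct mock_vma *vma,
					    unsigned long address)
{
	return ((address - vma->vm_start) >> MOCK_PAGE_SHIFT) + vma->vm_pgoff;
}

static unsigned long fault_index(const struct mock_vm_fault *vmf)
{
	/*
	 * Pass the VMA the fault points to. Casting vmf itself would
	 * reinterpret the fault descriptor's bytes as a VMA and read
	 * vm_start/vm_pgoff from unrelated fields.
	 */
	return mock_linear_page_index(vmf->vma, vmf->address);
}
```

With the cast, the compiler is silenced but the index, and therefore the
mutex hash, would be computed from garbage.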
* Re: [PATCH] docs/zh_CN: add module-signing Chinese translation
From: Alex Shi @ 2026-04-10 11:21 UTC
To: Yan Zhu, alexs, si.yanteng, corbet; +Cc: dzm91, skhan, linux-doc, linux-kernel
In-Reply-To: <tencent_1BF5E860CC01A1E23DC405DD92314BE16E05@qq.com>
On 2026/4/1 23:40, Yan Zhu wrote:
> Translate .../admin-guide/module-signing.rst into Chinese.
>
> Update the translation through commit 0ad9a71933e7
> ("modsign: Enable ML-DSA module signing")
>
> Signed-off-by: Yan Zhu<zhuyan2015@qq.com>
> ---
> .../zh_CN/admin-guide/module-signing.rst | 242 ++++++++++++++++++
> 1 file changed, 242 insertions(+)
> create mode 100644 Documentation/translations/zh_CN/admin-guide/module-signing.rst
>
> diff --git a/Documentation/translations/zh_CN/admin-guide/module-signing.rst b/Documentation/translations/zh_CN/admin-guide/module-signing.rst
> new file mode 100644
> index 000000000000..b8c209dd229d
> --- /dev/null
> +++ b/Documentation/translations/zh_CN/admin-guide/module-signing.rst
> @@ -0,0 +1,242 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. include:: ../disclaimer-zh_CN.rst
> +
> +:Original: Documentation/admin-guide/module-signing.rst
> +:翻译:
> + 朱岩 Yan Zhu<zhuyan2015@qq.com>
> +
> +
> +==========================
> +内核模块签名机制
> +==========================
> +
> +.. 目录
> +..
> +.. - 概述
> +.. - 配置模块签名
> +.. - 生成签名密钥
> +.. - 内核中的公钥
> +.. - 模块手动签名
> +.. - 已签名模块和剥离
> +.. - 加载已签名模块
> +.. - 无效签名和未签名模块
> +.. - 管理/保护私钥
> +
> +
> +概述
> +====
> +
> +内核模块签名机制在安装过程中对模块进行加密签名,然后在加载模块时检查签名。
> +这通过禁止加载未签名的模块或使用无效密钥签名的模块来提高内核安全性。
> +模块签名通过使恶意模块更难加载到内核中来增加安全性。
> +模块签名检查在内核中完成,因此不需要受信任的用户空间位。
> +
> +此机制使用 X.509 ITU-T 标准证书对涉及的公钥进行编码。
> +签名本身不以任何工业标准类型编码。
> +内置机制目前仅支持 RSA、NIST P-384 ECDSA 和 NIST FIPS-204 ML-DSA 公钥签名标准(尽管它是可插拔的并允许使用其他标准)。
This line is too long and is not aligned with the other lines. You need
to follow the coding style used in the documents.
> +对于 RSA 和 ECDSA,可以使用的可能的哈希算法是大小为 256、384 和 512 的 SHA-2 和 SHA-3(算法由签名中的数据选择);
> +ML-DSA会自行进行哈希运算,但允许与SHA512哈希算法结合用于签名属性。
> +
> +配置模块签名
> +============
> +
> +通过进入内核配置的 :menuselection:`Enable Loadable Module Support` 菜单并打开以下选项来启用模块签名机制::
> +
> + CONFIG_MODULE_SIG "Module signature verification"
> +
> +这有多个可用选项:
> +
> + (1) :menuselection:`Require modules to be validly signed`
> + (``CONFIG_MODULE_SIG_FORCE``)
> +
> + 这指定了内核应如何处理其密钥未知或未签名的模块。
> +
> + 如果关闭(即"宽松模式"),则允许使用不可用密钥和未签名的模块,
> + 但内核将被标记为受污染,并且相关模块将被标记为受污染,显示字符'E'。
> +
> + 如果打开(即"限制模式"),只有具有有效签名且可由内核拥有的公钥验证的模块才会被加载。
> + 所有其他模块将生成错误。
> +
> + 无论此处的设置如何,如果模块的签名块无法解析,它将被直接拒绝。
> +
> +
> + (2) :menuselection:`Automatically sign all modules`
> + (``CONFIG_MODULE_SIG_ALL``)
> +
> + 如果打开此选项,则在构建的 modules_install 阶段期间将自动签名模块。
> + 如果关闭,则必须使用以下命令手动签名模块::
> +
> + scripts/sign-file
> +
> +
> + (3) :menuselection:`Which hash algorithm should modules be signed with?`
> +
> + 这提供了安装阶段将用于签名模块的哈希算法选择:
> +
> + =============================== ==========================================
> + ``CONFIG_MODULE_SIG_SHA256`` :menuselection:`Sign modules with SHA-256`
> + ``CONFIG_MODULE_SIG_SHA384`` :menuselection:`Sign modules with SHA-384`
> + ``CONFIG_MODULE_SIG_SHA512`` :menuselection:`Sign modules with SHA-512`
> + ``CONFIG_MODULE_SIG_SHA3_256`` :menuselection:`Sign modules with SHA3-256`
> + ``CONFIG_MODULE_SIG_SHA3_384`` :menuselection:`Sign modules with SHA3-384`
> + ``CONFIG_MODULE_SIG_SHA3_512`` :menuselection:`Sign modules with SHA3-512`
> + =============================== ==========================================
Got errors here:
Applying: docs/zh_CN: add module-signing Chinese translation
/home/alexshi/linuxdoc/.git/rebase-apply/patch:87: indent with spaces.
===============================
==========================================
/home/alexshi/linuxdoc/.git/rebase-apply/patch:94: indent with spaces.
===============================
==========================================
warning: 2 lines add whitespace errors.
Thanks
Alex
> +
> + 此处选择的算法也将被构建到内核中(而不是作为模块),
> + 以便使用该算法签名的模块可以在不导致循环依赖的情况下检查其签名。
> +
> +
> + (4) :menuselection:`File name or PKCS#11 URI of module signing key`
> + (``CONFIG_MODULE_SIG_KEY``)
> +
> + 将此选项设置为除默认值 ``certs/signing_key.pem`` 之外的其他值将禁用签名密钥的自动生成,
> + 并允许使用您选择的密钥对内核模块进行签名。
> + 提供的字符串应标识包含私钥及其对应的 PEM 格式 X.509 证书的文件,
> + 或者在 OpenSSL ENGINE_pkcs11 功能正常的系统上,使用 RFC7512 定义的 PKCS#11 URI。
> + 在后一种情况下,PKCS#11 URI 应引用证书和私钥。
> +
> + 如果包含私钥的 PEM 文件已加密,或者 PKCS#11 令牌需要 PIN,
> + 可以通过 ``KBUILD_SIGN_PIN`` 变量在构建时提供。
> +
> +
> + (5) :menuselection:`Additional X.509 keys for default system keyring`
> + (``CONFIG_SYSTEM_TRUSTED_KEYS``)
> +
> + 此选项可设置为包含附加证书的 PEM 编码文件的文件名,
> + 这些证书将默认包含在系统密钥环中。
> +
> +请注意,启用模块签名会为内核构建过程添加对执行签名工具的 OpenSSL 开发包的依赖。
> +
> +
> +生成签名密钥
> +============
> +
> +生成和检查签名需要加密密钥对。私钥用于生成签名,相应的公钥用于检查签名。
> +私钥仅在构建期间需要,之后可以删除或安全存储。
> +公钥被构建到内核中,以便在加载模块时可以使用它来检查签名。
> +
> +在正常情况下,当 ``CONFIG_MODULE_SIG_KEY`` 保持默认值时,
> +如果文件中不存在密钥对,内核构建将使用 openssl 自动生成新的密钥对::
> +
> + certs/signing_key.pem
> +
> +在构建 vmlinux 期间(公钥需要构建到 vmlinux 中)使用参数::
> +
> + certs/x509.genkey
> +
> +文件(如果尚不存在也会生成)。
> +
> +可以在 RSA(``MODULE_SIG_KEY_TYPE_RSA``)、ECDSA(``MODULE_SIG_KEY_TYPE_ECDSA``)
> +和 ML-DSA(``MODULE_SIG_KEY_TYPE_MLDSA_*``)之间选择生成 RSA 4k、NIST P-384 密钥对或 ML-DSA 44、65 或 87 密钥对。
> +
> +强烈建议您提供自己的 x509.genkey 文件。
> +
> +最值得注意的是,在 x509.genkey 文件中,req_distinguished_name 部分应从默认值更改::
> +
> + [ req_distinguished_name ]
> + #O = Unspecified company
> + CN = Build time autogenerated kernel key
> + #emailAddress =unspecified.user@unspecified.company
> +
> +生成的 RSA 密钥大小也可以通过以下方式设置::
> +
> + [ req ]
> + default_bits = 4096
> +
> +也可以使用位于 Linux 内核源代码树根节点中的 x509.genkey 密钥生成配置文件和 openssl 命令手动生成公钥/私钥文件。
> +以下是生成公钥/私钥文件的示例::
> +
> + openssl req -new -nodes -utf8 -sha256 -days 36500 -batch -x509 \
> + -config x509.genkey -outform PEM -out kernel_key.pem \
> + -keyout kernel_key.pem
> +
> +然后可以将生成的 kernel_key.pem 文件的完整路径名指定在 ``CONFIG_MODULE_SIG_KEY`` 选项中,
> +并且将使用其中的证书和密钥而不是自动生成的密钥对。
> +
> +
> +内核中的公钥
> +============
> +
> +内核包含一个可由 root 查看的公钥环。它们在名为 ".builtin_trusted_keys" 的密钥环中,
> +可以通过以下方式查看::
> +
> + [root@deneb ~]# cat /proc/keys
> + ...
> + 223c7853 I------ 1 perm 1f030000 0 0 keyring .builtin_trusted_keys: 1
> + 302d2d52 I------ 1 perm 1f010000 0 0 asymmetri Fedora kernel signing key: d69a84e6bce3d216b979e9505b3e3ef9a7118079: X509.RSA a7118079 []
> +
> +除了专门为模块签名生成的公钥外,还可以在 ``CONFIG_SYSTEM_TRUSTED_KEYS`` 配置选项引用的 PEM 编码文件中提供其他受信任的证书。
> +
> +此外,架构代码可以从硬件存储中获取公钥并将其添加(例如从 UEFI 密钥数据库)。
> +
> +最后,可以通过以下方式添加其他公钥::
> +
> + keyctl padd asymmetric "" [.builtin_trusted_keys-ID] <[key-file]
> +
> +例如::
> +
> + keyctl padd asymmetric "" 0x223c7853 <my_public_key.x509
> +
> +但是,请注意,内核只允许将由已驻留在 ``.builtin_trusted_keys`` 中的密钥有效签名的密钥添加到 ``.builtin_trusted_keys``。
> +
> +模块手动签名
> +============
> +
> +要手动对模块进行签名,请使用 Linux 内核源代码树中可用的 scripts/sign-file 工具。
> +该脚本需要 4 个参数:
> +
> + 1. 哈希算法(例如,sha256)
> + 2. 私钥文件名或 PKCS#11 URI
> + 3. 公钥文件名
> + 4. 要签名的内核模块
> +
> +以下是签名内核模块的示例::
> +
> + scripts/sign-file sha512 kernel-signkey.priv \
> + kernel-signkey.x509 module.ko
> +
> +使用的哈希算法不必与配置的算法匹配,但如果不同,
> +应确保哈希算法要么内置在内核中,要么可以在不需要自身的情况下加载。
> +
> +如果私钥需要密码或 PIN,可以在 $KBUILD_SIGN_PIN 环境变量中提供。
> +
> +
> +已签名模块和剥离
> +================
> +
> +已签名模块在末尾简单地附加了数字签名。模块文件末尾的字符串
> +``~Module signature appended~.`` 确认签名存在,但不能确认签名有效!
> +
> +已签名模块是脆弱的,因为签名在定义的ELF容器之外。
> +因此,一旦计算并附加签名,就不得剥离它们。
> +请注意,整个模块都是签名的有效载荷,包括签名时存在的任何和所有调试信息。
> +
> +
> +加载已签名模块
> +==============
> +
> +模块通过 insmod、modprobe、``init_module()`` 或 ``finit_module()`` 加载,
> +与未签名模块完全一样,因为在用户空间中不进行任何处理。
> +所有签名检查都在内核内完成。
> +
> +
> +无效签名和未签名模块
> +====================
> +
> +如果启用了 ``CONFIG_MODULE_SIG_FORCE`` 或在内核启动命令提供了 module.sig_enforce=1,
> +内核将仅加载具有有效签名且具有公钥的模块。
> +否则,它还将加载未签名的模块。
> +任何具有不匹配签名的模块将不被允许加载。
> +
> +任何具有不可解析签名的模块将被拒绝。
> +
> +
> +管理/保护私钥
> +==============
> +
> +由于私钥用于签名模块,病毒和恶意软件可以使用私钥签名模块并危害操作系统。
> +私钥必须被销毁或移动到安全位置,而不是保存在内核源代码树的根节点中。
> +
> +如果使用相同的私钥为多个内核配置签名模块,
> +必须确保模块版本信息足以防止将模块加载到不同的内核中。
> +要么设置 ``CONFIG_MODVERSIONS=y``,要么通过更改 ``EXTRAVERSION`` 或 ``CONFIG_LOCALVERSION`` 确保每个配置具有不同的内核发布字符串。
> -- 2.43.0
>
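A side note on the "signed modules and stripping" section quoted above: the appended marker can be checked from userspace with a few lines of C. The sketch below is only illustrative. It assumes the marker is the literal text ~Module signature appended~ followed by a newline (the documentation renders that last byte as a dot), and has_module_sig_marker() is a hypothetical helper name, not something taken from the kernel tree. As the text says, finding the marker only shows that a signature is present, not that it is valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed marker text: the string ends with a newline, shown as '.' in the doc. */
#define MODULE_SIG_MARKER "~Module signature appended~\n"

/*
 * Hypothetical helper: return true when the last bytes of the buffer are
 * the signature marker. This says nothing about signature validity.
 */
static bool has_module_sig_marker(const unsigned char *buf, size_t len)
{
    const size_t marker_len = sizeof(MODULE_SIG_MARKER) - 1;

    assert(buf != NULL || len == 0);
    if (len < marker_len)
        return false;

    /* Compare only the tail of the file contents against the marker. */
    return memcmp(buf + len - marker_len, MODULE_SIG_MARKER, marker_len) == 0;
}
```

Running such a check on the bytes of a module file before and after stripping should make it visible when the trailing signature has been lost.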
^ permalink raw reply
* Re: maintainer profiles
From: Mauro Carvalho Chehab @ 2026-04-10 8:12 UTC (permalink / raw)
To: Randy Dunlap
Cc: Linux Documentation, Linux Kernel Mailing List, Jonathan Corbet,
Linux Kernel Workflows
In-Reply-To: <b7775383-da94-4098-8af9-2f672c4f1a71@infradead.org>
On Thu, 9 Apr 2026 17:18:39 -0700
Randy Dunlap <rdunlap@infradead.org> wrote:
> Hi,
>
> Is there supposed to be a difference (or distinction) in the contents of
>
> Documentation/process/maintainer-handbooks.rst
> and
> Documentation/maintainer/maintainer-entry-profile.rst
> ?
>
> Can they be combined into one location?
Heh, from the 5 entries at maintainer-handbooks.rst:
maintainer-netdev
maintainer-soc
maintainer-soc-clean-dts
maintainer-tip
maintainer-kvm-x86
we have 3 of them already there at maintainer-entry-profile.rst:
$ grep process/ Documentation/maintainer/maintainer-entry-profile.rst
../process/maintainer-soc
../process/maintainer-soc-clean-dts
../process/maintainer-netdev
It sounds to me that moving maintainer-tip and maintainer-kvm-x86
to maintainer-entry-profile.rst would be enough to drop
maintainer-handbooks.rst, keeping them consolidated in a single
place.
Thanks,
Mauro
^ permalink raw reply
* [PATCH v2] Documentation: Refactored watchdog old doc
From: Sunny Patel @ 2026-04-10 7:28 UTC (permalink / raw)
To: Jonathan Corbet
Cc: Wim Van Sebroeck, Guenter Roeck, Shuah Khan, linux-watchdog,
linux-doc, linux-kernel, Sunny Patel
In-Reply-To: <132f7e64-4fc6-4274-a04e-e53f0b957665@roeck-us.net>
Good point. I revisited the watchdog core API, listed the
deprecated ioctls, marked them as deprecated in the doc, and
noted that they are only for legacy drivers, not for new ones.
Some legacy drivers still reference the old API, so it is only
marked as deprecated in the doc rather than removed.
I also checked for other watchdog-related APIs that are
deprecated in the drivers but still present in the doc, and
did not find any.
---
Revisit the old watchdog doc and do some cleanup.
Also document newer API flags in the doc, namely
WDIOF_MAGICCLOSE and WDIOF_PRETIMEOUT.
Reiterate the core API and mark WDIOC_GETTEMP and
WDIOS_TEMPPANIC as deprecated. Both are absent from
the watchdog core and only present in legacy drivers.
Signed-off-by: Sunny Patel <nueralspacetech@gmail.com>
---
Documentation/watchdog/watchdog-api.rst | 59 +++++++++++++++++++++----
1 file changed, 51 insertions(+), 8 deletions(-)
diff --git a/Documentation/watchdog/watchdog-api.rst b/Documentation/watchdog/watchdog-api.rst
index 78e228c272cf..e11575db93e6 100644
--- a/Documentation/watchdog/watchdog-api.rst
+++ b/Documentation/watchdog/watchdog-api.rst
@@ -2,7 +2,7 @@
The Linux Watchdog driver API
=============================
-Last reviewed: 10/05/2007
+Last reviewed: 04/08/2026
@@ -106,11 +106,10 @@ the requested one due to limitation of the hardware::
This example might actually print "The timeout was set to 60 seconds"
if the device has a granularity of minutes for its timeout.
-Starting with the Linux 2.4.18 kernel, it is possible to query the
-current timeout using the GETTIMEOUT ioctl::
+It is also possible to get the current timeout with the GETTIMEOUT ioctl::
ioctl(fd, WDIOC_GETTIMEOUT, &timeout);
- printf("The timeout was is %d seconds\n", timeout);
+ printf("The timeout is %d seconds\n", timeout);
Pretimeouts
===========
@@ -133,7 +132,7 @@ seconds. Setting a pretimeout to zero disables it.
There is also a get function for getting the pretimeout::
ioctl(fd, WDIOC_GETPRETIMEOUT, &timeout);
- printf("The pretimeout was is %d seconds\n", timeout);
+ printf("The pretimeout is %d seconds\n", timeout);
Not all watchdog drivers will support a pretimeout.
@@ -145,7 +144,7 @@ before the system will reboot. The WDIOC_GETTIMELEFT is the ioctl
that returns the number of seconds before reboot::
ioctl(fd, WDIOC_GETTIMELEFT, &timeleft);
- printf("The timeout was is %d seconds\n", timeleft);
+ printf("The timeout is %d seconds\n", timeleft);
Environmental monitoring
========================
@@ -227,12 +226,33 @@ The watchdog saw a keepalive ping since it was last queried.
WDIOF_SETTIMEOUT Can set/get the timeout
================ =======================
-The watchdog can do pretimeouts.
+The watchdog supports timeout set/get via the WDIOC_SETTIMEOUT and
+WDIOC_GETTIMEOUT ioctls.
================ ================================
WDIOF_PRETIMEOUT Pretimeout (in seconds), get/set
================ ================================
+The watchdog supports a pretimeout, a warning interrupt that fires before
+the actual reboot timeout. Use WDIOC_SETPRETIMEOUT and WDIOC_GETPRETIMEOUT
+to set/get the pretimeout.
+
+ ================ ================================
+ WDIOF_MAGICCLOSE Supports magic close char
+ ================ ================================
+
+The driver supports the Magic Close feature. The watchdog is only disabled
+if the character 'V' is written to /dev/watchdog before the file descriptor
+is closed. Without this, closing the device disables the watchdog
+unconditionally.
+
+ ================ ================================
+ WDIOF_ALARMONLY Not a reboot watchdog
+ ================ ================================
+
+The watchdog will not reboot the system when it expires. Instead it
+triggers a management or other external alarm. Userspace should not
+rely on a system reboot occurring.
For those drivers that return any bits set in the option field, the
GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current
@@ -254,6 +274,11 @@ returned value is the temperature in degrees Fahrenheit::
int temperature;
ioctl(fd, WDIOC_GETTEMP, &temperature);
+.. deprecated::
+ ``WDIOC_GETTEMP`` is not implemented by the watchdog core. It is only
+ supported by a small number of legacy drivers. New drivers should not
+ implement it.
+
Finally the SETOPTIONS ioctl can be used to control some aspects of
the cards operation::
@@ -268,4 +293,22 @@ The following options are available:
WDIOS_TEMPPANIC Kernel panic on temperature trip
================= ================================
-[FIXME -- better explanations]
+``WDIOS_DISABLECARD`` stops the watchdog timer. The driver will cease
+pinging the hardware watchdog, allowing a controlled shutdown without
+a forced reboot. This is equivalent to the watchdog being disarmed.
+
+``WDIOS_ENABLECARD`` starts the watchdog timer. If the watchdog was
+previously stopped via ``WDIOS_DISABLECARD``, this will re-enable it. The
+hardware watchdog will begin counting down from the configured timeout.
+
+``WDIOS_TEMPPANIC`` enables temperature-based kernel panic. When set,
+the driver will call ``panic()`` (or ``kernel_power_off()`` on some
+drivers) if the hardware temperature sensor exceeds its threshold,
+rather than only setting the ``WDIOF_OVERHEAT`` status bit. Support
+for this option is driver-specific; not all watchdog drivers implement
+temperature monitoring.
+
+.. deprecated::
+ ``WDIOS_TEMPPANIC`` is not implemented by the watchdog core and is only
+ present in a small number of legacy drivers. New drivers should not
+ implement it.
\ No newline at end of file
--
2.43.0
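For the Magic Close paragraph in the patch above, a minimal userspace sketch may help readers of the doc. It assumes the driver reports WDIOF_MAGICCLOSE and that the caller already holds an open descriptor for /dev/watchdog. wd_magic_close() is a hypothetical helper name used only for illustration, not an existing API.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the magic character 'V' and then close the
 * descriptor so that a Magic Close capable driver disarms the watchdog.
 * Returns 0 on success and -1 on failure. On a failed write the descriptor
 * is left open on purpose, so the caller can keep pinging the watchdog
 * instead of closing it without the magic character.
 */
static int wd_magic_close(int fd)
{
    assert(fd >= 0);

    if (write(fd, "V", 1) != 1)
        return -1;

    return close(fd);
}
```

Leaving the descriptor open on a failed write is a design choice of this sketch: closing without the magic character would, on a Magic Close driver, leave the timer running and eventually reboot the machine.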
^ permalink raw reply related
* [syzbot ci] Re: hugetlb: normalize exported interfaces to use base-page indices
From: syzbot ci @ 2026-04-10 6:45 UTC (permalink / raw)
To: akpm, baolin.wang, corbet, david, hughd, jane.chu, liam.howlett,
linux-doc, linux-kernel, linux-mm, lorenzo.stoakes, mhocko,
muchun.song, osalvador, peterx, rppt, skhan, surenb, vbabka
Cc: syzbot, syzkaller-bugs
In-Reply-To: <20260409234158.837786-1-jane.chu@oracle.com>
syzbot ci has tested the following series
[v1] hugetlb: normalize exported interfaces to use base-page indices
https://lore.kernel.org/all/20260409234158.837786-1-jane.chu@oracle.com
* [PATCH 1/6] hugetlb: open-code hugetlb folio lookup index conversion
* [PATCH 2/6] hugetlb: remove the hugetlb_linear_page_index() helper
* [PATCH 3/6] hugetlb: make hugetlb_fault_mutex_hash() take PAGE_SIZE index
* [PATCH 4/6] hugetlb: drop vma_hugecache_offset() in favor of linear_page_index()
* [PATCH 5/6] hugetlb: make hugetlb_add_to_page_cache() use PAGE_SIZE-based index
* [PATCH 6/6] hugetlb: pass hugetlb reservation ranges in base-page indices
and found the following issue:
WARNING: bad unlock balance in hugetlb_no_page
Full report is available here:
https://ci.syzbot.org/series/95c5ba82-0135-4026-b7c7-b0819e1ca4d6
***
WARNING: bad unlock balance in hugetlb_no_page
tree: mm-new
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/akpm/mm.git
base: 06a6cfb92448a97ef429a7fbd395a20a9d388acc
arch: amd64
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config: https://ci.syzbot.org/builds/cefe8576-3c99-42d3-9b51-1e70d62a64a7/config
syz repro: https://ci.syzbot.org/findings/3a14cc12-14a8-4fac-9614-ae7ae2555e58/syz_repro
=====================================
WARNING: bad unlock balance detected!
syzkaller #0 Not tainted
-------------------------------------
syz.0.17/5971 is trying to release lock (&hugetlb_fault_mutex_table[i]) at:
[<ffffffff8229b876>] hugetlb_handle_userfault mm/hugetlb.c:5686 [inline]
[<ffffffff8229b876>] hugetlb_no_page+0x1986/0x1da0 mm/hugetlb.c:5770
but there are no more locks to release!
other info that might help us debug this:
2 locks held by syz.0.17/5971:
#0: ffff88816b85fb88 (vm_lock){++++}-{0:0}, at: lock_vma_under_rcu+0x1d1/0x500 mm/mmap_lock.c:310
#1: ffff88816079e338 (&hugetlb_fault_mutex_table[i]){+.+.}-{4:4}, at: hugetlb_fault+0x317/0x1440 mm/hugetlb.c:5991
stack backtrace:
CPU: 0 UID: 0 PID: 5971 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_unlock_imbalance_bug+0xdc/0xf0 kernel/locking/lockdep.c:5298
__lock_release kernel/locking/lockdep.c:5537 [inline]
lock_release+0x248/0x3d0 kernel/locking/lockdep.c:5889
__mutex_unlock_slowpath+0xd3/0x7d0 kernel/locking/mutex.c:938
hugetlb_handle_userfault mm/hugetlb.c:5686 [inline]
hugetlb_no_page+0x1986/0x1da0 mm/hugetlb.c:5770
hugetlb_fault+0x67f/0x1440 mm/hugetlb.c:-1
handle_mm_fault+0x2007/0x3170 mm/memory.c:6716
do_user_addr_fault+0xa73/0x1340 arch/x86/mm/fault.c:1334
handle_page_fault arch/x86/mm/fault.c:1474 [inline]
exc_page_fault+0x6a/0xc0 arch/x86/mm/fault.c:1527
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:618
RIP: 0033:0x7fa742251964
Code: 41 89 00 31 c0 c3 b9 40 00 00 00 bf 40 00 00 00 eb bc 0f 1f 40 00 48 89 7c 24 f8 48 89 74 24 f0 48 8b 7c 24 f8 4c 8b 44 24 f0 <8b> 4f 50 8b 47 58 4c 01 c1 41 8b 34 00 8b 11 21 d6 89 f0 8d 72 01
RSP: 002b:00007fa7431fd018 EFLAGS: 00010212
RAX: 00007fa742251950 RBX: 00007fa742615fa0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000200000400000
RBP: 00007fa742432c91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000200000400000 R11: 0000000000000000 R12: 0000000000000000
R13: 00007fa742616038 R14: 00007fa742615fa0 R15: 00007ffe952c6908
</TASK>
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).
The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.
^ permalink raw reply
* RE: [PATCH v4 17/27] mtd: spi-nor: debugfs: Add locking support
From: Takahiro.Kuwano @ 2026-04-10 4:39 UTC (permalink / raw)
To: miquel.raynal, pratyush, mwalle, richard, vigneshr, corbet
Cc: sean.anderson, thomas.petazzoni, STLin2, linux-mtd, linux-kernel,
linux-doc
In-Reply-To: <20260403-winbond-v6-18-rc1-spi-nor-swp-v4-17-833dab5e7288@bootlin.com>
> The ioctl output may be counter intuitive in some cases. Asking for a
> "locked status" over a region that is only partially locked will return
> "unlocked" whereas in practice maybe the biggest part is actually
> locked.
>
> Knowing what is the real software locking state through debugfs would be
> very convenient for development/debugging purposes, hence this proposal
> for adding an extra block at the end of the file: a "locked sectors"
> array which lists every section, if it is locked or not, showing both
> the address ranges and the sizes in numbers of blocks.
>
> Here is an example of the output; everything after the "sector map" is new.
>
> $ cat /sys/kernel/debug/spi-nor/spi0.0/params
> name (null)
> id ef a0 20 00 00 00
> size 64.0 MiB
> write size 1
> page size 256
> address nbytes 4
> flags HAS_SR_TB | 4B_OPCODES | HAS_4BAIT | HAS_LOCK | HAS_16BIT_SR | HAS_SR_TB_BIT6 | HAS_4BIT_BP |
> SOFT_RESET | NO_WP
>
> opcodes
> read 0xec
> dummy cycles 6
> erase 0xdc
> program 0x34
> 8D extension none
>
> protocols
> read 1S-4S-4S
> write 1S-1S-4S
> register 1S-1S-1S
>
> erase commands
> 21 (4.00 KiB) [1]
> dc (64.0 KiB) [3]
> c7 (64.0 MiB)
>
> sector map
> region (in hex) | erase mask | overlaid
> ------------------+------------+---------
> 00000000-03ffffff | [ 3] | no
>
> locked sectors
> region (in hex) | status | #blocks
> ------------------+----------+--------
> 00000000-03ffffff | unlocked | 1024
>
> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> ---
> Here are below more examples of output with various situations. The full
> output of the "params" content has been manually removed to only show
> what has been added and how it behaves.
>
> $ flash_lock -l /dev/mtd0 0x3f00000 16
> $ cat /sys/kernel/debug/spi-nor/spi0.0/params
> locked sectors
> region (in hex) | status | #blocks
> ------------------+----------+--------
> 00000000-03efffff | unlocked | 1008
> 03f00000-03ffffff | locked | 16
> $
> $ flash_lock -u /dev/mtd0 0x3f00000 8
> $ cat /sys/kernel/debug/spi-nor/spi0.0/params
> locked sectors
> region (in hex) | status | #blocks
> ------------------+----------+--------
> 00000000-03f7ffff | unlocked | 1016
> 03f80000-03ffffff | locked | 8
> $
> $ flash_lock -u /dev/mtd0
> $ cat /sys/kernel/debug/spi-nor/spi0.0/params
> locked sectors
> region (in hex) | status | #blocks
> ------------------+----------+--------
> 00000000-03ffffff | unlocked | 1024
> $
> $ flash_lock -l /dev/mtd0
> $ cat /sys/kernel/debug/spi-nor/spi0.0/params
> locked sectors
> region (in hex) | status | #blocks
> ------------------+----------+--------
> 00000000-03ffffff | locked | 1024
> $
> $ flash_lock -u /dev/mtd0 0x20000 1022
> $ cat /sys/kernel/debug/spi-nor/spi0.0/params
> locked sectors
> region (in hex) | status | #blocks
> ------------------+----------+--------
> 00000000-0001ffff | locked | 2
> 00020000-03ffffff | unlocked | 1022
> ---
> drivers/mtd/spi-nor/core.h | 5 +++++
> drivers/mtd/spi-nor/debugfs.c | 27 +++++++++++++++++++++++++++
> drivers/mtd/spi-nor/swp.c | 16 ++++++++++++----
> 3 files changed, 44 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/mtd/spi-nor/core.h b/drivers/mtd/spi-nor/core.h
> index 091eb934abe4..8717a0ad90f0 100644
> --- a/drivers/mtd/spi-nor/core.h
> +++ b/drivers/mtd/spi-nor/core.h
> @@ -673,6 +673,7 @@ int spi_nor_post_bfpt_fixups(struct spi_nor *nor,
> const struct sfdp_bfpt *bfpt);
>
> void spi_nor_init_default_locking_ops(struct spi_nor *nor);
> +bool spi_nor_has_default_locking_ops(struct spi_nor *nor);
> void spi_nor_try_unlock_all(struct spi_nor *nor);
> void spi_nor_cache_sr_lock_bits(struct spi_nor *nor, u8 *sr);
> void spi_nor_set_mtd_locking_ops(struct spi_nor *nor);
> @@ -707,6 +708,10 @@ static inline bool spi_nor_needs_sfdp(const struct spi_nor *nor)
> return !nor->info->size;
> }
>
> +u64 spi_nor_get_min_prot_length_sr(struct spi_nor *nor);
> +void spi_nor_get_locked_range_sr(struct spi_nor *nor, const u8 *sr, loff_t *ofs, u64 *len);
> +bool spi_nor_is_locked_sr(struct spi_nor *nor, loff_t ofs, u64 len, const u8 *sr);
> +
> #ifdef CONFIG_DEBUG_FS
> void spi_nor_debugfs_register(struct spi_nor *nor);
> void spi_nor_debugfs_shutdown(void);
> diff --git a/drivers/mtd/spi-nor/debugfs.c b/drivers/mtd/spi-nor/debugfs.c
> index d0191eb9f879..298ce3d9e905 100644
> --- a/drivers/mtd/spi-nor/debugfs.c
> +++ b/drivers/mtd/spi-nor/debugfs.c
> @@ -1,6 +1,7 @@
> // SPDX-License-Identifier: GPL-2.0
>
> #include <linux/debugfs.h>
> +#include <linux/math64.h>
> #include <linux/mtd/spi-nor.h>
> #include <linux/spi/spi.h>
> #include <linux/spi/spi-mem.h>
> @@ -77,10 +78,12 @@ static void spi_nor_print_flags(struct seq_file *s, unsigned long flags,
> static int spi_nor_params_show(struct seq_file *s, void *data)
> {
> struct spi_nor *nor = s->private;
> + unsigned int min_prot_len = spi_nor_get_min_prot_length_sr(nor);
> struct spi_nor_flash_parameter *params = nor->params;
> struct spi_nor_erase_map *erase_map = &params->erase_map;
> struct spi_nor_erase_region *region = erase_map->regions;
> const struct flash_info *info = nor->info;
> + loff_t lock_start, lock_length;
> char buf[16], *str;
> unsigned int i;
>
> @@ -159,6 +162,30 @@ static int spi_nor_params_show(struct seq_file *s, void *data)
> region[i].overlaid ? "yes" : "no");
> }
>
> + if (!spi_nor_has_default_locking_ops(nor))
> + return 0;
> +
> + seq_puts(s, "\nlocked sectors\n");
> + seq_puts(s, " region (in hex) | status | #blocks\n");
> + seq_puts(s, " ------------------+----------+--------\n");
> +
> + spi_nor_get_locked_range_sr(nor, nor->dfs_sr_cache, &lock_start, &lock_length);
> + if (!lock_length || lock_length == params->size) {
> + seq_printf(s, " %08llx-%08llx | %s | %llu\n", 0ULL, params->size - 1,
> + lock_length ? " locked" : "unlocked",
> + div_u64(params->size, min_prot_len));
> + } else if (!lock_start) {
> + seq_printf(s, " %08llx-%08llx | %s | %llu\n", 0ULL, lock_length - 1,
> + " locked", div_u64(lock_length, min_prot_len));
> + seq_printf(s, " %08llx-%08llx | %s | %llu\n", lock_length, params->size - 1,
> + "unlocked", div_u64(params->size - lock_length, min_prot_len));
> + } else {
> + seq_printf(s, " %08llx-%08llx | %s | %llu\n", 0ULL, lock_start - 1,
> + "unlocked", div_u64(lock_start, min_prot_len));
> + seq_printf(s, " %08llx-%08llx | %s | %llu\n", lock_start, params->size - 1,
> + " locked", div_u64(lock_length, min_prot_len));
> + }
> +
> return 0;
> }
> DEFINE_SHOW_ATTRIBUTE(spi_nor_params);
> diff --git a/drivers/mtd/spi-nor/swp.c b/drivers/mtd/spi-nor/swp.c
> index 7a6c2b8ef921..d75ed83eb787 100644
> --- a/drivers/mtd/spi-nor/swp.c
> +++ b/drivers/mtd/spi-nor/swp.c
> @@ -32,7 +32,7 @@ static u8 spi_nor_get_sr_tb_mask(struct spi_nor *nor)
> return SR_TB_BIT5;
> }
>
> -static u64 spi_nor_get_min_prot_length_sr(struct spi_nor *nor)
> +u64 spi_nor_get_min_prot_length_sr(struct spi_nor *nor)
> {
> unsigned int bp_slots, bp_slots_needed;
> /*
> @@ -53,8 +53,8 @@ static u64 spi_nor_get_min_prot_length_sr(struct spi_nor *nor)
> return sector_size;
> }
>
> -static void spi_nor_get_locked_range_sr(struct spi_nor *nor, const u8 *sr, loff_t *ofs,
> - u64 *len)
> +void spi_nor_get_locked_range_sr(struct spi_nor *nor, const u8 *sr, loff_t *ofs,
> + u64 *len)
> {
> u64 min_prot_len;
> u8 bp_mask = spi_nor_get_sr_bp_mask(nor);
> @@ -112,7 +112,7 @@ static bool spi_nor_check_lock_status_sr(struct spi_nor *nor, loff_t ofs,
> return (ofs >= lock_offs_max) || (offs_max <= lock_offs);
> }
>
> -static bool spi_nor_is_locked_sr(struct spi_nor *nor, loff_t ofs, u64 len, const u8 *sr)
> +bool spi_nor_is_locked_sr(struct spi_nor *nor, loff_t ofs, u64 len, const u8 *sr)
> {
> return spi_nor_check_lock_status_sr(nor, ofs, len, sr, true);
> }
> @@ -410,6 +410,9 @@ static int spi_nor_sr_is_locked(struct spi_nor *nor, loff_t ofs, u64 len)
> * -is_locked(): Checks if the region is *fully* locked, returns false otherwise.
> * This feeback may be misleading because users may get an "unlocked"
> * status even though a subpart of the region is effectively locked.
> + *
> + * If in doubt during development, check out the debugfs output, which tries to
> + * be more user friendly.
> */
> static const struct spi_nor_locking_ops spi_nor_sr_locking_ops = {
> .lock = spi_nor_sr_lock,
> @@ -422,6 +425,11 @@ void spi_nor_init_default_locking_ops(struct spi_nor *nor)
> nor->params->locking_ops = &spi_nor_sr_locking_ops;
> }
>
> +bool spi_nor_has_default_locking_ops(struct spi_nor *nor)
> +{
> + return nor->params->locking_ops == &spi_nor_sr_locking_ops;
> +}
> +
> static int spi_nor_lock(struct mtd_info *mtd, loff_t ofs, u64 len)
> {
> struct spi_nor *nor = mtd_to_spi_nor(mtd);
>
> --
> 2.53.0
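To illustrate the "fully locked" semantics described in the commit message, here is a small standalone sketch. It is not the driver code: region_fully_locked() is a hypothetical helper, and it assumes a single contiguous locked range, as in the example outputs above. A query is reported as locked only when it lies entirely inside that range, which is why asking about the whole flash returns "unlocked" even when the top 16 blocks are locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true only when [ofs, ofs + len) lies entirely inside
 * the locked range [lock_start, lock_start + lock_len). A partially locked
 * query region is reported as not locked.
 */
static bool region_fully_locked(uint64_t lock_start, uint64_t lock_len,
                                uint64_t ofs, uint64_t len)
{
    uint64_t skip;

    /* The locked range itself must not wrap around. */
    assert(lock_start + lock_len >= lock_start);

    if (!len || !lock_len)
        return false;
    if (ofs < lock_start)
        return false;

    /* Distance from the start of the locked range to the queried offset. */
    skip = ofs - lock_start;
    if (skip > lock_len)
        return false;

    /* Written this way to avoid overflowing ofs + len. */
    return len <= lock_len - skip;
}
```

With the first example above (1 MiB locked at 0x3f00000 on a 64 MiB flash), a query over the whole device returns false, while a query over exactly the locked megabyte returns true.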
Reviewed-by: Takahiro Kuwano <takahiro.kuwano@infineon.com>
Thanks,
Takahiro
^ permalink raw reply
* Re: [PATCH v2 00/16] fs,x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
From: Reinette Chatre @ 2026-04-10 3:41 UTC (permalink / raw)
To: Moger, Babu, Babu Moger, corbet@lwn.net, tony.luck@intel.com,
Dave.Martin@arm.com, james.morse@arm.com, tglx@kernel.org,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
Cc: skhan@linuxfoundation.org, x86@kernel.org, hpa@zytor.com,
peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
vschneid@redhat.com, kas@kernel.org, rick.p.edgecombe@intel.com,
akpm@linux-foundation.org, pmladek@suse.com,
rdunlap@infradead.org, dapeng1.mi@linux.intel.com,
kees@kernel.org, elver@google.com, paulmck@kernel.org,
lirongqing@baidu.com, safinaskar@gmail.com, fvdl@google.com,
seanjc@google.com, pawan.kumar.gupta@linux.intel.com,
xin@zytor.com, tiala@microsoft.com, chang.seok.bae@intel.com,
Lendacky, Thomas, elena.reshetova@intel.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-coco@lists.linux.dev, kvm@vger.kernel.org,
eranian@google.com, peternewman@google.com
In-Reply-To: <90f4a692-1c27-4967-bf12-ec3cb597681d@amd.com>
Hi Babu,
On 4/9/26 4:42 PM, Moger, Babu wrote:
> Hi Reinette,
>
> On 4/9/2026 3:50 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 4/9/26 11:05 AM, Moger, Babu wrote:
>>> On 4/9/2026 12:26 PM, Reinette Chatre wrote:
>>>> On 4/9/26 10:19 AM, Moger, Babu wrote:
>>>>> On 4/8/2026 6:41 PM, Reinette Chatre wrote:
>>>>
>>>>>> When the user switches to either "global_assign_ctrl_inherit_mon_per_cpu" or
>>>>>> "global_assign_ctrl_assign_mon_per_cpu" then "info/kernel_mode_assignment" is created
>>>>>> (or made visible to user space) and is expected to point to default group.
>>>>>> User can change the group using "info/kernel_mode_assignment" at this point.
>>>>>>
>>>>>> If the current scenario is below ...
>>>>>> # cat info/kernel_mode
>>>>>> [global_assign_ctrl_inherit_mon_per_cpu]
>>>>>> inherit_ctrl_and_mon
>>>>>> global_assign_ctrl_assign_mon_per_cpu
>>>>>>
>>>>>> ... then "info/kernel_mode_assignment" will exist but what it should contain if
>>>>>> user switches mode at this point may be up for discussion.
>>>>>>
>>>>>> option 1)
>>>>>> When user switches mode to "global_assign_ctrl_assign_mon_per_cpu" then
>>>>>> the resource group in "info/kernel_mode_assignment" is reset to the
>>>>>> default group and all CPUs PLZA state reset to match. The kernel_mode_cpus
>>>>>> and kernel_mode_cpuslist files become visible in default resource group
>>>>>> and they contain "all online CPUs".
>>>>>>
>>>>>> option 2)
>>>>>> When user switches mode to "global_assign_ctrl_assign_mon_per_cpu" then
>>>>>> the resource group in "info/kernel_mode_assignment" is kept and all
>>>>>> CPUs PLZA state set to match it while also keeping the current
>>>>>> values of that resource group's kernel_mode_cpus and kernel_mode_cpuslist
>>>>>> files.
>>>>>>
>>>>>> I am leaning towards "option 1" to keep it consistent with a switch from
>>>>>> "inherit_ctrl_and_mon" and being deterministic about how a mode is started with
>>>>>
>>>>> Yes. The "option 1" seems appropriate.
>>>>>
>>>>>> a clean slate. What are your thoughts? What would be use case where a user would
>>>>>> want to switch between "global_assign_ctrl_inherit_mon_per_cpu" and
>>>>>> "global_assign_ctrl_assign_mon_per_cpu" to just switch rmid_en on and off?
>>>>>
>>>>>
>>>>> This is a bit tricky.
>>>>>
>>>>> Currently, our requirement is to have a CTRL_MON group for
>>>>> global_assign_ctrl_inherit_mon_per_cpu. In this scenario, we use the
>>>>> group’s CLOSID for PLZA configuration, and RMID is not used (rmid_en
>>>>> = 0) when setting up PLZA.
>>>>>
>>>>> Our requirement is also to have a CTRL_MON/MON group for
>>>>> global_assign_ctrl_assign_mon_per_cpu. In this case as well, the
>>>>> group’s CLOSID and RMID (rmid_en = 1) are both used to configure PLZA.
>>>>
>>>> ah, right. Good catch.
>>>>
>>>>>
>>>>> Actually, we should not allow these changes from
>>>>> global_assign_ctrl_inherit_mon_per_cpu to
>>>>> global_assign_ctrl_assign_mon_per_cpu or vice versa.
>>>>
>>>> resctrl could allow it but as part of the switch it resets the "kernel mode group" to
>>>> be the default group every time? This would be the "option 1" above.
>>>
>>> Other options.
>>>
>>> Allow global_assign_ctrl_inherit_mon_per_cpu -> global_assign_ctrl_assign_mon_per_cpu. As part of the switch, reset the "kernel mode group" to the default group.
>>>
>>> Allow global_assign_ctrl_assign_mon_per_cpu -> global_assign_ctrl_inherit_mon_per_cpu. In this case switch
>>> to CTRL_MON/MON -> CTRL_MON.
>>>
>>
>> ok. Could you please return the courtesy of providing feedback on the
>> suggestion you are responding to and also include the motivation why your
>> suggestion is the better option?
>
> Yea. Sure.
>
> We need to allow the switch between the modes. Otherwise the only way to reset is to remount the resctrl filesystem. That is not a good option.
>
> Allow global_assign_ctrl_inherit_mon_per_cpu -> global_assign_ctrl_assign_mon_per_cpu. As part of the switch, reset the "kernel mode group" to the default group.
>
> This option is the same as you suggested.
>
> Allow global_assign_ctrl_assign_mon_per_cpu -> global_assign_ctrl_inherit_mon_per_cpu. In this case switch
> to CTRL_MON/MON -> CTRL_MON. This option basically disables monitoring (rmid_en=0). It is less disruptive. The move is from the child group to the parent group.
ok. I am concerned that this creates an inconsistent interface. Specifically, sometimes
when switching the mode the kernel group will reset and sometimes it won't. This inconsistency
may be more apparent when writing the user documentation as part of this work. If you are
able to clearly explain how this resctrl fs interface behaves (this cannot be about PLZA
internals as above) then this could work.
Reinette
* [PATCH v2 3/4] docs/zh_CN: update rust/quick-start.rst translation
From: Ben Guo @ 2026-04-10 2:41 UTC (permalink / raw)
To: Alex Shi, Yanteng Si, Dongliang Mu, Jonathan Corbet
Cc: linux-doc, linux-kernel, rust-for-linux, hust-os-kernel-patches
In-Reply-To: <cover.1775786987.git.ben.guo@openatom.club>
Update the translation of .../rust/quick-start.rst into Chinese.
Update the translation through commit 5935461b4584
("docs: rust: quick-start: add Debian 13 (Trixie)")
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Ben Guo <ben.guo@openatom.club>
---
.../translations/zh_CN/rust/quick-start.rst | 190 ++++++++++++++----
1 file changed, 148 insertions(+), 42 deletions(-)
diff --git a/Documentation/translations/zh_CN/rust/quick-start.rst b/Documentation/translations/zh_CN/rust/quick-start.rst
index 8616556ae4d..5f0ece6411f 100644
--- a/Documentation/translations/zh_CN/rust/quick-start.rst
+++ b/Documentation/translations/zh_CN/rust/quick-start.rst
@@ -13,16 +13,138 @@
本文介绍了如何开始使用Rust进行内核开发。
+安装内核开发所需的 Rust 工具链有几种方式。一种简单的方式是使用 Linux 发行版的软件包
+(如果它们合适的话)——下面的第一节解释了这种方法。这种方法的一个优势是,通常发行版会
+匹配 Rust 和 Clang 所使用的 LLVM。
+
+另一种方式是使用 `kernel.org <https://kernel.org/pub/tools/llvm/rust/>`_ 上提
+供的预构建稳定版本的 LLVM+Rust。这些与 :ref:`获取 LLVM <zh_cn_getting_llvm>` 中的精
+简快速 LLVM 工具链相同,并添加了 Rust for Linux 支持的 Rust 版本。提供了两套工具
+链:"最新 LLVM" 和 "匹配 LLVM"(请参阅链接了解更多信息)。
+
+或者,接下来的两个 "依赖" 章节将解释每个组件以及如何通过 ``rustup``、Rust 的独立
+安装程序或从源码构建来安装它们。
+
+本文档的其余部分解释了有关如何入门的其他方面。
+
+
+发行版
+------
+
+Arch Linux
+**********
+
+Arch Linux 提供较新的 Rust 版本,因此通常开箱即用,例如::
+
+ pacman -S rust rust-src rust-bindgen
+
+
+Debian
+******
+
+Debian 13(Trixie)以及 Testing 和 Debian Unstable(Sid)提供较新的 Rust 版
+本,因此通常开箱即用,例如::
+
+ apt install rustc rust-src bindgen rustfmt rust-clippy
+
+
+Fedora Linux
+************
+
+Fedora Linux 提供较新的 Rust 版本,因此通常开箱即用,例如::
+
+ dnf install rust rust-src bindgen-cli rustfmt clippy
+
+
+Gentoo Linux
+************
+
+Gentoo Linux(尤其是 testing 分支)提供较新的 Rust 版本,因此通常开箱即用,
+例如::
+
+ USE='rust-src rustfmt clippy' emerge dev-lang/rust dev-util/bindgen
+
+可能需要设置 ``LIBCLANG_PATH``。
+
+
+Nix
+***
+
+Nix(unstable 频道)提供较新的 Rust 版本,因此通常开箱即用,例如::
+
+ { pkgs ? import <nixpkgs> {} }:
+ pkgs.mkShell {
+ nativeBuildInputs = with pkgs; [ rustc rust-bindgen rustfmt clippy ];
+ RUST_LIB_SRC = "${pkgs.rust.packages.stable.rustPlatform.rustLibSrc}";
+ }
+
+
+openSUSE
+********
+
+openSUSE Slowroll 和 openSUSE Tumbleweed 提供较新的 Rust 版本,因此通常开箱
+即用,例如::
+
+ zypper install rust rust1.79-src rust-bindgen clang
+
+
+Ubuntu
+******
+
+25.04
+~~~~~
+
+最新的 Ubuntu 版本提供较新的 Rust 版本,因此通常开箱即用,例如::
+
+ apt install rustc rust-src bindgen rustfmt rust-clippy
+
+此外,需要设置 ``RUST_LIB_SRC``,例如::
+
+ RUST_LIB_SRC=/usr/src/rustc-$(rustc --version | cut -d' ' -f2)/library
+
+为方便起见,可以将 ``RUST_LIB_SRC`` 导出到全局环境中。
+
+
+24.04 LTS 及更早版本
+~~~~~~~~~~~~~~~~~~~~
+
+虽然 Ubuntu 24.04 LTS 及更早版本仍然提供较新的 Rust 版本,但它们需要一些额外的配
+置,使用带版本号的软件包,例如::
+
+ apt install rustc-1.80 rust-1.80-src bindgen-0.65 rustfmt-1.80 \
+ rust-1.80-clippy
+ ln -s /usr/lib/rust-1.80/bin/rustfmt /usr/bin/rustfmt-1.80
+ ln -s /usr/lib/rust-1.80/bin/clippy-driver /usr/bin/clippy-driver-1.80
+
+这些软件包都不会将其工具设置为默认值;因此应该显式指定它们,例如::
+
+ make LLVM=1 RUSTC=rustc-1.80 RUSTDOC=rustdoc-1.80 RUSTFMT=rustfmt-1.80 \
+ CLIPPY_DRIVER=clippy-driver-1.80 BINDGEN=bindgen-0.65
+
+或者,修改 ``PATH`` 变量将 Rust 1.80 的二进制文件放在前面,并将 ``bindgen`` 设
+置为默认值,例如::
+
+ PATH=/usr/lib/rust-1.80/bin:$PATH
+ update-alternatives --install /usr/bin/bindgen bindgen \
+ /usr/bin/bindgen-0.65 100
+ update-alternatives --set bindgen /usr/bin/bindgen-0.65
+
+使用带版本号的软件包时需要设置 ``RUST_LIB_SRC``,例如::
+
+ RUST_LIB_SRC=/usr/src/rustc-$(rustc-1.80 --version | cut -d' ' -f2)/library
+
+为方便起见,可以将 ``RUST_LIB_SRC`` 导出到全局环境中。
+
+此外, ``bindgen-0.65`` 在较新的版本(24.04 LTS 和 24.10)中可用,但在更早的版
+本(20.04 LTS 和 22.04 LTS)中可能不可用,因此可能需要手动构建 ``bindgen``
+(请参见下文)。
+
构建依赖
--------
本节描述了如何获取构建所需的工具。
-其中一些依赖也许可以从Linux发行版中获得,包名可能是 ``rustc`` , ``rust-src`` ,
-``rust-bindgen`` 等。然而,在写这篇文章的时候,它们很可能还不够新,除非发行版跟踪最
-新的版本。
-
为了方便检查是否满足要求,可以使用以下目标::
make LLVM=1 rustavailable
@@ -34,15 +156,14 @@
rustc
*****
-需要一个特定版本的Rust编译器。较新的版本可能会也可能不会工作,因为就目前而言,内核依赖
-于一些不稳定的Rust特性。
+需要一个较新版本的Rust编译器。
如果使用的是 ``rustup`` ,请进入内核编译目录(或者用 ``--path=<build-dir>`` 参数
-来 ``设置`` sub-command)并运行::
+来 ``设置`` sub-command),例如运行::
- rustup override set $(scripts/min-tool-version.sh rustc)
+ rustup override set stable
-+这将配置你的工作目录使用正确版本的 ``rustc``,而不影响你的默认工具链。
+这将配置你的工作目录使用给定版本的 ``rustc``,而不影响你的默认工具链。
请注意覆盖应用当前的工作目录(和它的子目录)。
@@ -54,7 +175,7 @@ rustc
Rust标准库源代码
****************
-Rust标准库的源代码是必需的,因为构建系统会交叉编译 ``core`` 和 ``alloc`` 。
+Rust标准库的源代码是必需的,因为构建系统会交叉编译 ``core`` 。
如果正在使用 ``rustup`` ,请运行::
@@ -64,10 +185,10 @@ Rust标准库的源代码是必需的,因为构建系统会交叉编译 ``core
否则,如果使用独立的安装程序,可以将Rust源码树下载到安装工具链的文件夹中::
- curl -L "https://static.rust-lang.org/dist/rust-src-$(scripts/min-tool-version.sh rustc).tar.gz" |
- tar -xzf - -C "$(rustc --print sysroot)/lib" \
- "rust-src-$(scripts/min-tool-version.sh rustc)/rust-src/lib/" \
- --strip-components=3
+ curl -L "https://static.rust-lang.org/dist/rust-src-$(rustc --version | cut -d' ' -f2).tar.gz" |
+ tar -xzf - -C "$(rustc --print sysroot)/lib" \
+ "rust-src-$(rustc --version | cut -d' ' -f2)/rust-src/lib/" \
+ --strip-components=3
在这种情况下,以后升级Rust编译器版本需要手动更新这个源代码树(这可以通过移除
``$(rustc --print sysroot)/lib/rustlib/src/rust`` ,然后重新执行上
@@ -97,24 +218,21 @@ Linux发行版中可能会有合适的包,所以最好先检查一下。
bindgen
*******
-内核的C端绑定是在构建时使用 ``bindgen`` 工具生成的。这需要特定的版本。
-
-通过以下方式安装它(注意,这将从源码下载并构建该工具)::
-
- cargo install --locked --version $(scripts/min-tool-version.sh bindgen) bindgen-cli
+内核的C端绑定是在构建时使用 ``bindgen`` 工具生成的。
-``bindgen`` 需要找到合适的 ``libclang`` 才能工作。如果没有找到(或者找到的
-``libclang`` 与应该使用的 ``libclang`` 不同),则可以使用 ``clang-sys``
-理解的环境变量(Rust绑定创建的 ``bindgen`` 用来访问 ``libclang``):
+例如,通过以下方式安装它(注意,这将从源码下载并构建该工具)::
+ cargo install --locked bindgen-cli
-* ``LLVM_CONFIG_PATH`` 可以指向一个 ``llvm-config`` 可执行文件。
+``bindgen`` 使用 ``clang-sys`` crate 来查找合适的 ``libclang`` (可以静态链
+接、动态链接或在运行时加载)。默认情况下,上面的 ``cargo`` 命令会生成一个在运行时
+加载 ``libclang`` 的 ``bindgen`` 二进制文件。如果没有找到(或者应该使用与找到的
+不同的 ``libclang``),可以调整该过程,例如使用 ``LIBCLANG_PATH`` 环境变量。详
+情请参阅 ``clang-sys`` 的文档:
-* 或者 ``LIBCLANG_PATH`` 可以指向 ``libclang`` 共享库或包含它的目录。
+ https://github.com/KyleMayes/clang-sys#linking
-* 或者 ``CLANG_PATH`` 可以指向 ``clang`` 可执行文件。
-
-详情请参阅 ``clang-sys`` 的文档:
+ https://github.com/KyleMayes/clang-sys#environment-variables
开发依赖
@@ -151,18 +269,6 @@ clippy
独立的安装程序也带有 ``clippy`` 。
-cargo
-*****
-
-``cargo`` 是Rust的本地构建系统。目前需要它来运行测试,因为它被用来构建一个自定义的标准
-库,其中包含了内核中自定义 ``alloc`` 所提供的设施。测试可以使用 ``rusttest`` Make 目标
-来运行。
-
-如果使用的是 ``rustup`` ,所有的配置文件都已经安装了该工具,因此不需要再做什么。
-
-独立的安装程序也带有 ``cargo`` 。
-
-
rustdoc
*******
@@ -223,7 +329,7 @@ Rust支持(CONFIG_RUST)需要在 ``General setup`` 菜单中启用。在其
如果使用的是GDB/Binutils,而Rust符号没有被demangled,原因是工具链还不支持Rust的新v0
mangling方案。有几个办法可以解决:
- - 安装一个较新的版本(GDB >= 10.2, Binutils >= 2.36)。
+- 安装一个较新的版本(GDB >= 10.2, Binutils >= 2.36)。
- - 一些版本的GDB(例如vanilla GDB 10.1)能够使用嵌入在调试信息(``CONFIG_DEBUG_INFO``)
- 中的pre-demangled的名字。
+- 一些版本的GDB(例如vanilla GDB 10.1)能够使用嵌入在调试信息(``CONFIG_DEBUG_INFO``)
+ 中的pre-demangled的名字。
--
2.53.0
* [PATCH v2 4/4] docs/zh_CN: update rust/index.rst translation
From: Ben Guo @ 2026-04-10 2:41 UTC (permalink / raw)
To: Alex Shi, Yanteng Si, Dongliang Mu, Jonathan Corbet
Cc: linux-doc, linux-kernel, rust-for-linux, hust-os-kernel-patches
In-Reply-To: <cover.1775786987.git.ben.guo@openatom.club>
Update the translation of .../rust/index.rst into Chinese.
Update the translation through commit a592a36e4937
("Documentation: use a source-read extension for the index link boilerplate")
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Ben Guo <ben.guo@openatom.club>
---
Documentation/translations/zh_CN/rust/index.rst | 17 -----------------
1 file changed, 17 deletions(-)
diff --git a/Documentation/translations/zh_CN/rust/index.rst b/Documentation/translations/zh_CN/rust/index.rst
index 5347d472958..138e057bee4 100644
--- a/Documentation/translations/zh_CN/rust/index.rst
+++ b/Documentation/translations/zh_CN/rust/index.rst
@@ -12,16 +12,6 @@ Rust
与内核中的Rust有关的文档。若要开始在内核中使用Rust,请阅读 quick-start.rst 指南。
-Rust 实验
----------
-Rust 支持在 v6.1 版本中合并到主线,以帮助确定 Rust 作为一种语言是否适合内核,
-即是否值得进行权衡。
-
-目前,Rust 支持主要面向对 Rust 支持感兴趣的内核开发人员和维护者,
-以便他们可以开始处理抽象和驱动程序,并帮助开发基础设施和工具。
-
-如果您是终端用户,请注意,目前没有适合或旨在生产使用的内置驱动程序或模块,
-并且 Rust 支持仍处于开发/实验阶段,尤其是对于特定内核配置。
代码文档
--------
@@ -50,10 +40,3 @@ Rust 支持在 v6.1 版本中合并到主线,以帮助确定 Rust 作为一种
testing
你还可以在 :doc:`../../../process/kernel-docs` 中找到 Rust 的学习材料。
-
-.. only:: subproject and html
-
- Indices
- =======
-
- * :ref:`genindex`
--
2.53.0
* [PATCH v2 2/4] docs/zh_CN: update rust/coding-guidelines.rst translation
From: Ben Guo @ 2026-04-10 2:41 UTC (permalink / raw)
To: Alex Shi, Yanteng Si, Dongliang Mu, Jonathan Corbet
Cc: linux-doc, linux-kernel, rust-for-linux, hust-os-kernel-patches
In-Reply-To: <cover.1775786987.git.ben.guo@openatom.club>
Update the translation of .../rust/coding-guidelines.rst into Chinese.
Update the translation through commit 4a9cb2eecc78
("docs: rust: add section on imports formatting")
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Ben Guo <ben.guo@openatom.club>
---
.../zh_CN/rust/coding-guidelines.rst | 262 +++++++++++++++++-
1 file changed, 248 insertions(+), 14 deletions(-)
diff --git a/Documentation/translations/zh_CN/rust/coding-guidelines.rst b/Documentation/translations/zh_CN/rust/coding-guidelines.rst
index 419143b938e..54b902322db 100644
--- a/Documentation/translations/zh_CN/rust/coding-guidelines.rst
+++ b/Documentation/translations/zh_CN/rust/coding-guidelines.rst
@@ -37,6 +37,73 @@
像内核其他部分的 ``clang-format`` 一样, ``rustfmt`` 在单个文件上工作,并且不需要
内核配置。有时,它甚至可以与破碎的代码一起工作。
+导入
+~~~~
+
+``rustfmt`` 默认会以一种在合并和变基时容易产生冲突的方式格式化导入,因为在某些情况下
+它会将多个条目合并到同一行。例如:
+
+.. code-block:: rust
+
+ // Do not use this style.
+ use crate::{
+ example1,
+ example2::{example3, example4, example5},
+ example6, example7,
+ example8::example9,
+ };
+
+相反,内核使用如下所示的垂直布局:
+
+.. code-block:: rust
+
+ use crate::{
+ example1,
+ example2::{
+ example3,
+ example4,
+ example5, //
+ },
+ example6,
+ example7,
+ example8::example9, //
+ };
+
+也就是说,每个条目占一行,只要列表中有多个条目就使用花括号。
+
+末尾的空注释可以保留这种格式。不仅如此, ``rustfmt`` 在添加空注释后实际上会将导入重
+新格式化为垂直布局。也就是说,可以通过对如下输入运行 ``rustfmt`` 来轻松地将原始示例
+重新格式化为预期的风格:
+
+.. code-block:: rust
+
+ // Do not use this style.
+ use crate::{
+ example1,
+ example2::{example3, example4, example5, //
+ },
+ example6, example7,
+ example8::example9, //
+ };
+
+末尾的空注释适用于嵌套导入(如上所示)以及单条目导入——这有助于最小化补丁系列中的差
+异:
+
+.. code-block:: rust
+
+ use crate::{
+ example1, //
+ };
+
+末尾的空注释可以放在花括号内的任何一行中,但建议放在最后一个条目上,因为这让人联想到其
+他格式化工具中的末尾逗号。有时在补丁系列中由于列表的变更,避免多次移动注释可能更简单。
+
+在某些情况下可能需要例外处理,即以上都不是硬性规则。也有一些代码尚未迁移到这种风格,但
+请不要引入其他风格的代码。
+
+最终目标是让 ``rustfmt`` 在稳定版本中自动支持这种格式化风格(或类似的风格),而无需
+末尾的空注释。因此,在某个时候,目标是移除这些注释。
+
注释
----
@@ -77,6 +144,16 @@
// ...
}
+这适用于公共和私有项目。这增加了与公共项目的一致性,允许在更改可见性时减少涉及的更改,
+并允许我们将来也为私有项目生成文档。换句话说,如果为私有项目编写了文档,那么仍然应该使
+用 ``///`` 。例如:
+
+.. code-block:: rust
+
+ /// My private function.
+ // TODO: ...
+ fn f() {}
+
一种特殊的注释是 ``// SAFETY:`` 注释。这些注释必须出现在每个 ``unsafe`` 块之前,它们
解释了为什么该块内的代码是正确/健全的,即为什么它在任何情况下都不会触发未定义行为,例如:
@@ -131,27 +208,27 @@ https://commonmark.org/help/
这个例子展示了一些 ``rustdoc`` 的特性和内核中遵循的一些惯例:
- - 第一段必须是一个简单的句子,简要地描述被记录的项目的作用。进一步的解释必须放在额
- 外的段落中。
+- 第一段必须是一个简单的句子,简要地描述被记录的项目的作用。进一步的解释必须放在额
+ 外的段落中。
- - 不安全的函数必须在 ``# Safety`` 部分记录其安全前提条件。
+- 不安全的函数必须在 ``# Safety`` 部分记录其安全前提条件。
- - 虽然这里没有显示,但如果一个函数可能会恐慌,那么必须在 ``# Panics`` 部分描述发
- 生这种情况的条件。
+- 虽然这里没有显示,但如果一个函数可能会恐慌,那么必须在 ``# Panics`` 部分描述发
+ 生这种情况的条件。
- 请注意,恐慌应该是非常少见的,只有在有充分理由的情况下才会使用。几乎在所有的情况下,
- 都应该使用一个可失败的方法,通常是返回一个 ``Result``。
+ 请注意,恐慌应该是非常少见的,只有在有充分理由的情况下才会使用。几乎在所有的情况下,
+ 都应该使用一个可失败的方法,通常是返回一个 ``Result``。
- - 如果提供使用实例对读者有帮助的话,必须写在一个叫做``# Examples``的部分。
+- 如果提供使用实例对读者有帮助的话,必须写在一个叫做``# Examples``的部分。
- - Rust项目(函数、类型、常量……)必须有适当的链接(``rustdoc`` 会自动创建一个
- 链接)。
+- Rust项目(函数、类型、常量……)必须有适当的链接(``rustdoc`` 会自动创建一个
+ 链接)。
- - 任何 ``unsafe`` 的代码块都必须在前面加上一个 ``// SAFETY:`` 的注释,描述里面
- 的代码为什么是正确的。
+- 任何 ``unsafe`` 的代码块都必须在前面加上一个 ``// SAFETY:`` 的注释,描述里面
+ 的代码为什么是正确的。
- 虽然有时原因可能看起来微不足道,但写这些注释不仅是记录已经考虑到的问题的好方法,
- 最重要的是,它提供了一种知道没有额外隐含约束的方法。
+ 虽然有时原因可能看起来微不足道,但写这些注释不仅是记录已经考虑到的问题的好方法,
+ 最重要的是,它提供了一种知道没有额外隐含约束的方法。
要了解更多关于如何编写Rust和拓展功能的文档,请看看 ``rustdoc`` 这本书,网址是:
@@ -170,6 +247,22 @@ https://commonmark.org/help/
/// [`struct mutex`]: srctree/include/linux/mutex.h
+C FFI 类型
+----------
+
+Rust 内核代码使用类型别名(如 ``c_int``)来引用 C 类型(如 ``int``),这些别名可
+以直接从 ``kernel`` 预导入(prelude)中获取。请不要使用 ``core::ffi`` 中的别
+名——它们可能无法映射到正确的类型。
+
+这些别名通常应该直接通过其标识符引用,即作为单段路径。例如:
+
+.. code-block:: rust
+
+ fn f(p: *const c_char) -> c_int {
+ // ...
+ }
+
+
命名
----
@@ -202,3 +295,144 @@ Rust内核代码遵循通常的Rust命名空间:
也就是说, ``GPIO_LINE_DIRECTION_IN`` 的等价物将被称为 ``gpio::LineDirection::In`` 。
特别是,它不应该被命名为 ``gpio::gpio_line_direction::GPIO_LINE_DIRECTION_IN`` 。
+
+
+代码检查提示(Lints)
+---------------------
+
+在 Rust 中,可以在局部 ``allow`` 特定的警告(诊断信息、代码检查提示(lint)),
+使编译器忽略给定函数、模块、代码块等中给定警告的实例。
+
+这类似于 C 中的 ``#pragma GCC diagnostic push`` + ``ignored`` + ``pop``
+[#]_:
+
+.. code-block:: c
+
+ #pragma GCC diagnostic push
+ #pragma GCC diagnostic ignored "-Wunused-function"
+ static void f(void) {}
+ #pragma GCC diagnostic pop
+
+.. [#] 在这个特定情况下,可以使用内核的 ``__{always,maybe}_unused`` 属性
+ (C23 的 ``[[maybe_unused]]``);然而,此示例旨在反映下文讨论的 Rust 中
+ 的等效代码检查提示。
+
+但要简洁得多:
+
+.. code-block:: rust
+
+ #[allow(dead_code)]
+ fn f() {}
+
+凭借这一点,可以更方便地默认启用更多诊断(即在 ``W=`` 级别之外)。特别是那些可能有
+一些误报但在其他方面非常有用的诊断,保持启用可以捕获潜在的错误。
+
+在此基础上,Rust 提供了 ``expect`` 属性,更进一步。如果警告没有产生,它会让编译器
+发出警告。例如,以下代码将确保当 ``f()`` 在某处被调用时,我们必须移除该属性:
+
+.. code-block:: rust
+
+ #[expect(dead_code)]
+ fn f() {}
+
+如果我们不这样做,编译器会发出警告::
+
+ warning: this lint expectation is unfulfilled
+ --> x.rs:3:10
+ |
+ 3 | #[expect(dead_code)]
+ | ^^^^^^^^^
+ |
+ = note: `#[warn(unfulfilled_lint_expectations)]` on by default
+
+这意味着 ``expect`` 不会在不需要时被遗忘,这可能发生在以下几种情况中:
+
+- 开发过程中添加的临时属性。
+
+- 编译器、Clippy 或自定义工具中代码检查提示的改进可能消除误报。
+
+- 当代码检查提示不再需要时,因为预期它会在某个时候被移除,例如上面的
+ ``dead_code`` 示例。
+
+这也增加了剩余 ``allow`` 的可见性,并减少了误用的可能性。
+
+因此,优先使用 ``expect`` 而不是 ``allow``,除非:
+
+- 条件编译在某些情况下触发警告,在其他情况下不触发。
+
+ 如果与总的相比,只有少数情况触发(或不触发)警告,那么可以考虑使用条件
+ ``expect``(即 ``cfg_attr(..., expect(...))``)。否则,使用 ``allow`` 可
+ 能更简单。
+
+- 在宏内部,不同的调用可能会创建在某些情况下触发警告而在其他情况下不触发的展开代码。
+
+- 当代码可能在某些架构上触发警告但在其他架构上不触发时,例如到 C FFI 类型的 ``as``
+ 转换。
+
+作为一个更详细的示例,考虑以下程序:
+
+.. code-block:: rust
+
+ fn g() {}
+
+ fn main() {
+ #[cfg(CONFIG_X)]
+ g();
+ }
+
+这里,如果 ``CONFIG_X`` 未设置,函数 ``g()`` 是死代码。我们可以在这里使用
+``expect`` 吗?
+
+.. code-block:: rust
+
+ #[expect(dead_code)]
+ fn g() {}
+
+ fn main() {
+ #[cfg(CONFIG_X)]
+ g();
+ }
+
+如果 ``CONFIG_X`` 被设置,这将产生代码检查提示,因为在该配置中它不是死代码。因
+此,在这种情况下,我们不能直接使用 ``expect``。
+
+一个简单的可能性是使用 ``allow``:
+
+.. code-block:: rust
+
+ #[allow(dead_code)]
+ fn g() {}
+
+ fn main() {
+ #[cfg(CONFIG_X)]
+ g();
+ }
+
+另一种方法是使用条件 ``expect``:
+
+.. code-block:: rust
+
+ #[cfg_attr(not(CONFIG_X), expect(dead_code))]
+ fn g() {}
+
+ fn main() {
+ #[cfg(CONFIG_X)]
+ g();
+ }
+
+这将确保如果有人在某处引入了对 ``g()`` 的另一个调用(例如无条件的),那么将会被发现
+它不再是死代码。然而, ``cfg_attr`` 比简单的 ``allow`` 更复杂。
+
+因此,当涉及多个配置或者代码检查提示可能由于非局部更改(如 ``dead_code``)而触发
+时,使用条件 ``expect`` 可能不值得。
+
+有关 Rust 中诊断的更多信息,请参阅:
+
+ https://doc.rust-lang.org/stable/reference/attributes/diagnostics.html
+
+错误处理
+--------
+
+有关 Rust for Linux 特定错误处理的背景和指南,请参阅:
+
+ https://rust.docs.kernel.org/kernel/error/type.Result.html#error-codes-in-c-and-rust
--
2.53.0
* [PATCH v2 1/4] docs/zh_CN: update rust/arch-support.rst translation
From: Ben Guo @ 2026-04-10 2:41 UTC (permalink / raw)
To: Alex Shi, Yanteng Si, Dongliang Mu, Jonathan Corbet
Cc: linux-doc, linux-kernel, rust-for-linux, hust-os-kernel-patches
In-Reply-To: <cover.1775786987.git.ben.guo@openatom.club>
Update the translation of .../rust/arch-support.rst into Chinese.
Update the translation through commit ccb8ce526807
("ARM: 9441/1: rust: Enable Rust support for ARMv7")
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Ben Guo <ben.guo@openatom.club>
---
Documentation/translations/zh_CN/rust/arch-support.rst | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/Documentation/translations/zh_CN/rust/arch-support.rst b/Documentation/translations/zh_CN/rust/arch-support.rst
index abd708d48f8..f5ae44588a5 100644
--- a/Documentation/translations/zh_CN/rust/arch-support.rst
+++ b/Documentation/translations/zh_CN/rust/arch-support.rst
@@ -19,9 +19,10 @@
============= ================ ==============================================
架构 支持水平 限制因素
============= ================ ==============================================
-``arm64`` Maintained 只有小端序
+``arm`` Maintained 仅 ARMv7 小端序。
+``arm64`` Maintained 仅小端序。
``loongarch`` Maintained \-
-``riscv`` Maintained 只有 ``riscv64``
-``um`` Maintained 只有 ``x86_64``
-``x86`` Maintained 只有 ``x86_64``
+``riscv`` Maintained 仅 ``riscv64``,且仅限 LLVM/Clang。
+``um`` Maintained \-
+``x86`` Maintained 仅 ``x86_64``。
============= ================ ==============================================
--
2.53.0
* [PATCH v2 0/4] docs/zh_CN: update rust/ subsystem translations
From: Ben Guo @ 2026-04-10 2:41 UTC (permalink / raw)
To: Alex Shi, Yanteng Si, Dongliang Mu, Jonathan Corbet
Cc: linux-doc, linux-kernel, rust-for-linux, hust-os-kernel-patches
Update Chinese translations for the Rust subsystem documentation,
syncing with the latest upstream changes.
- arch-support.rst: add ARM (ARMv7) support, update RISC-V and UM notes
- coding-guidelines.rst: add imports formatting, private item docs,
C FFI types, and Lints sections
- quick-start.rst: add distro-specific install instructions, update
rustc/bindgen sections, remove cargo section
- index.rst: remove experimental notice and genindex
Changes in v2:
- Add Reviewed-by from Gary Guo
Ben Guo (4):
docs/zh_CN: update rust/arch-support.rst translation
docs/zh_CN: update rust/coding-guidelines.rst translation
docs/zh_CN: update rust/quick-start.rst translation
docs/zh_CN: update rust/index.rst translation
.../translations/zh_CN/rust/arch-support.rst | 9 +-
.../zh_CN/rust/coding-guidelines.rst | 262 +++++++++++++++++-
.../translations/zh_CN/rust/index.rst | 17 --
.../translations/zh_CN/rust/quick-start.rst | 190 ++++++++++---
4 files changed, 401 insertions(+), 77 deletions(-)
--
2.53.0
* Re: [PATCH 0/4] docs/zh_CN: update rust/ subsystem translations
From: Dongliang Mu @ 2026-04-10 1:09 UTC (permalink / raw)
To: Ben Guo, Alex Shi, Yanteng Si, Jonathan Corbet
Cc: linux-doc, linux-kernel, rust-for-linux
In-Reply-To: <34e73fc0-97d8-4bba-8083-84b932525789@openatom.club>
On 4/9/26 12:54 AM, Ben Guo wrote:
> On 4/8/26 7:44 PM, Dongliang Mu wrote:
>> Hi Guo,
>>
>> I found an issue in this patchset: please do not directly include my
>> review tag from the internal mailing list [1].
>>
>> After you submit it to the linux‑doc mailing list, I will add my review
>> tag at that time. Including it now would look inappropriate.
>>
>> Our internal review is only intended to maintain patch quality for our
>> open‑source club.
> Hi Dongliang,
>
> Thanks for pointing this out.
>
> I will remove your Reviewed-by from all patches and resend as v2.
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
You can add the review tag from me and Guo in the v2.
>
> Thanks,
> Ben
* Re: [PATCH] Documentation: Refactored watchdog old doc
From: Guenter Roeck @ 2026-04-10 0:53 UTC (permalink / raw)
To: Sunny Patel, Jonathan Corbet
Cc: Wim Van Sebroeck, Shuah Khan, linux-watchdog, linux-doc,
linux-kernel
In-Reply-To: <20260409175301.22902-1-nueralspacetech@gmail.com>
On 4/9/26 10:53, Sunny Patel wrote:
> Revisited old doc of watchdog and did some cleanup.
> Also added support for new api in doc.
>
> Signed-off-by: Sunny Patel <nueralspacetech@gmail.com>
> ---
> Documentation/watchdog/watchdog-api.rst | 49 +++++++++++++++++++++----
> 1 file changed, 41 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/watchdog/watchdog-api.rst b/Documentation/watchdog/watchdog-api.rst
> index 78e228c272cf..446f961852ec 100644
> --- a/Documentation/watchdog/watchdog-api.rst
> +++ b/Documentation/watchdog/watchdog-api.rst
> @@ -2,7 +2,7 @@
> The Linux Watchdog driver API
> =============================
>
> -Last reviewed: 10/05/2007
> +Last reviewed: 04/08/2026
>
>
>
> @@ -106,11 +106,10 @@ the requested one due to limitation of the hardware::
> This example might actually print "The timeout was set to 60 seconds"
> if the device has a granularity of minutes for its timeout.
>
> -Starting with the Linux 2.4.18 kernel, it is possible to query the
> -current timeout using the GETTIMEOUT ioctl::
> +It is also possible to get the current timeout with the GETTIMEOUT ioctl::
>
> ioctl(fd, WDIOC_GETTIMEOUT, &timeout);
> - printf("The timeout was is %d seconds\n", timeout);
> + printf("The timeout is %d seconds\n", timeout);
>
> Pretimeouts
> ===========
> @@ -133,7 +132,7 @@ seconds. Setting a pretimeout to zero disables it.
> There is also a get function for getting the pretimeout::
>
> ioctl(fd, WDIOC_GETPRETIMEOUT, &timeout);
> - printf("The pretimeout was is %d seconds\n", timeout);
> + printf("The pretimeout is %d seconds\n", timeout);
>
> Not all watchdog drivers will support a pretimeout.
>
> @@ -145,7 +144,7 @@ before the system will reboot. The WDIOC_GETTIMELEFT is the ioctl
> that returns the number of seconds before reboot::
>
> ioctl(fd, WDIOC_GETTIMELEFT, &timeleft);
> - printf("The timeout was is %d seconds\n", timeleft);
> + printf("The timeout is %d seconds\n", timeleft);
>
> Environmental monitoring
> ========================
> @@ -227,12 +226,33 @@ The watchdog saw a keepalive ping since it was last queried.
> WDIOF_SETTIMEOUT Can set/get the timeout
> ================ =======================
>
> -The watchdog can do pretimeouts.
> +The watchdog supports timeout set/get via the WDIOC_SETTIMEOUT and
> +WDIOC_GETTIMEOUT ioctls.
>
> ================ ================================
> WDIOF_PRETIMEOUT Pretimeout (in seconds), get/set
> ================ ================================
>
> +The watchdog supports a pretimeout, a warning interrupt that fires before
> +the actual reboot tiemout. USE WDIOC_SETPRETIMEOUT and WDIOC_GETPRETIMEOUT
> +to set/get the pretimeout.
> +
> + ================ ================================
> + WDIOF_MAGICCLOSE Supports magic close char
> + ================ ================================
> +
> +The driver supports the Magic Close feature, The watchdog is only disabled
> +if the characted 'V' is written to /dev/watchdog before the file descriptor
> +is closed. Without this, closing the device disables the watchdog
> +unconditionally.
> +
> + ================ ================================
> + WDIOF_ALARMONLY Not a reboot watchdog
> + ================ ================================
> +
> +The watchdog will not reboot the system when it expires. Instead it
> +triggers a management or other external alarm. Userspace should not
> +rely on a system reboot occurring.
>
> For those drivers that return any bits set in the option field, the
> GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current
> @@ -268,4 +288,17 @@ The following options are available:
> WDIOS_TEMPPANIC Kernel panic on temperature trip
> ================= ================================
>
> -[FIXME -- better explanations]
> +``WDIOS_DISABLECARD`` stops the watchdog timer. The driver will cease
> +pinging the hardware watchdog, allowing a controlled shutdown without
> +a forced reboot. This is equivalent to the watchdog being disarmed.
> +
> +``WDIOS_ENABLECARD`` starts the watchdog timer. if the watchdog was
> +previously stopped via ``WDIOS_DISABLECARD``,this will re-enable it. The
> +hardware watchdog will begin counting down from the configured timeout.
> +
> +``WDIOS_TEMPPANIC`` enables temperature-based kernel panic. When set,
> +the driver will call ``panic()`` (or ``kernel_power_off()`` on some
> +drivers) if the hardware temperature sensor exceeds its threshold,
> +rather than only setting the ``WDIOF_OVERHEAT`` status bit. Support
> +for this option is driver-specific, not all watchdog drivers implement
> +temperature monitoring.
> \ No newline at end of file
FWIW, I think if we update the document, all functionality not supported
by the watchdog core (specifically but not necessarily limited to
WDIOS_TEMPPANIC) should be declared/marked deprecated.
Guenter
* Re: [PATCH] Documentation: Refactored watchdog old doc
From: Randy Dunlap @ 2026-04-10 0:31 UTC (permalink / raw)
To: Sunny Patel, Jonathan Corbet
Cc: Wim Van Sebroeck, Guenter Roeck, Shuah Khan, linux-watchdog,
linux-doc, linux-kernel
In-Reply-To: <20260409175301.22902-1-nueralspacetech@gmail.com>
On 4/9/26 10:53 AM, Sunny Patel wrote:
> Revisited old doc of watchdog and did some cleanup.
> Also added support for new api in doc.
>
> Signed-off-by: Sunny Patel <nueralspacetech@gmail.com>
> ---
> Documentation/watchdog/watchdog-api.rst | 49 +++++++++++++++++++++----
> 1 file changed, 41 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/watchdog/watchdog-api.rst b/Documentation/watchdog/watchdog-api.rst
> index 78e228c272cf..446f961852ec 100644
> --- a/Documentation/watchdog/watchdog-api.rst
> +++ b/Documentation/watchdog/watchdog-api.rst
> @@ -2,7 +2,7 @@
> The Linux Watchdog driver API
> =============================
>
> -Last reviewed: 10/05/2007
> +Last reviewed: 04/08/2026
>
>
>
> @@ -106,11 +106,10 @@ the requested one due to limitation of the hardware::
> This example might actually print "The timeout was set to 60 seconds"
> if the device has a granularity of minutes for its timeout.
>
> -Starting with the Linux 2.4.18 kernel, it is possible to query the
> -current timeout using the GETTIMEOUT ioctl::
> +It is also possible to get the current timeout with the GETTIMEOUT ioctl::
>
These 3 printf() deletions of /was/ are included in my 5-patch series:
https://lore.kernel.org/linux-watchdog/20260228010402.2389343-1-rdunlap@infradead.org/
> ioctl(fd, WDIOC_GETTIMEOUT, &timeout);
> - printf("The timeout was is %d seconds\n", timeout);
> + printf("The timeout is %d seconds\n", timeout);
>
> Pretimeouts
> ===========
> @@ -133,7 +132,7 @@ seconds. Setting a pretimeout to zero disables it.
> There is also a get function for getting the pretimeout::
>
> ioctl(fd, WDIOC_GETPRETIMEOUT, &timeout);
> - printf("The pretimeout was is %d seconds\n", timeout);
> + printf("The pretimeout is %d seconds\n", timeout);
>
> Not all watchdog drivers will support a pretimeout.
>
> @@ -145,7 +144,7 @@ before the system will reboot. The WDIOC_GETTIMELEFT is the ioctl
> that returns the number of seconds before reboot::
>
> ioctl(fd, WDIOC_GETTIMELEFT, &timeleft);
> - printf("The timeout was is %d seconds\n", timeleft);
> + printf("The timeout is %d seconds\n", timeleft);
>
> Environmental monitoring
> ========================
> @@ -227,12 +226,33 @@ The watchdog saw a keepalive ping since it was last queried.
> WDIOF_SETTIMEOUT Can set/get the timeout
> ================ =======================
>
> -The watchdog can do pretimeouts.
> +The watchdog supports timeout set/get via the WDIOC_SETTIMEOUT and
> +WDIOC_GETTIMEOUT ioctls.
>
> ================ ================================
> WDIOF_PRETIMEOUT Pretimeout (in seconds), get/set
> ================ ================================
>
> +The watchdog supports a pretimeout, a warning interrupt that fires before
> +the actual reboot tiemout. USE WDIOC_SETPRETIMEOUT and WDIOC_GETPRETIMEOUT
s/USE/Use/
> +to set/get the pretimeout.
> +
> + ================ ================================
> + WDIOF_MAGICCLOSE Supports magic close char
> + ================ ================================
> +
> +The driver supports the Magic Close feature, The watchdog is only disabled
> +if the characted 'V' is written to /dev/watchdog before the file descriptor
> +is closed. Without this, closing the device disables the watchdog
> +unconditionally.
> +
> + ================ ================================
> + WDIOF_ALARMONLY Not a reboot watchdog
> + ================ ================================
Documentation/watchdog/watchdog-api.rst:250: ERROR: Malformed table.
Text in column margin in table line 2.
================ ================================
WDIOF_ALARMONLY Not a reboot watchdog
================ ================================
Please test your patches.
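
The error comes from the reStructuredText simple-table rule that the blank column between two `=` runs of the border must stay empty in every row. A minimal Python sketch of that check follows. The helper name `margin_violations` is hypothetical, and it only approximates what docutils does (it ignores column spans and the rule that text in the last column may run past the border):

```python
import re


def margin_violations(border, rows):
    """Return (row_index, column_offset) pairs where text sits in a column margin.

    `border` is the '====  ====' line of an RST simple table; every gap
    between two runs of '=' is a margin that must stay blank in each row.
    """
    # Collect the offsets covered by the gaps between consecutive '=' runs.
    runs = [m.span() for m in re.finditer(r"=+", border)]
    margins = []
    for (_, end), (start, _) in zip(runs, runs[1:]):
        margins.extend(range(end, start))

    bad = []
    for i, row in enumerate(rows):
        for col in margins:
            if col < len(row) and not row[col].isspace():
                bad.append((i, col))
    return bad


# Border with a 16-character first column, as in the quoted table.
border = "=" * 16 + " " + "=" * 32
broken = "WDIOF_ALARMONLY Not a reboot watchdog"   # one space after the name
fixed = "WDIOF_ALARMONLY  Not a reboot watchdog"   # padded to the column width

print(margin_violations(border, [broken]))
print(margin_violations(border, [fixed]))
```

`WDIOF_ALARMONLY` is one character shorter than `WDIOF_MAGICCLOSE`, so with a single separating space the description starts in the margin; padding the name to the 16-character column width avoids that.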
> +
> +The watchdog will not reboot the system when it expires. Instead it
> +triggers a management or other external alarm. Userspace should not
> +rely on a system reboot occurring.
>
> For those drivers that return any bits set in the option field, the
> GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current
> @@ -268,4 +288,17 @@ The following options are available:
> WDIOS_TEMPPANIC Kernel panic on temperature trip
> ================= ================================
>
> -[FIXME -- better explanations]
> +``WDIOS_DISABLECARD`` stops the watchdog timer. The driver will cease
> +pinging the hardware watchdog, allowing a controlled shutdown without
> +a forced reboot. This is equivalent to the watchdog being disarmed.
> +
> +``WDIOS_ENABLECARD`` starts the watchdog timer. if the watchdog was
> +previously stopped via ``WDIOS_DISABLECARD``,this will re-enable it. The
s/,this/, this/
> +hardware watchdog will begin counting down from the configured timeout.
> +
> +``WDIOS_TEMPPANIC`` enables temperature-based kernel panic. When set,
> +the driver will call ``panic()`` (or ``kernel_power_off()`` on some
> +drivers) if the hardware temperature sensor exceeds its threshold,
> +rather than only setting the ``WDIOF_OVERHEAT`` status bit. Support
> +for this option is driver-specific, not all watchdog drivers implement
s/driver-specific, not all/driver-specific; not all/
> +temperature monitoring.
> \ No newline at end of file
warning: ^^^^^^^^^^^^^^^^ (the file should end with a newline)
--
~Randy
* maintainer profiles
From: Randy Dunlap @ 2026-04-10 0:18 UTC (permalink / raw)
To: Linux Documentation, Linux Kernel Mailing List
Cc: Jonathan Corbet, Linux Kernel Workflows
Hi,
Is there supposed to be a difference (or distinction) in the contents of
Documentation/process/maintainer-handbooks.rst
and
Documentation/maintainer/maintainer-entry-profile.rst
?
Can they be combined into one location?
--
~Randy
* Re: [PATCH] docs: escape ** glob pattern in MAINTAINERS descriptions
From: Randy Dunlap @ 2026-04-10 0:08 UTC (permalink / raw)
To: Matteo Croce, Mauro Carvalho Chehab, Jonathan Corbet
Cc: linux-doc, linux-kernel, Matteo Croce
In-Reply-To: <20260409223135.10186-1-technoboy85@gmail.com>
Hi,
On 4/9/26 3:31 PM, Matteo Croce wrote:
> From: Matteo Croce <teknoraver@meta.com>
>
> Escape '**' in the MAINTAINERS descriptions section to prevent
> reStructuredText from interpreting it as bold/strong inline markup,
> which causes a warning when running 'make htmldocs'.
>
> Fixes: 420849332f9f ("get_maintainer: add ** glob pattern support")
> Signed-off-by: Matteo Croce <teknoraver@meta.com>
> ---
> Documentation/sphinx/maintainers_include.py | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/sphinx/maintainers_include.py b/Documentation/sphinx/maintainers_include.py
> index 519ad18685b2..54f34f47c9ee 100755
> --- a/Documentation/sphinx/maintainers_include.py
> +++ b/Documentation/sphinx/maintainers_include.py
> @@ -89,7 +89,8 @@ class MaintainersInclude(Include):
> output = None
> if descriptions:
> # Escape the escapes in preformatted text.
> - output = "| %s" % (line.replace("\\", "\\\\"))
> + output = "| %s" % (line.replace("\\", "\\\\")
> + .replace("**", "\\**"))
> # Look for and record field letter to field name mappings:
> # R: Designated *reviewer*: FullName <address@domain>
> m = re.search(r"\s(\S):\s", line)
It's nice to eliminate one warning from 'make htmldocs', so this is good
in that regard. However, there are still multiple problems (not Warnings)
with '*' characters in the MAINTAINERS file:
1) F: */net/* all files in "any top level directory"/net
In the html output, it shows "/net/" italicized (that's what one * does).
2) F: fs/**/*foo*.c all *foo*.c files in any subdirectory of fs
In the html output, it shows
F: fs/**/foo.c all foo.c files in any subdirectory of fs
with both occurrences of "foo.c" italicized (dropping the '*' characters).
These 2 examples are actively wrong.
I didn't look at any other possible issues.
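One possible way to cover the two cases above is to escape every '*' in a
description line, not only the '**' pairs. The following is a minimal sketch
under that assumption; it is not what the posted patch does, the helper name
escape_description_line() is hypothetical, and it has not been checked against
an actual 'make htmldocs' run.

```python
def escape_description_line(line: str) -> str:
    """Return a reST line-block entry with backslashes and '*' escaped.

    Hypothetical helper: mirrors the "| %s" formatting used in the patch,
    but escapes every '*' instead of only '**'.
    """
    # Double existing backslashes first so the escapes added below
    # are not themselves re-escaped.
    escaped = line.replace("\\", "\\\\")
    # Prefix each '*' with a backslash so reST treats it as a literal
    # character rather than emphasis/strong markup.
    escaped = escaped.replace("*", "\\*")
    return "| %s" % escaped


if __name__ == "__main__":
    for sample in ("F: */net/*", "F: fs/**/*foo*.c"):
        print(escape_description_line(sample))
```

Escaping single characters avoids special-casing '**', at the cost of also
escaping any '*' that was intended as markup in the descriptions section.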
--
~Randy
* Re: [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
From: Pawan Gupta @ 2026-04-09 23:48 UTC (permalink / raw)
To: Jim Mattson
Cc: Dave Hansen, x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin,
Josh Poimboeuf, David Kaplan, Sean Christopherson,
Borislav Petkov, Dave Hansen, Peter Zijlstra, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, KP Singh, Jiri Olsa,
David S. Miller, David Laight, Andy Lutomirski, Thomas Gleixner,
Ingo Molnar, David Ahern, Martin KaFai Lau, Eduard Zingerman,
Song Liu, Yonghong Song, John Fastabend, Stanislav Fomichev,
Hao Luo, Paolo Bonzini, Jonathan Corbet, linux-kernel, kvm,
Asit Mallick, Tao Zhang, bpf, netdev, linux-doc, chao.gao
In-Reply-To: <CALMp9eQx3H+n3V3dQh+ZafQZ6uNBjSYk8tZsvG6ffcY43YTrnQ@mail.gmail.com>
On Thu, Apr 09, 2026 at 02:06:36PM -0700, Jim Mattson wrote:
> On Thu, Apr 9, 2026 at 1:36 PM Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > On 4/7/26 17:47, Jim Mattson wrote:
> > > On Tue, Apr 7, 2026 at 4:41 PM Dave Hansen <dave.hansen@intel.com> wrote:
> > >> On 4/7/26 16:27, Jim Mattson wrote:
> > >>> What is your proposed BHI_DIS_S override mechanism, then?
> > >> Let me make sure I get this right. The desire is to:
> > >>
> > >> 1. Have hypervisors lie to guests about the CPU they are running on (for
> > >> the benefit of large/diverse migration pools)
> > >> 2. Have guests be allowed to boot with BHI_DIS_S for performance
> > >> 3. Have apps in those guests that care about security to opt back in to
> > >> BHI_DIS_S for themselves?
> > > I just want guests on heterogeneous migration pools to properly
> > > protect themselves from native BHI when running on host kernels at
> > > least as far back as Linux v6.6.
> > >
> > > To that end, I would be satisfied with using the longer BHB clearing
> > > sequence when HYPERVISOR is true and BHI_CTRL is false.
> >
> > If the guests can't get mitigation information from model/family because
> > the hypervisor is lying (or may lie), then it's on the hypervisor to
> > figure it out.
> >
> > I'm not sure we want to just assume that all hypervisors are going to
> > lie all the time about this.
>
> Without any information, that is exactly what we must assume. There is
> precedent for this.
>
> In vulnerable_to_its():
>
> /*
> * If a VMM did not expose ITS_NO, assume that a guest could
> * be running on a vulnerable hardware or may migrate to such
> * hardware.
> */
> if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
> return true;
>
>
> In cpu_set_bug_bits():
>
> /*
> * Intel parts with eIBRS are vulnerable to BHI attacks. Parts with
> * BHI_NO still need to use the BHI mitigation to prevent Intra-mode
> * attacks. When virtualized, eIBRS could be hidden, assume vulnerable.
> */
> if (!cpu_matches(cpu_vuln_whitelist, NO_BHI) &&
> (boot_cpu_has(X86_FEATURE_IBRS_ENHANCED) ||
> boot_cpu_has(X86_FEATURE_HYPERVISOR)))
> setup_force_cpu_bug(X86_BUG_BHI);
>
> ...and...
>
> if (c->x86_vendor == X86_VENDOR_AMD) {
> if (!cpu_has(c, X86_FEATURE_TSA_SQ_NO) ||
> !cpu_has(c, X86_FEATURE_TSA_L1_NO)) {
> if (cpu_matches(cpu_vuln_blacklist, TSA) ||
> /* Enable bug on Zen guests to allow for
> live migration. */
> (cpu_has(c, X86_FEATURE_HYPERVISOR) &&
> cpu_has(c, X86_FEATURE_ZEN)))
> setup_force_cpu_bug(X86_BUG_TSA);
> }
> }
>
>
> In check_null_seg_clears_base():
>
> /*
> * CPUID bit above wasn't set. If this kernel is still running
> * as a HV guest, then the HV has decided not to advertize
> * that CPUID bit for whatever reason. For example, one
> * member of the migration pool might be vulnerable. Which
> * means, the bug is present: set the BUG flag and return.
> */
> if (cpu_has(c, X86_FEATURE_HYPERVISOR)) {
> set_cpu_bug(c, X86_BUG_NULL_SEG);
> return;
> }
>
> The hypervisor could provide more information so that the guest can
> determine when it's safe to use the short sequence, but that's just
> icing on the cake. The default, out-of-the-box configuration must be
> safe.
In the above cases there was no practical way a VMM could have mitigated
the guest. So the only option for the guest was to take a conservative
approach. Secondly, in the BHI case, real world scenarios of migration
between pre- and post-ADL CPUs were unknown.
Nevertheless, Intel guidance covers this case by having KVM deploy
BHI_DIS_S for the guest using virtual-SPEC_CTRL. I understand that support
is currently missing; I am working on it. Hopefully, I will be able to
share the draft after this series settles down. We can work out the details
there.
In retrospect, it would have been ideal if this discussion had happened at
the time when the virtual-SPEC_CTRL series was introduced.
* [PATCH 6/6] hugetlb: pass hugetlb reservation ranges in base-page indices
From: Jane Chu @ 2026-04-09 23:41 UTC (permalink / raw)
To: akpm, david, muchun.song, osalvador
Cc: lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
corbet, skhan, hughd, baolin.wang, peterx, linux-mm, linux-doc,
linux-kernel
In-Reply-To: <20260409234158.837786-1-jane.chu@oracle.com>
hugetlb_reserve_pages() consumes indices in hugepage granularity, although
some callers naturally compute offsets in PAGE_SIZE units.
Teach the reservation helpers to accept base-page index ranges and
convert to hugepage indices internally before operating on the
reservation map. This keeps the internal representation unchanged while
making the API contract more uniform for callers.
Update hugetlbfs and memfd call sites to pass base-page indices, and
adjust the documentation to describe the new calling convention. Add
alignment warnings in hugetlb_reserve_pages() to catch invalid ranges
early.
No functional changes.
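The index conversion described above can be modelled in userspace as follows.
This is only an illustrative sketch: PAGE_SHIFT = 12 and a huge page order of 9
(4 KiB base pages, 2 MiB huge pages) are assumptions, the function name
to_huge_index_range() is hypothetical, and the sketch raises an error on
misaligned input where the patch only emits VM_WARN_ON().

```python
PAGE_SHIFT = 12        # assumption: 4 KiB base pages
HUGE_PAGE_ORDER = 9    # assumption: 2 MiB huge pages (512 base pages each)


def to_huge_index_range(from_pg: int, to_pg: int,
                        order: int = HUGE_PAGE_ORDER) -> tuple[int, int]:
    """Convert a base-page index range [from_pg, to_pg) to huge page indices.

    Hypothetical model of the conversion done inside the reservation
    helpers; both ends are expected to be huge page aligned.
    """
    pages_per_huge_page = 1 << order
    if from_pg % pages_per_huge_page or to_pg % pages_per_huge_page:
        # The patch warns instead of failing; the sketch is stricter.
        raise ValueError("range is not huge page aligned")
    return from_pg >> order, to_pg >> order


if __name__ == "__main__":
    # A caller with a byte length passes 'len >> PAGE_SHIFT' as 'to'.
    size = 4 << 20                       # 4 MiB segment
    from_idx, to_idx = to_huge_index_range(0, size >> PAGE_SHIFT)
    # For a private mapping the charge is the number of huge pages covered.
    print(from_idx, to_idx, to_idx - from_idx)
```

Callers stay in PAGE_SIZE units throughout, and the shift by the huge page
order happens in exactly one place.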
Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
Documentation/mm/hugetlbfs_reserv.rst | 12 +++++------
fs/hugetlbfs/inode.c | 29 ++++++++++++---------------
mm/hugetlb.c | 26 ++++++++++++++++--------
mm/memfd.c | 9 +++++----
4 files changed, 42 insertions(+), 34 deletions(-)
diff --git a/Documentation/mm/hugetlbfs_reserv.rst b/Documentation/mm/hugetlbfs_reserv.rst
index a49115db18c7..60a52b28f0b4 100644
--- a/Documentation/mm/hugetlbfs_reserv.rst
+++ b/Documentation/mm/hugetlbfs_reserv.rst
@@ -112,8 +112,8 @@ flag was specified in either the shmget() or mmap() call. If NORESERVE
was specified, then this routine returns immediately as no reservations
are desired.
-The arguments 'from' and 'to' are huge page indices into the mapping or
-underlying file. For shmget(), 'from' is always 0 and 'to' corresponds to
+The arguments 'from' and 'to' are base page indices into the mapping or
+underlying file. For shmget(), 'from' is always 0 and 'to' corresponds to
the length of the segment/mapping. For mmap(), the offset argument could
be used to specify the offset into the underlying file. In such a case,
the 'from' and 'to' arguments have been adjusted by this offset.
@@ -136,10 +136,10 @@ to indicate this VMA owns the reservations.
The reservation map is consulted to determine how many huge page reservations
are needed for the current mapping/segment. For private mappings, this is
-always the value (to - from). However, for shared mappings it is possible that
-some reservations may already exist within the range (to - from). See the
-section :ref:`Reservation Map Modifications <resv_map_modifications>`
-for details on how this is accomplished.
+always the number of huge pages covered by the range [from, to). However,
+for shared mappings it is possible that some reservations may already exist
+within the range [from, to). See the section :ref:`Reservation Map Modifications
+<resv_map_modifications>` for details on how this is accomplished.
The mapping may be associated with a subpool. If so, the subpool is consulted
to ensure there is sufficient space for the mapping. It is possible that the
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index a72d46ff7980..ec05ed30b70f 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -157,10 +157,8 @@ static int hugetlbfs_file_mmap_prepare(struct vm_area_desc *desc)
if (inode->i_flags & S_PRIVATE)
vma_flags_set(&vma_flags, VMA_NORESERVE_BIT);
- if (hugetlb_reserve_pages(inode,
- desc->pgoff >> huge_page_order(h),
- len >> huge_page_shift(h), desc,
- vma_flags) < 0)
+ if (hugetlb_reserve_pages(inode, desc->pgoff, len >> PAGE_SHIFT, desc,
+ vma_flags) < 0)
goto out;
ret = 0;
@@ -408,8 +406,8 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
unsigned long v_end;
pgoff_t start, end;
- start = index * pages_per_huge_page(h);
- end = (index + 1) * pages_per_huge_page(h);
+ start = index;
+ end = start + pages_per_huge_page(h);
i_mmap_lock_write(mapping);
retry:
@@ -518,6 +516,8 @@ static void remove_inode_single_folio(struct hstate *h, struct inode *inode,
struct address_space *mapping, struct folio *folio,
pgoff_t index, bool truncate_op)
{
+ pgoff_t next_index;
+
/*
* If folio is mapped, it was faulted in after being
* unmapped in caller or hugetlb_vmdelete_list() skips
@@ -540,8 +540,9 @@ static void remove_inode_single_folio(struct hstate *h, struct inode *inode,
VM_BUG_ON_FOLIO(folio_test_hugetlb_restore_reserve(folio), folio);
hugetlb_delete_from_page_cache(folio);
if (!truncate_op) {
+ next_index = index + pages_per_huge_page(h);
if (unlikely(hugetlb_unreserve_pages(inode, index,
- index + 1, 1)))
+ next_index, 1)))
hugetlb_fix_reserve_counts(inode);
}
@@ -575,7 +576,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
struct address_space *mapping = &inode->i_data;
const pgoff_t end = lend >> PAGE_SHIFT;
struct folio_batch fbatch;
- pgoff_t next, idx;
+ pgoff_t next;
int i, freed = 0;
bool truncate_op = (lend == LLONG_MAX);
@@ -592,9 +593,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
/*
* Remove folio that was part of folio_batch.
*/
- idx = folio->index >> huge_page_order(h);
remove_inode_single_folio(h, inode, mapping, folio,
- idx, truncate_op);
+ folio->index, truncate_op);
freed++;
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
@@ -604,9 +604,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
}
if (truncate_op)
- (void)hugetlb_unreserve_pages(inode,
- lstart >> huge_page_shift(h),
- LONG_MAX, freed);
+ (void)hugetlb_unreserve_pages(inode, lstart >> PAGE_SHIFT,
+ LONG_MAX, freed);
}
static void hugetlbfs_evict_inode(struct inode *inode)
@@ -1561,9 +1560,7 @@ struct file *hugetlb_file_setup(const char *name, size_t size,
inode->i_size = size;
clear_nlink(inode);
- if (hugetlb_reserve_pages(inode, 0,
- size >> huge_page_shift(hstate_inode(inode)), NULL,
- acctflag) < 0)
+ if (hugetlb_reserve_pages(inode, 0, size >> PAGE_SHIFT, NULL, acctflag) < 0)
file = ERR_PTR(-ENOMEM);
else
file = alloc_file_pseudo(inode, mnt, name, O_RDWR,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 47ef41b6fb2e..eb4ab5bd0c9f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6532,10 +6532,11 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
}
/*
- * Update the reservation map for the range [from, to].
+ * Update the reservation map for the range [from, to) where 'from' and 'to'
+ * are base-page indices that are expected to be huge page aligned.
*
- * Returns the number of entries that would be added to the reservation map
- * associated with the range [from, to]. This number is greater or equal to
+ * Returns the number of huge pages that would be added to the reservation map
+ * associated with the range [from, to). This number is greater or equal to
* zero. -EINVAL or -ENOMEM is returned in case of any errors.
*/
@@ -6550,6 +6551,7 @@ long hugetlb_reserve_pages(struct inode *inode,
struct resv_map *resv_map;
struct hugetlb_cgroup *h_cg = NULL;
long gbl_reserve, regions_needed = 0;
+ long from_idx, to_idx;
int err;
/* This should never happen */
@@ -6558,6 +6560,12 @@ long hugetlb_reserve_pages(struct inode *inode,
return -EINVAL;
}
+ VM_WARN_ON(!IS_ALIGNED(from, 1UL << huge_page_order(h)));
+ VM_WARN_ON(!IS_ALIGNED(to, 1UL << huge_page_order(h)));
+
+ from_idx = from >> huge_page_order(h);
+ to_idx = to >> huge_page_order(h);
+
/*
* Only apply hugepage reservation if asked. At fault time, an
* attempt will be made for VM_NORESERVE to allocate a page
@@ -6580,7 +6588,7 @@ long hugetlb_reserve_pages(struct inode *inode,
*/
resv_map = inode_resv_map(inode);
- chg = region_chg(resv_map, from, to, &regions_needed);
+ chg = region_chg(resv_map, from_idx, to_idx, &regions_needed);
} else {
/* Private mapping. */
resv_map = resv_map_alloc();
@@ -6589,7 +6597,7 @@ long hugetlb_reserve_pages(struct inode *inode,
goto out_err;
}
- chg = to - from;
+ chg = to_idx - from_idx;
set_vma_desc_resv_map(desc, resv_map);
set_vma_desc_resv_flags(desc, HPAGE_RESV_OWNER);
@@ -6644,7 +6652,7 @@ long hugetlb_reserve_pages(struct inode *inode,
* else has to be done for private mappings here
*/
if (!desc || vma_desc_test(desc, VMA_MAYSHARE_BIT)) {
- add = region_add(resv_map, from, to, regions_needed, h, h_cg);
+ add = region_add(resv_map, from_idx, to_idx, regions_needed, h, h_cg);
if (unlikely(add < 0)) {
hugetlb_acct_memory(h, -gbl_reserve);
@@ -6712,7 +6720,7 @@ long hugetlb_reserve_pages(struct inode *inode,
* region_add failed or didn't run.
*/
if (chg >= 0 && add < 0)
- region_abort(resv_map, from, to, regions_needed);
+ region_abort(resv_map, from_idx, to_idx, regions_needed);
if (desc && is_vma_desc_resv_set(desc, HPAGE_RESV_OWNER)) {
kref_put(&resv_map->refs, resv_map_release);
set_vma_desc_resv_map(desc, NULL);
@@ -6728,13 +6736,15 @@ long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
long chg = 0;
struct hugepage_subpool *spool = subpool_inode(inode);
long gbl_reserve;
+ long start_idx = start >> huge_page_order(h);
+ long end_idx = end >> huge_page_order(h);
/*
* Since this routine can be called in the evict inode path for all
* hugetlbfs inodes, resv_map could be NULL.
*/
if (resv_map) {
- chg = region_del(resv_map, start, end);
+ chg = region_del(resv_map, start_idx, end_idx);
/*
* region_del() can fail in the rare case where a region
* must be split and another region descriptor can not be
diff --git a/mm/memfd.c b/mm/memfd.c
index 56c8833c4195..59c174c7533c 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -80,14 +80,15 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t index)
struct inode *inode = file_inode(memfd);
struct hstate *h = hstate_file(memfd);
long nr_resv;
- pgoff_t idx;
+ pgoff_t next_index;
int err = -ENOMEM;
gfp_mask = htlb_alloc_mask(h);
gfp_mask &= ~(__GFP_HIGHMEM | __GFP_MOVABLE);
- idx = index >> huge_page_order(h);
+ next_index = index + pages_per_huge_page(h);
- nr_resv = hugetlb_reserve_pages(inode, idx, idx + 1, NULL, EMPTY_VMA_FLAGS);
+ nr_resv = hugetlb_reserve_pages(inode, index, next_index, NULL,
+ EMPTY_VMA_FLAGS);
if (nr_resv < 0)
return ERR_PTR(nr_resv);
@@ -137,7 +138,7 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t index)
}
err_unresv:
if (nr_resv > 0)
- hugetlb_unreserve_pages(inode, idx, idx + 1, 0);
+ hugetlb_unreserve_pages(inode, index, next_index, 0);
return ERR_PTR(err);
}
#endif
--
2.43.5