Linux Documentation
* Re: [PATCH v16 00/10] arm64/riscv: Add support for crashkernel CMA reservation
From: Jinjie Ruan @ 2026-06-18  1:45 UTC
  To: Mike Rapoport
  Cc: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, pasha.tatashin,
	pratyush, ruirui.yang, rdunlap, peterz, feng.tang, dapeng1.mi,
	kees, elver, kuba, lirongqing, ebiggers, paulmck, leitao, coxu,
	Liam.Howlett, ryan.roberts, osandov, jbohac, cfsworks,
	tangyouling, sourabhjain, ritesh.list, adityag, liaoyuanhong,
	seanjc, fuqiang.wang, ardb, chenjiahao16, guoren, x86, linux-doc,
	linux-kernel, linux-arm-kernel, loongarch, linuxppc-dev,
	linux-riscv, devicetree, kexec
In-Reply-To: <ajLr53EK6mJbng-7@kernel.org>



On 6/18/2026 2:48 AM, Mike Rapoport wrote:
> Hi Jinjie,
> 
> On Mon, Jun 08, 2026 at 03:34:49PM +0800, Jinjie Ruan wrote:
>> The crash memory allocation and the exclusion of crashk_res, crashk_low_res
>> and crashk_cma memory are almost identical across different architectures.
>> This patch set handles them in the crash core in a general way, which
>> eliminates a lot of duplicated code.
>>
>> It also adds support for crashkernel CMA reservation on arm64 and riscv.
>>
>> This patch set is rebased on v7.1-rc1.
> 
> Please rebase this set on v7.2-rc1 once that's out.
> 
> I'm going to queue it in the liveupdate tree then, to expose it to wider
> testing.
> 
> Meanwhile it would be great to chase riscv and x86 maintainers for acks :)

Thanks! That sounds great.

I will rebase this patch set on v7.2-rc1 as soon as it is out and send v17.

In the meantime, I will CC and reach out to the RISC-V and x86
maintainers to request their reviews and Acks.

Best regards,
Jinjie

> 



* Re: [PATCH] kselftest docs: remove reference to obsolete/archived wiki
From: Shuah Khan @ 2026-06-18  1:05 UTC
  To: Rafael Passos, shuah, corbet; +Cc: linux-kselftest, linux-doc, Shuah Khan
In-Reply-To: <865def83-a07e-4eba-b795-7da66e0e2d69@linuxfoundation.org>

On 6/17/26 19:03, Shuah Khan wrote:
> On 6/17/26 17:57, Rafael Passos wrote:
>> This link in the docs points to a wiki that is no longer active.
>>
>> The wiki was moved to archive.kernel.org, and there is a warning:
>> "OBSOLETE CONTENT This wiki has been archived and the content is
>> no longer updated."
>>
>> Signed-off-by: Rafael Passos <rafael@rcpassos.me>
>> ---
>>
>>   Documentation/dev-tools/kselftest.rst | 5 -----
>>   1 file changed, 5 deletions(-)
>>
>> diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst
>> index d7bfe320338c..64c0ec7428a2 100644
>> --- a/Documentation/dev-tools/kselftest.rst
>> +++ b/Documentation/dev-tools/kselftest.rst
>> @@ -15,11 +15,6 @@ able to run that test on an older kernel. Hence, it is important to keep
>>   code that can still test an older kernel and make sure it skips the test
>>   gracefully on newer releases.
>> -You can find additional information on Kselftest framework, how to
>> -write new tests using the framework on Kselftest wiki:
>> -
>> -https://kselftest.wiki.kernel.org/
>> -
>>   On some systems, hot-plug tests could hang forever waiting for cpu and
>>   memory to be ready to be offlined. A special hot-plug target is created
>>   to run the full range of hot-plug tests. In default mode, hot-plug tests run
> 
> 
> Looks good to me.
> 
> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>

Jon,

I can take this through kselftest tree as I usually do.

thanks,
-- Shuah


* Re: [PATCH] kselftest docs: remove reference to obsolete/archived wiki
From: Shuah Khan @ 2026-06-18  1:03 UTC
  To: Rafael Passos, shuah, corbet; +Cc: linux-kselftest, linux-doc, Shuah Khan
In-Reply-To: <20260617235740.74029-1-rafael@rcpassos.me>

On 6/17/26 17:57, Rafael Passos wrote:
> This link in the docs points to a wiki that is no longer active.
> 
> The wiki was moved to archive.kernel.org, and there is a warning:
> "OBSOLETE CONTENT This wiki has been archived and the content is
> no longer updated."
> 
> Signed-off-by: Rafael Passos <rafael@rcpassos.me>
> ---
> 
>   Documentation/dev-tools/kselftest.rst | 5 -----
>   1 file changed, 5 deletions(-)
> 
> diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst
> index d7bfe320338c..64c0ec7428a2 100644
> --- a/Documentation/dev-tools/kselftest.rst
> +++ b/Documentation/dev-tools/kselftest.rst
> @@ -15,11 +15,6 @@ able to run that test on an older kernel. Hence, it is important to keep
>   code that can still test an older kernel and make sure it skips the test
>   gracefully on newer releases.
>   
> -You can find additional information on Kselftest framework, how to
> -write new tests using the framework on Kselftest wiki:
> -
> -https://kselftest.wiki.kernel.org/
> -
>   On some systems, hot-plug tests could hang forever waiting for cpu and
>   memory to be ready to be offlined. A special hot-plug target is created
>   to run the full range of hot-plug tests. In default mode, hot-plug tests run


Looks good to me.

Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>

thanks,
-- Shuah


* Re: [PATCH v16 05/10] x86: kexec_file: Use crash_prepare_headers() helper to simplify code
From: Borislav Petkov @ 2026-06-18  0:41 UTC
  To: Jinjie Ruan
  Cc: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
	pasha.tatashin, pratyush, ruirui.yang, rdunlap, peterz, feng.tang,
	dapeng1.mi, kees, elver, kuba, lirongqing, ebiggers, paulmck,
	leitao, coxu, Liam.Howlett, ryan.roberts, osandov, jbohac,
	cfsworks, tangyouling, sourabhjain, ritesh.list, adityag,
	liaoyuanhong, seanjc, fuqiang.wang, ardb, chenjiahao16, guoren,
	x86, linux-doc, linux-kernel, linux-arm-kernel, loongarch,
	linuxppc-dev, linux-riscv, devicetree, kexec
In-Reply-To: <20260608073459.3119290-6-ruanjinjie@huawei.com>

On Mon, Jun 08, 2026 at 03:34:54PM +0800, Jinjie Ruan wrote:

> Subject: Re: [PATCH v16 05/10] x86: kexec_file: Use crash_prepare_headers() helper to simplify code

Use proper subject prefix: "x86/crash: ..."

> Use the newly introduced crash_prepare_headers() function to replace
> the existing prepare_elf_headers(), so that cmem allocation and the
> exclusion of crash kernel memory happen in the crash core, which
> reduces code duplication.
> 
> Only the following three architecture functions need to be implemented:
> - arch_get_system_nr_ranges(). Call get_nr_ram_ranges_callback()
>   to pre-count the max number of memory ranges.
> 
> - arch_crash_populate_cmem(). Use prepare_elf64_ram_headers_callback()
>   to collect the memory ranges and fill them into cmem.
> 
> - arch_crash_exclude_ranges(). Exclude the low 1M for x86.
> 
> By the way, remove the unused "nr_mem_ranges" in

s/By the way/While at it/

> arch_crash_handle_hotplug_event().
> 
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> Acked-by: Baoquan He <bhe@redhat.com>
> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/x86/kernel/crash.c | 89 +++++------------------------------------
>  1 file changed, 11 insertions(+), 78 deletions(-)

With those nitpicks above addressed:

Acked-by: Borislav Petkov (AMD) <bp@alien8.de>

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
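
The flow described in the quoted changelog (pre-count the ranges, allocate cmem, populate it, then exclude ranges) can be illustrated with a small user-space sketch. This is not the kernel code: every name below (sketch_ranges, sketch_arch_nr_ranges(), sketch_prepare() and so on) is hypothetical, the RAM map is made up, and the order of the two exclusions plus the single spare slot for a split are assumptions made only for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct sketch_range { uint64_t start, end; };	/* inclusive bounds */

struct sketch_ranges {
	unsigned int max_nr, nr;
	struct sketch_range r[];
};

/* Made-up RAM map standing in for what the arch callbacks would walk. */
static const struct sketch_range sketch_ram[] = {
	{ 0x0,      0x9ffff },
	{ 0x100000, 0x7fffffff },
};
#define SKETCH_RAM_LEN (sizeof(sketch_ram) / sizeof(sketch_ram[0]))

/* "Arch" step 1: pre-count the ranges. */
static unsigned int sketch_arch_nr_ranges(void)
{
	return SKETCH_RAM_LEN;
}

/* "Arch" step 2: fill the ranges into the preallocated array. */
static void sketch_arch_populate(struct sketch_ranges *m)
{
	for (unsigned int i = 0; i < SKETCH_RAM_LEN; i++) {
		assert(m->nr < m->max_nr);
		m->r[m->nr++] = sketch_ram[i];
	}
}

/* Remove [start, end] from all ranges; -1 if a split needs a missing slot. */
static int sketch_exclude(struct sketch_ranges *m, uint64_t start, uint64_t end)
{
	unsigned int i = 0;

	while (i < m->nr) {
		struct sketch_range *cur = &m->r[i];

		if (end < cur->start || start > cur->end) {
			i++;				/* no overlap */
		} else if (start <= cur->start && end >= cur->end) {
			/* fully covered: drop the entry, re-check the same index */
			memmove(cur, cur + 1, (m->nr - i - 1) * sizeof(*cur));
			m->nr--;
		} else if (start <= cur->start) {
			cur->start = end + 1;		/* trim the head */
			i++;
		} else if (end >= cur->end) {
			cur->end = start - 1;		/* trim the tail */
			i++;
		} else {
			/* hole in the middle: split into two entries */
			if (m->nr == m->max_nr)
				return -1;
			memmove(cur + 2, cur + 1, (m->nr - i - 1) * sizeof(*cur));
			cur[1].start = end + 1;
			cur[1].end = cur->end;
			cur->end = start - 1;
			m->nr++;
			i += 2;
		}
	}
	return 0;
}

/* "Arch" step 3: x86-style exclusion of the low 1M. */
static int sketch_arch_exclude(struct sketch_ranges *m)
{
	return sketch_exclude(m, 0x0, 0xfffff);
}

/* "Core" side: count, allocate, populate, exclude. */
static struct sketch_ranges *sketch_prepare(uint64_t crash_start, uint64_t crash_end)
{
	/* one spare slot, since excluding the crash region may split a range */
	unsigned int max_nr = sketch_arch_nr_ranges() + 1;
	struct sketch_ranges *m = calloc(1, sizeof(*m) + max_nr * sizeof(m->r[0]));

	if (!m)
		return NULL;
	m->max_nr = max_nr;
	sketch_arch_populate(m);
	if (sketch_exclude(m, crash_start, crash_end) || sketch_arch_exclude(m)) {
		free(m);
		return NULL;
	}
	return m;
}
```

Calling sketch_prepare(0x10000000, 0x17ffffff) on this made-up map leaves two ranges: the region below the crash area starting at 1M, and the region above it. The point of the split is that the generic part owns allocation and exclusion, while the per-architecture part only answers "how many ranges" and "which ranges".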


* [PATCH] kselftest docs: remove reference to obsolete/archived wiki
From: Rafael Passos @ 2026-06-17 23:57 UTC
  To: shuah, corbet; +Cc: Rafael Passos, skhan, linux-kselftest, linux-doc

This link in the docs points to a wiki that is no longer active.

The wiki was moved to archive.kernel.org, and there is a warning:
"OBSOLETE CONTENT This wiki has been archived and the content is
no longer updated."

Signed-off-by: Rafael Passos <rafael@rcpassos.me>
---

 Documentation/dev-tools/kselftest.rst | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst
index d7bfe320338c..64c0ec7428a2 100644
--- a/Documentation/dev-tools/kselftest.rst
+++ b/Documentation/dev-tools/kselftest.rst
@@ -15,11 +15,6 @@ able to run that test on an older kernel. Hence, it is important to keep
 code that can still test an older kernel and make sure it skips the test
 gracefully on newer releases.
 
-You can find additional information on Kselftest framework, how to
-write new tests using the framework on Kselftest wiki:
-
-https://kselftest.wiki.kernel.org/
-
 On some systems, hot-plug tests could hang forever waiting for cpu and
 memory to be ready to be offlined. A special hot-plug target is created
 to run the full range of hot-plug tests. In default mode, hot-plug tests run
-- 
2.53.0



* Re: [PATCH v3 04/12] x86,fs/resctrl: Program PLZA through kmode arch hooks
From: Moger, Babu @ 2026-06-17 23:15 UTC
  To: Reinette Chatre, Babu Moger, corbet, tony.luck, Dave.Martin,
	james.morse, tglx, bp, dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman
In-Reply-To: <6273f424-9701-4731-9568-10b3eef8b5fd@intel.com>

Hi Reinette,

On 6/16/2026 6:33 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 4/30/26 4:24 PM, Babu Moger wrote:
>> AMD Privilege Level Zero Association (PLZA) exposes kernel CLOSID/RMID
>> association through MSR_IA32_PQR_PLZA_ASSOC.  Generic resctrl already
>> tracks supported and effective kernel-mode policy in struct
>> resctrl_kmode_cfg, but the architecture layer needs a callable entry point
>> that can push those values into per-CPU hardware on a chosen CPU mask.
>>
>> Declare resctrl_arch_configure_kmode() in linux/resctrl.h with kernel-doc.
>> Implement it on x86: add an SMP callback that writes
>> MSR_IA32_PQR_PLZA_ASSOC on each targeted CPU, and use on_each_cpu_mask()
>> for the broadcast.
> 
> Above is clear from the patch. Please start with focus on why this patch is
> needed.
> 
>>
>> The hook is unused in this patch; later patches in the series wire it into
> 
> Similar to previous work: write changelog in imperative tone and do not
> refer to patches in series but instead let each patch stand on its own.

Will rewrite the changelog.

> 
>> generic resctrl when an effective kernel-mode policy is selected or a CPU
>> mask changes.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> 
> 
>> ---
>>   arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 35 +++++++++++++++++++++++
>>   include/linux/resctrl.h                   | 10 +++++++
>>   2 files changed, 45 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index b20e705606b8..68f1cf503904 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> @@ -131,3 +131,38 @@ int resctrl_arch_io_alloc_enable(struct rdt_resource *r, bool enable)
>>   
>>   	return 0;
>>   }
>> +
>> +/*
>> + * SMP call-function callback: each CPU writes its own MSR_IA32_PQR_PLZA_ASSOC
>> + * (AMD PLZA).  Invoked via on_each_cpu_mask() with wait=1 so the on-stack
>> + * union pointed at by @arg is safe.
>> + */
>> +static void resctrl_kmode_set_one_amd(void *arg)
>> +{
>> +	union msr_pqr_plza_assoc *plza = arg;
>> +
>> +	wrmsrl(MSR_IA32_PQR_PLZA_ASSOC, plza->full);
> 
> fyi ...
> commit 2232959db26d ("x86/msr: Switch wrmsrl() users to wrmsrq()")
> commit b5884070f9da ("x86/msr: Remove wrmsrl()")
> 

Yes. Saw that. Will change it to wrmsrq.

>> +}
>> +
>> +/**
>> + * resctrl_arch_configure_kmode() - x86/AMD: program PLZA MSR on a CPU subset
>> + * @cpu_mask:	CPUs to receive the update (see on_each_cpu_mask() for online subset).
> 
> Why is the caveat added? Will resctrl ever provide offline CPUs in the mask?

No. Offline CPUs will not be provided. I am not sure why I added that 
caveat. Probably came from AI review. Will remove.

> 
>> + * @closid:	CLOSID field written into the MSR with CLOSID_EN set.
>> + * @rmid:	RMID field written into the MSR with RMID_EN set.
>> + * @enable:	Value for the PLZA_EN split field.
> 
> Please describe the meaning of the fields instead of the mechanics of the
> code that are obvious.

ok.

> 
>> + *
>> + * Context: Do not call with IRQs off or from IRQ context except as allowed for
>> + * on_each_cpu_mask(); see kernel/smp.c.
> 
> Why is this context caveat needed?

Again, probably came from AI review. It does not look relevant. Will remove.

> 
>> + */
>> +void resctrl_arch_configure_kmode(cpumask_var_t cpu_mask, u32 closid, u32 rmid, bool enable)
> 
> Please replace "cpumask_var_t cpu_mask" with "const struct cpumask *cpu_mask".

Sure.

> 
>> +{
>> +	union msr_pqr_plza_assoc plza = { 0 };
>> +
>> +	plza.split.rmid = rmid;
>> +	plza.split.rmid_en = 1;
>> +	plza.split.closid = closid;
>> +	plza.split.closid_en = 1;
>> +	plza.split.plza_en = enable;
>> +
>> +	on_each_cpu_mask(cpu_mask, resctrl_kmode_set_one_amd, &plza, 1);
>> +}
> 
> The function itself has been discussed already.
> 
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index ce28418df00f..570918e57e24 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -712,6 +712,16 @@ bool resctrl_arch_get_io_alloc_enabled(struct rdt_resource *r);
>>    */
>>   void resctrl_arch_get_kmode_support(struct resctrl_kmode_cfg *kcfg);
>>   
>> +/**
>> + * resctrl_arch_configure_kmode() - Program MSR_IA32_PQR_PLZA_ASSOC on CPUs in @cpu_mask
>> + * @cpu_mask:	Target CPUs; on_each_cpu_mask() applies the callback on the online subset.
>> + * @closid:	CLOSID written to the MSR with CLOSID_EN set.
>> + * @rmid:	RMID written to the MSR with RMID_EN set.
>> + * @enable:	PLZA_EN field value for this update.
> 
> This is a resctrl fs API - please replace all the AMD architecture specific implementation details
> with what the parameters actually mean/represent.

Sure. Will rewrite it.

Thanks

Babu


* Re: [PATCH v5 3/6] alloc_tag: add size-based filtering to ioctl
From: Suren Baghdasaryan @ 2026-06-17 23:01 UTC
  To: Abhishek Bapat
  Cc: Andrew Morton, Kent Overstreet, Hao Ge, Shuah Khan,
	Jonathan Corbet, linux-doc, linux-kernel, linux-mm, Sourav Panda
In-Reply-To: <CAL41Mv6FZU5As+yKiM52axUMsR_FDrYQCK5STjUp-aG+xMD-EQ@mail.gmail.com>

On Wed, Jun 17, 2026 at 3:41 PM Abhishek Bapat <abhishekbapat@google.com> wrote:
>
> On Wed, Jun 17, 2026 at 3:35 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Wed, Jun 17, 2026 at 1:55 PM Abhishek Bapat <abhishekbapat@google.com> wrote:
> > >
> > > On Wed, Jun 17, 2026 at 9:29 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > >
> > > > On Mon, Jun 15, 2026 at 4:04 PM Abhishek Bapat <abhishekbapat@google.com> wrote:
> > > > >
> > > > > Extend the allocinfo filtering mechanism to allow users to filter tags
> > > > > based on the total number of bytes allocated [min_size, max_size]. The
> > > > > size range is inclusive.
> > > > >
> > > > > Filtering by size involves retrieving allocinfo per-CPU counters, which
> > > > > is an expensive operation. Hence, the performance of size-based
> > > > > filtering will be worse than other filters.
> > > > >
> > > > > Signed-off-by: Abhishek Bapat <abhishekbapat@google.com>
> > > > > Acked-by: Hao Ge <hao.ge@linux.dev>
> > > > > ---
> > > > >  include/uapi/linux/alloc_tag.h |  8 ++++-
> > > > >  lib/alloc_tag.c                | 63 ++++++++++++++++++++++++++++------
> > > > >  2 files changed, 59 insertions(+), 12 deletions(-)
> > > > >
> > > > > diff --git a/include/uapi/linux/alloc_tag.h b/include/uapi/linux/alloc_tag.h
> > > > > index 3b11877955b9..7f5acbb44c14 100644
> > > > > --- a/include/uapi/linux/alloc_tag.h
> > > > > +++ b/include/uapi/linux/alloc_tag.h
> > > > > @@ -45,13 +45,17 @@ enum {
> > > > >         ALLOCINFO_FILTER_FUNCTION,
> > > > >         ALLOCINFO_FILTER_FILENAME,
> > > > >         ALLOCINFO_FILTER_LINENO,
> > > > > -       __ALLOCINFO_FILTER_LAST = ALLOCINFO_FILTER_LINENO
> > > > > +       ALLOCINFO_FILTER_MIN_SIZE,
> > > > > +       ALLOCINFO_FILTER_MAX_SIZE,
> > > > > +       __ALLOCINFO_FILTER_LAST = ALLOCINFO_FILTER_MAX_SIZE
> > > > >  };
> > > > >
> > > > >  #define ALLOCINFO_FILTER_MASK_MODNAME          (1 << ALLOCINFO_FILTER_MODNAME)
> > > > >  #define ALLOCINFO_FILTER_MASK_FUNCTION         (1 << ALLOCINFO_FILTER_FUNCTION)
> > > > >  #define ALLOCINFO_FILTER_MASK_FILENAME         (1 << ALLOCINFO_FILTER_FILENAME)
> > > > >  #define ALLOCINFO_FILTER_MASK_LINENO           (1 << ALLOCINFO_FILTER_LINENO)
> > > > > +#define ALLOCINFO_FILTER_MASK_MIN_SIZE         (1 << ALLOCINFO_FILTER_MIN_SIZE)
> > > > > +#define ALLOCINFO_FILTER_MASK_MAX_SIZE         (1 << ALLOCINFO_FILTER_MAX_SIZE)
> > > > >
> > > > >  #define ALLOCINFO_FILTER_MASKS \
> > > > >         ((1 << (__ALLOCINFO_FILTER_LAST + 1)) - 1)
> > > > > @@ -59,6 +63,8 @@ enum {
> > > > >  struct allocinfo_filter {
> > > > >         __u64 mask; /* bitmask of the filter fields used */
> > > > >         struct allocinfo_tag fields;
> > > > > +       __u64 min_size;
> > > > > +       __u64 max_size;
> > > > >  };
> > > > >
> > > > >  struct allocinfo_get_at {
> > > > > diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > > > > index 5feb61d9fb92..b3d21834b61e 100644
> > > > > --- a/lib/alloc_tag.c
> > > > > +++ b/lib/alloc_tag.c
> > > > > @@ -195,15 +195,26 @@ static int allocinfo_cmp_str(const char *str, const char *template)
> > > > >         return strncmp(allocinfo_str(str), template, ALLOCINFO_STR_SIZE);
> > > > >  }
> > > > >
> > > > > +/* Fetch the per-CPU counters */
> > > > > +static inline struct alloc_tag_counters allocinfo_prefetch_counters(struct codetag *ct)
> > > > > +{
> > > > > +       return alloc_tag_read(ct_to_alloc_tag(ct));
> > > > > +}
> > > > > +
> > > > >  /*
> > > > >   * Populates the UAPI allocinfo_tag_data structure with active runtime
> > > > >   * profiling counters extracted from the given kernel codetag.
> > > > >   */
> > > > >  static void allocinfo_to_params(struct codetag *ct,
> > > > > -                               struct allocinfo_tag_data *data)
> > > > > +                               struct allocinfo_tag_data *data,
> > > > > +                               struct alloc_tag_counters *counters)
> > > > >  {
> > > > > -       struct alloc_tag *tag = ct_to_alloc_tag(ct);
> > > > > -       struct alloc_tag_counters counter = alloc_tag_read(tag);
> > > > > +       struct alloc_tag_counters local_counters;
> > > > > +
> > > > > +       if (!counters) {
> > > > > +               local_counters = allocinfo_prefetch_counters(ct);
> > > > > +               counters = &local_counters;
> > > > > +       }
> > > > >
> > > > >         if (ct->modname)
> > > > >                 allocinfo_copy_str(data->tag.modname, ct->modname);
> > > > > @@ -212,9 +223,9 @@ static void allocinfo_to_params(struct codetag *ct,
> > > > >         allocinfo_copy_str(data->tag.function, ct->function);
> > > > >         allocinfo_copy_str(data->tag.filename, ct->filename);
> > > > >         data->tag.lineno = ct->lineno;
> > > > > -       data->counter.bytes = counter.bytes;
> > > > > -       data->counter.calls = counter.calls;
> > > > > -       data->counter.accurate = !alloc_tag_is_inaccurate(tag);
> > > > > +       data->counter.bytes = counters->bytes;
> > > > > +       data->counter.calls = counters->calls;
> > > > > +       data->counter.accurate = !alloc_tag_is_inaccurate(ct_to_alloc_tag(ct));
> > > > >  }
> > > > >
> > > > >  /*
> > > > > @@ -238,7 +249,9 @@ static int allocinfo_ioctl_get_content_id(struct seq_file *m, void __user *arg)
> > > > >   * Verifies whether a given codetag satisfies the active filtering criteria by
> > > > >   * matching its characteristics against the specified filter.
> > > > >   */
> > > > > -static bool matches_filter(struct codetag *ct, struct allocinfo_filter *filter)
> > > > > +static bool matches_filter(struct codetag *ct, struct allocinfo_filter *filter,
> > > > > +                          struct alloc_tag_counters *counters,
> > > > > +                          bool *fetched_counters)
> > > > >  {
> > > > >         if (!filter || !filter->mask)
> > > > >                 return true;
> > > > > @@ -265,6 +278,19 @@ static bool matches_filter(struct codetag *ct, struct allocinfo_filter *filter)
> > > > >             ct->lineno != filter->fields.lineno)
> > > > >                 return false;
> > > > >
> > > > > +       if (filter->mask & (ALLOCINFO_FILTER_MASK_MIN_SIZE | ALLOCINFO_FILTER_MASK_MAX_SIZE)) {
> > > > > +               if (!*fetched_counters) {
> > > > > +                       *counters = allocinfo_prefetch_counters(ct);
> > > > > +                       *fetched_counters = true;
> > > > > +               }
> > > > > +               if ((filter->mask & ALLOCINFO_FILTER_MASK_MIN_SIZE) &&
> > > > > +                   counters->bytes < filter->min_size)
> > > > > +                       return false;
> > > > > +               if ((filter->mask & ALLOCINFO_FILTER_MASK_MAX_SIZE) &&
> > > > > +                   counters->bytes > filter->max_size)
> > > > > +                       return false;
> > > > > +       }
> > > > > +
> > > > >         return true;
> > > > >  }
> > > > >
> > > > > @@ -278,6 +304,8 @@ static int allocinfo_ioctl_get_at(struct seq_file *m, void __user *arg)
> > > > >         struct codetag *ct;
> > > > >         struct allocinfo_get_at params = {0};
> > > > >         __u64 skip_count;
> > > > > +       struct alloc_tag_counters counters;
> > > > > +       bool fetched_counters;
> > > > >
> > > > >         if (copy_from_user(&params, arg, sizeof(params)))
> > > > >                 return -EFAULT;
> > > > > @@ -285,6 +313,11 @@ static int allocinfo_ioctl_get_at(struct seq_file *m, void __user *arg)
> > > > >         if (params.filter.mask & ~ALLOCINFO_FILTER_MASKS)
> > > > >                 return -EINVAL;
> > > > >
> > > > > +       if ((params.filter.mask & ALLOCINFO_FILTER_MASK_MIN_SIZE) &&
> > > > > +           (params.filter.mask & ALLOCINFO_FILTER_MASK_MAX_SIZE) &&
> > > > > +           params.filter.min_size > params.filter.max_size)
> > > > > +               return -EINVAL;
> > > > > +
> > > > >         priv = m->private;
> > > > >
> > > > >         mutex_lock(&priv->ioctl_lock);
> > > > > @@ -308,7 +341,8 @@ static int allocinfo_ioctl_get_at(struct seq_file *m, void __user *arg)
> > > > >         ct = codetag_next_ct(&priv->ioctl_iter);
> > > > >
> > > > >         while (ct) {
> > > > > -               if (matches_filter(ct, &priv->filter)) {
> > > > > +               fetched_counters = false;
> > > > > +               if (matches_filter(ct, &priv->filter, &counters, &fetched_counters)) {
> > > >
> > > > Do we really need this "fetched_counters" parameter? Here are the
> > > > possible cases:
> > > > 1. If the filter does not include ALLOCINFO_FILTER_MASK_MIN_SIZE |
> > > > ALLOCINFO_FILTER_MASK_MAX_SIZE then counters would not be fetched.
> > > > 2. If the filter includes ALLOCINFO_FILTER_MASK_MIN_SIZE |
> > > > ALLOCINFO_FILTER_MASK_MAX_SIZE and
> > > > 2.1. matches_filter() returns true then we know counters were fetched
> > > > because they had to be validated.
> > > > 2.2. matches_filter() returns false then we don't care if the counters
> > > > were fetched. We do not report that tag anyway.
> > > >
> > > > So, instead of passing fetched_counters to matches_filter() we could do this:
> > > >
> > > > bool filter_by_size = (params.filter.mask &
> > > > (ALLOCINFO_FILTER_MASK_MIN_SIZE | ALLOCINFO_FILTER_MASK_MAX_SIZE)) !=
> > > > 0;
> > > > while (ct) {
> > > >            if (matches_filter(ct, &priv->filter, &counters)) {
> > > > ...
> > > > }
> > > > if (ct) {
> > > >            allocinfo_to_params(ct, &params.data, filter_by_size ?
> > > > &counters : NULL);
> > > > ...
> > > > }
> > > >
> > > > Wouldn't that work?
> > > >
> > >
> > > While we can deduce whether counters were fetched outside the
> > > matches_filter function, I think the current implementation is more
> > > intuitive from a readability perspective. I believe it should be kept
> > > as is for that reason. If we extract the logic, we'll first have to
> > > replicate the boolean logic at two places. Second, we'd need to add a
> > > comment explaining the boolean calculation, and the reader might have
> > > a higher cognitive load trying to determine which function populates
> > > the counters. The current implementation makes it easy for the reader
> > > to deduce the original intention. Let me know what you think.
> >
> > Ok, I guess you have a point.
> >
> > I was also wondering why we are passing NULL to allocinfo_to_params()
> > to fetch the counters into a local variable? Why can't we simply call
> > allocinfo_prefetch_counters() before calling allocinfo_to_params()
> > when fetched_counters==false? Basically:
> >
> > if (!fetched_counters)
> >     counters = allocinfo_prefetch_counters(ct);
> > allocinfo_to_params(ct, &params.data, &counters);
> >
> > This would simplify allocinfo_to_params() because counters will never
> > be NULL and it would not need local_counters.
> >
>
> The only reason I did it that way was to avoid repeating the code at
> two places i.e. allocinfo_ioctl_get_at and allocinfo_ioctl_get_next.
> Either way, the per-CPU counters are read only once. I can
> include this change if you still want me to, but personally I like the
> way it currently is implemented.

Yeah, I think repeating 2 lines is preferable to passing NULL and
fetching into a local variable. Please include that change.

>
> > >
> > > > >                         if (skip_count == 0)
> > > > >                                 break;
> > > > >                         skip_count--;
> > > > > @@ -317,7 +351,7 @@ static int allocinfo_ioctl_get_at(struct seq_file *m, void __user *arg)
> > > > >         }
> > > > >
> > > > >         if (ct) {
> > > > > -               allocinfo_to_params(ct, &params.data);
> > > > > +               allocinfo_to_params(ct, &params.data, fetched_counters ? &counters : NULL);
> > > > >                 priv->positioned = true;
> > > > >         }
> > > > >
> > > > > @@ -343,6 +377,8 @@ static int allocinfo_ioctl_get_next(struct seq_file *m, void __user *arg)
> > > > >         struct codetag *ct;
> > > > >         struct allocinfo_tag_data params;
> > > > >         int ret = 0;
> > > > > +       struct alloc_tag_counters counters;
> > > > > +       bool fetched_counters;
> > > > >
> > > > >         memset(&params, 0, sizeof(params));
> > > > >         priv = m->private;
> > > > > @@ -356,10 +392,15 @@ static int allocinfo_ioctl_get_next(struct seq_file *m, void __user *arg)
> > > > >         }
> > > > >
> > > > >         ct = codetag_next_ct(&priv->ioctl_iter);
> > > > > -       while (ct && !matches_filter(ct, &priv->filter))
> > > > > +       while (ct) {
> > > > > +               fetched_counters = false;
> > > > > +               if (matches_filter(ct, &priv->filter, &counters, &fetched_counters))
> > > > > +                       break;
> > > > >                 ct = codetag_next_ct(&priv->ioctl_iter);
> > > > > +       }
> > > > > +
> > > > >         if (ct)
> > > > > -               allocinfo_to_params(ct, &params);
> > > > > +               allocinfo_to_params(ct, &params, fetched_counters ? &counters : NULL);
> > > > >
> > > > >         if (!ct) {
> > > > >                 priv->positioned = false;
> > > > > --
> > > > > 2.54.0.1136.gdb2ca164c4-goog
> > > > >


* Re: [PATCH v5 3/6] alloc_tag: add size-based filtering to ioctl
From: Abhishek Bapat @ 2026-06-17 22:40 UTC
  To: Suren Baghdasaryan
  Cc: Andrew Morton, Kent Overstreet, Hao Ge, Shuah Khan,
	Jonathan Corbet, linux-doc, linux-kernel, linux-mm, Sourav Panda
In-Reply-To: <CAJuCfpGOrtk+3hvUVE7-6wpnsa3Nbr6kGq5CfHVdCzX+DYyjFQ@mail.gmail.com>

On Wed, Jun 17, 2026 at 3:35 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Jun 17, 2026 at 1:55 PM Abhishek Bapat <abhishekbapat@google.com> wrote:
> >
> > On Wed, Jun 17, 2026 at 9:29 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Mon, Jun 15, 2026 at 4:04 PM Abhishek Bapat <abhishekbapat@google.com> wrote:
> > > >
> > > > Extend the allocinfo filtering mechanism to allow users to filter tags
> > > > based on the total number of bytes allocated [min_size, max_size]. The
> > > > size range is inclusive.
> > > >
> > > > Filtering by size involves retrieving allocinfo per-CPU counters, which
> > > > is an expensive operation. Hence, the performance of size-based
> > > > filtering will be worse than other filters.
> > > >
> > > > Signed-off-by: Abhishek Bapat <abhishekbapat@google.com>
> > > > Acked-by: Hao Ge <hao.ge@linux.dev>
> > > > ---
> > > >  include/uapi/linux/alloc_tag.h |  8 ++++-
> > > >  lib/alloc_tag.c                | 63 ++++++++++++++++++++++++++++------
> > > >  2 files changed, 59 insertions(+), 12 deletions(-)
> > > >
> > > > diff --git a/include/uapi/linux/alloc_tag.h b/include/uapi/linux/alloc_tag.h
> > > > index 3b11877955b9..7f5acbb44c14 100644
> > > > --- a/include/uapi/linux/alloc_tag.h
> > > > +++ b/include/uapi/linux/alloc_tag.h
> > > > @@ -45,13 +45,17 @@ enum {
> > > >         ALLOCINFO_FILTER_FUNCTION,
> > > >         ALLOCINFO_FILTER_FILENAME,
> > > >         ALLOCINFO_FILTER_LINENO,
> > > > -       __ALLOCINFO_FILTER_LAST = ALLOCINFO_FILTER_LINENO
> > > > +       ALLOCINFO_FILTER_MIN_SIZE,
> > > > +       ALLOCINFO_FILTER_MAX_SIZE,
> > > > +       __ALLOCINFO_FILTER_LAST = ALLOCINFO_FILTER_MAX_SIZE
> > > >  };
> > > >
> > > >  #define ALLOCINFO_FILTER_MASK_MODNAME          (1 << ALLOCINFO_FILTER_MODNAME)
> > > >  #define ALLOCINFO_FILTER_MASK_FUNCTION         (1 << ALLOCINFO_FILTER_FUNCTION)
> > > >  #define ALLOCINFO_FILTER_MASK_FILENAME         (1 << ALLOCINFO_FILTER_FILENAME)
> > > >  #define ALLOCINFO_FILTER_MASK_LINENO           (1 << ALLOCINFO_FILTER_LINENO)
> > > > +#define ALLOCINFO_FILTER_MASK_MIN_SIZE         (1 << ALLOCINFO_FILTER_MIN_SIZE)
> > > > +#define ALLOCINFO_FILTER_MASK_MAX_SIZE         (1 << ALLOCINFO_FILTER_MAX_SIZE)
> > > >
> > > >  #define ALLOCINFO_FILTER_MASKS \
> > > >         ((1 << (__ALLOCINFO_FILTER_LAST + 1)) - 1)
> > > > @@ -59,6 +63,8 @@ enum {
> > > >  struct allocinfo_filter {
> > > >         __u64 mask; /* bitmask of the filter fields used */
> > > >         struct allocinfo_tag fields;
> > > > +       __u64 min_size;
> > > > +       __u64 max_size;
> > > >  };
> > > >
> > > >  struct allocinfo_get_at {
> > > > diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > > > index 5feb61d9fb92..b3d21834b61e 100644
> > > > --- a/lib/alloc_tag.c
> > > > +++ b/lib/alloc_tag.c
> > > > @@ -195,15 +195,26 @@ static int allocinfo_cmp_str(const char *str, const char *template)
> > > >         return strncmp(allocinfo_str(str), template, ALLOCINFO_STR_SIZE);
> > > >  }
> > > >
> > > > +/* Fetch the per-CPU counters */
> > > > +static inline struct alloc_tag_counters allocinfo_prefetch_counters(struct codetag *ct)
> > > > +{
> > > > +       return alloc_tag_read(ct_to_alloc_tag(ct));
> > > > +}
> > > > +
> > > >  /*
> > > >   * Populates the UAPI allocinfo_tag_data structure with active runtime
> > > >   * profiling counters extracted from the given kernel codetag.
> > > >   */
> > > >  static void allocinfo_to_params(struct codetag *ct,
> > > > -                               struct allocinfo_tag_data *data)
> > > > +                               struct allocinfo_tag_data *data,
> > > > +                               struct alloc_tag_counters *counters)
> > > >  {
> > > > -       struct alloc_tag *tag = ct_to_alloc_tag(ct);
> > > > -       struct alloc_tag_counters counter = alloc_tag_read(tag);
> > > > +       struct alloc_tag_counters local_counters;
> > > > +
> > > > +       if (!counters) {
> > > > +               local_counters = allocinfo_prefetch_counters(ct);
> > > > +               counters = &local_counters;
> > > > +       }
> > > >
> > > >         if (ct->modname)
> > > >                 allocinfo_copy_str(data->tag.modname, ct->modname);
> > > > @@ -212,9 +223,9 @@ static void allocinfo_to_params(struct codetag *ct,
> > > >         allocinfo_copy_str(data->tag.function, ct->function);
> > > >         allocinfo_copy_str(data->tag.filename, ct->filename);
> > > >         data->tag.lineno = ct->lineno;
> > > > -       data->counter.bytes = counter.bytes;
> > > > -       data->counter.calls = counter.calls;
> > > > -       data->counter.accurate = !alloc_tag_is_inaccurate(tag);
> > > > +       data->counter.bytes = counters->bytes;
> > > > +       data->counter.calls = counters->calls;
> > > > +       data->counter.accurate = !alloc_tag_is_inaccurate(ct_to_alloc_tag(ct));
> > > >  }
> > > >
> > > >  /*
> > > > @@ -238,7 +249,9 @@ static int allocinfo_ioctl_get_content_id(struct seq_file *m, void __user *arg)
> > > >   * Verifies whether a given codetag satisfies the active filtering criteria by
> > > >   * matching its characteristics against the specified filter.
> > > >   */
> > > > -static bool matches_filter(struct codetag *ct, struct allocinfo_filter *filter)
> > > > +static bool matches_filter(struct codetag *ct, struct allocinfo_filter *filter,
> > > > +                          struct alloc_tag_counters *counters,
> > > > +                          bool *fetched_counters)
> > > >  {
> > > >         if (!filter || !filter->mask)
> > > >                 return true;
> > > > @@ -265,6 +278,19 @@ static bool matches_filter(struct codetag *ct, struct allocinfo_filter *filter)
> > > >             ct->lineno != filter->fields.lineno)
> > > >                 return false;
> > > >
> > > > +       if (filter->mask & (ALLOCINFO_FILTER_MASK_MIN_SIZE | ALLOCINFO_FILTER_MASK_MAX_SIZE)) {
> > > > +               if (!*fetched_counters) {
> > > > +                       *counters = allocinfo_prefetch_counters(ct);
> > > > +                       *fetched_counters = true;
> > > > +               }
> > > > +               if ((filter->mask & ALLOCINFO_FILTER_MASK_MIN_SIZE) &&
> > > > +                   counters->bytes < filter->min_size)
> > > > +                       return false;
> > > > +               if ((filter->mask & ALLOCINFO_FILTER_MASK_MAX_SIZE) &&
> > > > +                   counters->bytes > filter->max_size)
> > > > +                       return false;
> > > > +       }
> > > > +
> > > >         return true;
> > > >  }
> > > >
> > > > @@ -278,6 +304,8 @@ static int allocinfo_ioctl_get_at(struct seq_file *m, void __user *arg)
> > > >         struct codetag *ct;
> > > >         struct allocinfo_get_at params = {0};
> > > >         __u64 skip_count;
> > > > +       struct alloc_tag_counters counters;
> > > > +       bool fetched_counters;
> > > >
> > > >         if (copy_from_user(&params, arg, sizeof(params)))
> > > >                 return -EFAULT;
> > > > @@ -285,6 +313,11 @@ static int allocinfo_ioctl_get_at(struct seq_file *m, void __user *arg)
> > > >         if (params.filter.mask & ~ALLOCINFO_FILTER_MASKS)
> > > >                 return -EINVAL;
> > > >
> > > > +       if ((params.filter.mask & ALLOCINFO_FILTER_MASK_MIN_SIZE) &&
> > > > +           (params.filter.mask & ALLOCINFO_FILTER_MASK_MAX_SIZE) &&
> > > > +           params.filter.min_size > params.filter.max_size)
> > > > +               return -EINVAL;
> > > > +
> > > >         priv = m->private;
> > > >
> > > >         mutex_lock(&priv->ioctl_lock);
> > > > @@ -308,7 +341,8 @@ static int allocinfo_ioctl_get_at(struct seq_file *m, void __user *arg)
> > > >         ct = codetag_next_ct(&priv->ioctl_iter);
> > > >
> > > >         while (ct) {
> > > > -               if (matches_filter(ct, &priv->filter)) {
> > > > +               fetched_counters = false;
> > > > +               if (matches_filter(ct, &priv->filter, &counters, &fetched_counters)) {
> > >
> > > Do we really need this "fetched_counters" parameter? Here are the
> > > possible cases:
> > > 1. If the filter does not include ALLOCINFO_FILTER_MASK_MIN_SIZE |
> > > ALLOCINFO_FILTER_MASK_MAX_SIZE then counters would not be fetched.
> > > 2. If the filter includes ALLOCINFO_FILTER_MASK_MIN_SIZE |
> > > ALLOCINFO_FILTER_MASK_MAX_SIZE and
> > > 2.1. matches_filter() returns true then we know counters were fetched
> > > because they had to be validated.
> > > 2.2. matches_filter() returns false then we don't care if the counters
> > > were fetched. We do not report that tag anyway.
> > >
> > > So, instead of passing fetched_counters to matches_filter() we could do this:
> > >
> > > bool filter_by_size = (params.filter.mask &
> > > (ALLOCINFO_FILTER_MASK_MIN_SIZE | ALLOCINFO_FILTER_MASK_MAX_SIZE)) !=
> > > 0;
> > > while (ct) {
> > >            if (matches_filter(ct, &priv->filter, &counters)) {
> > > ...
> > > }
> > > if (ct) {
> > >            allocinfo_to_params(ct, &params.data, filter_by_size ?
> > > &counters : NULL);
> > > ...
> > > }
> > >
> > > Wouldn't that work?
> > >
> >
> > While we can deduce whether counters were fetched outside the
> > matches_filter function, I think the current implementation is more
> > intuitive from a readability perspective. I believe it should be kept
> > as is for that reason. If we extract the logic, we'll first have to
> > replicate the boolean logic at two places. Second, we'd need to add a
> > comment explaining the boolean calculation, and the reader might have
> > a higher cognitive load trying to determine which function populates
> > the counters. The current implementation makes it easy for the reader
> > to deduce the original intention. Let me know what you think.
>
> Ok, I guess you have a point.
>
> I was also wondering why we pass NULL to allocinfo_to_params() so that
> it fetches the counters into a local variable. Why can't we simply call
> allocinfo_prefetch_counters() before calling allocinfo_to_params()
> when fetched_counters == false? Basically:
>
> if (!fetched_counters)
>     counters = allocinfo_prefetch_counters(ct);
> allocinfo_to_params(ct, &params.data, &counters);
>
> This would simplify allocinfo_to_params() because counters would never
> be NULL and it would not need local_counters.
>

The only reason I did it that way was to avoid repeating the code in
two places, i.e. allocinfo_ioctl_get_at() and allocinfo_ioctl_get_next().
Either way, the per-CPU counters are read only once. I can make this
change if you still want me to, but personally I prefer the current
implementation.
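
To illustrate the "read only once" point, here is a minimal userspace
sketch of the pattern, not the patch itself. Every name in it (struct tag,
read_counters(), report_bytes(), lookup(), nr_reads) is a hypothetical
stand-in for the kernel code, and nr_reads exists only to count how often
the expensive read happens.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct counters { uint64_t bytes; uint64_t calls; };
struct tag { struct counters c; };
struct filter { uint64_t mask; uint64_t min_size; uint64_t max_size; };

#define FILTER_MIN_SIZE (1u << 0)
#define FILTER_MAX_SIZE (1u << 1)

static unsigned int nr_reads;	/* counts the "expensive" counter reads */

static struct counters read_counters(const struct tag *t)
{
	nr_reads++;
	return t->c;
}

/* Reads the counters only when a size filter is set; range is inclusive. */
static bool matches_filter(const struct tag *t, const struct filter *f,
			   struct counters *c, bool *fetched)
{
	if (!f || !f->mask)
		return true;

	if (f->mask & (FILTER_MIN_SIZE | FILTER_MAX_SIZE)) {
		if (!*fetched) {
			*c = read_counters(t);
			*fetched = true;
		}
		if ((f->mask & FILTER_MIN_SIZE) && c->bytes < f->min_size)
			return false;
		if ((f->mask & FILTER_MAX_SIZE) && c->bytes > f->max_size)
			return false;
	}
	return true;
}

/* NULL means "not fetched yet": the fallback read lives in one place. */
static uint64_t report_bytes(const struct tag *t, const struct counters *c)
{
	struct counters local;

	if (!c) {
		local = read_counters(t);
		c = &local;
	}
	return c->bytes;
}

/* Returns the reported byte count, or UINT64_MAX if the tag is filtered out. */
static uint64_t lookup(const struct tag *t, const struct filter *f)
{
	struct counters c;
	bool fetched = false;

	if (!matches_filter(t, f, &c, &fetched))
		return UINT64_MAX;
	return report_bytes(t, fetched ? &c : NULL);
}
```

With an empty filter the read happens in report_bytes(); with a size
filter it happens in matches_filter() and is handed on. In both cases a
single lookup() performs exactly one read.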

> >
> > > >                         if (skip_count == 0)
> > > >                                 break;
> > > >                         skip_count--;
> > > > @@ -317,7 +351,7 @@ static int allocinfo_ioctl_get_at(struct seq_file *m, void __user *arg)
> > > >         }
> > > >
> > > >         if (ct) {
> > > > -               allocinfo_to_params(ct, &params.data);
> > > > +               allocinfo_to_params(ct, &params.data, fetched_counters ? &counters : NULL);
> > > >                 priv->positioned = true;
> > > >         }
> > > >
> > > > @@ -343,6 +377,8 @@ static int allocinfo_ioctl_get_next(struct seq_file *m, void __user *arg)
> > > >         struct codetag *ct;
> > > >         struct allocinfo_tag_data params;
> > > >         int ret = 0;
> > > > +       struct alloc_tag_counters counters;
> > > > +       bool fetched_counters;
> > > >
> > > >         memset(&params, 0, sizeof(params));
> > > >         priv = m->private;
> > > > @@ -356,10 +392,15 @@ static int allocinfo_ioctl_get_next(struct seq_file *m, void __user *arg)
> > > >         }
> > > >
> > > >         ct = codetag_next_ct(&priv->ioctl_iter);
> > > > -       while (ct && !matches_filter(ct, &priv->filter))
> > > > +       while (ct) {
> > > > +               fetched_counters = false;
> > > > +               if (matches_filter(ct, &priv->filter, &counters, &fetched_counters))
> > > > +                       break;
> > > >                 ct = codetag_next_ct(&priv->ioctl_iter);
> > > > +       }
> > > > +
> > > >         if (ct)
> > > > -               allocinfo_to_params(ct, &params);
> > > > +               allocinfo_to_params(ct, &params, fetched_counters ? &counters : NULL);
> > > >
> > > >         if (!ct) {
> > > >                 priv->positioned = false;
> > > > --
> > > > 2.54.0.1136.gdb2ca164c4-goog
> > > >

^ permalink raw reply

* Re: [PATCH v5 3/6] alloc_tag: add size-based filtering to ioctl
From: Suren Baghdasaryan @ 2026-06-17 22:34 UTC (permalink / raw)
  To: Abhishek Bapat
  Cc: Andrew Morton, Kent Overstreet, Hao Ge, Shuah Khan,
	Jonathan Corbet, linux-doc, linux-kernel, linux-mm, Sourav Panda
In-Reply-To: <CAL41Mv7=B7H1C3j5_Pva-kYsJs_1NpCqVhN6wn-WhqvquV6=2w@mail.gmail.com>

On Wed, Jun 17, 2026 at 1:55 PM Abhishek Bapat <abhishekbapat@google.com> wrote:
>
> On Wed, Jun 17, 2026 at 9:29 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Mon, Jun 15, 2026 at 4:04 PM Abhishek Bapat <abhishekbapat@google.com> wrote:
> > >
> > > Extend the allocinfo filtering mechanism to allow users to filter tags
> > > based on the total number of bytes allocated [min_size, max_size]. The
> > > size range is inclusive.
> > >
> > > Filtering by size involves retrieving allocinfo per-CPU counters, which
> > > is an expensive operation. Hence, the performance of size-based
> > > filtering will be worse than other filters.
> > >
> > > Signed-off-by: Abhishek Bapat <abhishekbapat@google.com>
> > > Acked-by: Hao Ge <hao.ge@linux.dev>
> > > ---
> > >  include/uapi/linux/alloc_tag.h |  8 ++++-
> > >  lib/alloc_tag.c                | 63 ++++++++++++++++++++++++++++------
> > >  2 files changed, 59 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/include/uapi/linux/alloc_tag.h b/include/uapi/linux/alloc_tag.h
> > > index 3b11877955b9..7f5acbb44c14 100644
> > > --- a/include/uapi/linux/alloc_tag.h
> > > +++ b/include/uapi/linux/alloc_tag.h
> > > @@ -45,13 +45,17 @@ enum {
> > >         ALLOCINFO_FILTER_FUNCTION,
> > >         ALLOCINFO_FILTER_FILENAME,
> > >         ALLOCINFO_FILTER_LINENO,
> > > -       __ALLOCINFO_FILTER_LAST = ALLOCINFO_FILTER_LINENO
> > > +       ALLOCINFO_FILTER_MIN_SIZE,
> > > +       ALLOCINFO_FILTER_MAX_SIZE,
> > > +       __ALLOCINFO_FILTER_LAST = ALLOCINFO_FILTER_MAX_SIZE
> > >  };
> > >
> > >  #define ALLOCINFO_FILTER_MASK_MODNAME          (1 << ALLOCINFO_FILTER_MODNAME)
> > >  #define ALLOCINFO_FILTER_MASK_FUNCTION         (1 << ALLOCINFO_FILTER_FUNCTION)
> > >  #define ALLOCINFO_FILTER_MASK_FILENAME         (1 << ALLOCINFO_FILTER_FILENAME)
> > >  #define ALLOCINFO_FILTER_MASK_LINENO           (1 << ALLOCINFO_FILTER_LINENO)
> > > +#define ALLOCINFO_FILTER_MASK_MIN_SIZE         (1 << ALLOCINFO_FILTER_MIN_SIZE)
> > > +#define ALLOCINFO_FILTER_MASK_MAX_SIZE         (1 << ALLOCINFO_FILTER_MAX_SIZE)
> > >
> > >  #define ALLOCINFO_FILTER_MASKS \
> > >         ((1 << (__ALLOCINFO_FILTER_LAST + 1)) - 1)
> > > @@ -59,6 +63,8 @@ enum {
> > >  struct allocinfo_filter {
> > >         __u64 mask; /* bitmask of the filter fields used */
> > >         struct allocinfo_tag fields;
> > > +       __u64 min_size;
> > > +       __u64 max_size;
> > >  };
> > >
> > >  struct allocinfo_get_at {
> > > diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > > index 5feb61d9fb92..b3d21834b61e 100644
> > > --- a/lib/alloc_tag.c
> > > +++ b/lib/alloc_tag.c
> > > @@ -195,15 +195,26 @@ static int allocinfo_cmp_str(const char *str, const char *template)
> > >         return strncmp(allocinfo_str(str), template, ALLOCINFO_STR_SIZE);
> > >  }
> > >
> > > +/* Fetch the per-CPU counters */
> > > +static inline struct alloc_tag_counters allocinfo_prefetch_counters(struct codetag *ct)
> > > +{
> > > +       return alloc_tag_read(ct_to_alloc_tag(ct));
> > > +}
> > > +
> > >  /*
> > >   * Populates the UAPI allocinfo_tag_data structure with active runtime
> > >   * profiling counters extracted from the given kernel codetag.
> > >   */
> > >  static void allocinfo_to_params(struct codetag *ct,
> > > -                               struct allocinfo_tag_data *data)
> > > +                               struct allocinfo_tag_data *data,
> > > +                               struct alloc_tag_counters *counters)
> > >  {
> > > -       struct alloc_tag *tag = ct_to_alloc_tag(ct);
> > > -       struct alloc_tag_counters counter = alloc_tag_read(tag);
> > > +       struct alloc_tag_counters local_counters;
> > > +
> > > +       if (!counters) {
> > > +               local_counters = allocinfo_prefetch_counters(ct);
> > > +               counters = &local_counters;
> > > +       }
> > >
> > >         if (ct->modname)
> > >                 allocinfo_copy_str(data->tag.modname, ct->modname);
> > > @@ -212,9 +223,9 @@ static void allocinfo_to_params(struct codetag *ct,
> > >         allocinfo_copy_str(data->tag.function, ct->function);
> > >         allocinfo_copy_str(data->tag.filename, ct->filename);
> > >         data->tag.lineno = ct->lineno;
> > > -       data->counter.bytes = counter.bytes;
> > > -       data->counter.calls = counter.calls;
> > > -       data->counter.accurate = !alloc_tag_is_inaccurate(tag);
> > > +       data->counter.bytes = counters->bytes;
> > > +       data->counter.calls = counters->calls;
> > > +       data->counter.accurate = !alloc_tag_is_inaccurate(ct_to_alloc_tag(ct));
> > >  }
> > >
> > >  /*
> > > @@ -238,7 +249,9 @@ static int allocinfo_ioctl_get_content_id(struct seq_file *m, void __user *arg)
> > >   * Verifies whether a given codetag satisfies the active filtering criteria by
> > >   * matching its characteristics against the specified filter.
> > >   */
> > > -static bool matches_filter(struct codetag *ct, struct allocinfo_filter *filter)
> > > +static bool matches_filter(struct codetag *ct, struct allocinfo_filter *filter,
> > > +                          struct alloc_tag_counters *counters,
> > > +                          bool *fetched_counters)
> > >  {
> > >         if (!filter || !filter->mask)
> > >                 return true;
> > > @@ -265,6 +278,19 @@ static bool matches_filter(struct codetag *ct, struct allocinfo_filter *filter)
> > >             ct->lineno != filter->fields.lineno)
> > >                 return false;
> > >
> > > +       if (filter->mask & (ALLOCINFO_FILTER_MASK_MIN_SIZE | ALLOCINFO_FILTER_MASK_MAX_SIZE)) {
> > > +               if (!*fetched_counters) {
> > > +                       *counters = allocinfo_prefetch_counters(ct);
> > > +                       *fetched_counters = true;
> > > +               }
> > > +               if ((filter->mask & ALLOCINFO_FILTER_MASK_MIN_SIZE) &&
> > > +                   counters->bytes < filter->min_size)
> > > +                       return false;
> > > +               if ((filter->mask & ALLOCINFO_FILTER_MASK_MAX_SIZE) &&
> > > +                   counters->bytes > filter->max_size)
> > > +                       return false;
> > > +       }
> > > +
> > >         return true;
> > >  }
> > >
> > > @@ -278,6 +304,8 @@ static int allocinfo_ioctl_get_at(struct seq_file *m, void __user *arg)
> > >         struct codetag *ct;
> > >         struct allocinfo_get_at params = {0};
> > >         __u64 skip_count;
> > > +       struct alloc_tag_counters counters;
> > > +       bool fetched_counters;
> > >
> > >         if (copy_from_user(&params, arg, sizeof(params)))
> > >                 return -EFAULT;
> > > @@ -285,6 +313,11 @@ static int allocinfo_ioctl_get_at(struct seq_file *m, void __user *arg)
> > >         if (params.filter.mask & ~ALLOCINFO_FILTER_MASKS)
> > >                 return -EINVAL;
> > >
> > > +       if ((params.filter.mask & ALLOCINFO_FILTER_MASK_MIN_SIZE) &&
> > > +           (params.filter.mask & ALLOCINFO_FILTER_MASK_MAX_SIZE) &&
> > > +           params.filter.min_size > params.filter.max_size)
> > > +               return -EINVAL;
> > > +
> > >         priv = m->private;
> > >
> > >         mutex_lock(&priv->ioctl_lock);
> > > @@ -308,7 +341,8 @@ static int allocinfo_ioctl_get_at(struct seq_file *m, void __user *arg)
> > >         ct = codetag_next_ct(&priv->ioctl_iter);
> > >
> > >         while (ct) {
> > > -               if (matches_filter(ct, &priv->filter)) {
> > > +               fetched_counters = false;
> > > +               if (matches_filter(ct, &priv->filter, &counters, &fetched_counters)) {
> >
> > Do we really need this "fetched_counters" parameter? Here are the
> > possible cases:
> > 1. If the filter does not include ALLOCINFO_FILTER_MASK_MIN_SIZE |
> > ALLOCINFO_FILTER_MASK_MAX_SIZE then counters would not be fetched.
> > 2. If the filter includes ALLOCINFO_FILTER_MASK_MIN_SIZE |
> > ALLOCINFO_FILTER_MASK_MAX_SIZE and
> > 2.1. matches_filter() returns true then we know counters were fetched
> > because they had to be validated.
> > 2.2. matches_filter() returns false then we don't care if the counters
> > were fetched. We do not report that tag anyway.
> >
> > So, instead of passing fetched_counters to matches_filter() we could do this:
> >
> > bool filter_by_size = (params.filter.mask &
> > (ALLOCINFO_FILTER_MASK_MIN_SIZE | ALLOCINFO_FILTER_MASK_MAX_SIZE)) !=
> > 0;
> > while (ct) {
> >            if (matches_filter(ct, &priv->filter, &counters)) {
> > ...
> > }
> > if (ct) {
> >            allocinfo_to_params(ct, &params.data, filter_by_size ?
> > &counters : NULL);
> > ...
> > }
> >
> > Wouldn't that work?
> >
>
> While we can deduce whether counters were fetched outside the
> matches_filter function, I think the current implementation is more
> intuitive from a readability perspective. I believe it should be kept
> as is for that reason. If we extract the logic, we'll first have to
> replicate the boolean logic at two places. Second, we'd need to add a
> comment explaining the boolean calculation, and the reader might have
> a higher cognitive load trying to determine which function populates
> the counters. The current implementation makes it easy for the reader
> to deduce the original intention. Let me know what you think.

Ok, I guess you have a point.

I was also wondering why we pass NULL to allocinfo_to_params() so that
it fetches the counters into a local variable. Why can't we simply call
allocinfo_prefetch_counters() before calling allocinfo_to_params()
when fetched_counters == false? Basically:

if (!fetched_counters)
    counters = allocinfo_prefetch_counters(ct);
allocinfo_to_params(ct, &params.data, &counters);

This would simplify allocinfo_to_params() because counters would never
be NULL and it would not need local_counters.

>
> > >                         if (skip_count == 0)
> > >                                 break;
> > >                         skip_count--;
> > > @@ -317,7 +351,7 @@ static int allocinfo_ioctl_get_at(struct seq_file *m, void __user *arg)
> > >         }
> > >
> > >         if (ct) {
> > > -               allocinfo_to_params(ct, &params.data);
> > > +               allocinfo_to_params(ct, &params.data, fetched_counters ? &counters : NULL);
> > >                 priv->positioned = true;
> > >         }
> > >
> > > @@ -343,6 +377,8 @@ static int allocinfo_ioctl_get_next(struct seq_file *m, void __user *arg)
> > >         struct codetag *ct;
> > >         struct allocinfo_tag_data params;
> > >         int ret = 0;
> > > +       struct alloc_tag_counters counters;
> > > +       bool fetched_counters;
> > >
> > >         memset(&params, 0, sizeof(params));
> > >         priv = m->private;
> > > @@ -356,10 +392,15 @@ static int allocinfo_ioctl_get_next(struct seq_file *m, void __user *arg)
> > >         }
> > >
> > >         ct = codetag_next_ct(&priv->ioctl_iter);
> > > -       while (ct && !matches_filter(ct, &priv->filter))
> > > +       while (ct) {
> > > +               fetched_counters = false;
> > > +               if (matches_filter(ct, &priv->filter, &counters, &fetched_counters))
> > > +                       break;
> > >                 ct = codetag_next_ct(&priv->ioctl_iter);
> > > +       }
> > > +
> > >         if (ct)
> > > -               allocinfo_to_params(ct, &params);
> > > +               allocinfo_to_params(ct, &params, fetched_counters ? &counters : NULL);
> > >
> > >         if (!ct) {
> > >                 priv->positioned = false;
> > > --
> > > 2.54.0.1136.gdb2ca164c4-goog
> > >

^ permalink raw reply

* Re: [PATCH v2 05/11] hugetlb: Convert the vmf->pgoff to PAGE_SIZE granularity
From: Matthew Wilcox @ 2026-06-17 22:28 UTC (permalink / raw)
  To: Jane Chu
  Cc: akpm, jack, viro, brauner, muchun.song, osalvador, david, hughd,
	baolin.wang, linmiaohe, nao.horiguchi, lorenzo, rppt, peterx,
	corbet, linux-doc, linux-mm, linux-kernel, linux-fsdevel
In-Reply-To: <20260617172534.1740152-6-jane.chu@oracle.com>

On Wed, Jun 17, 2026 at 11:25:26AM -0600, Jane Chu wrote:
> +++ b/mm/hugetlb.c
> @@ -5654,6 +5654,8 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_fault *vmf,
>  						  unsigned long reason)
>  {
>  	u32 hash;
> +	struct hstate *h = hstate_vma(vmf->vma);
> +	pgoff_t idx = vmf->pgoff >> huge_page_order(h);

If we do manage to make mapping_min_folio_nrpages() return the right
answer for hugetlbfs (see earlier comment), then we can avoid this by doing:

+++ b/mm/hugetlb.c
@@ -5936,7 +5936,7 @@ u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx)
        u32 hash;

        key[0] = (unsigned long) mapping;
-       key[1] = idx;
+       key[1] = idx >> mapping_min_folio_order(mapping);

        hash = jhash2((u32 *)&key, sizeof(key)/(sizeof(u32)), 0);

although I wonder if we still need the fault mutex array, given that we
now have mapping->invalidate_lock?
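
A rough userspace sketch of what the shift in the hash key buys, under
stated assumptions: fault_key() is a hypothetical helper, and the order of
9 used in the note below (2 MiB folios on 4 KiB base pages) is only an
illustrative value. Every PAGE_SIZE-granularity index that falls inside
one huge folio collapses to the same key, so faults on that folio would
hash to the same mutex.

```c
#include <stdint.h>

/*
 * Hypothetical helper: reduce a PAGE_SIZE-granularity page cache index
 * to one key per folio of 2^order base pages. This mirrors only the
 * "idx >> order" step; the jhash2() call is left out.
 */
static uint64_t fault_key(uint64_t idx, unsigned int order)
{
	return idx >> order;
}
```

With order 9, indices 0 through 511 give key 0 and indices 512 through
1023 give key 1.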


^ permalink raw reply

* Re: [PATCH v6 03/12] PCI: liveupdate: Track incoming preserved PCI devices
From: David Matlack @ 2026-06-17 22:07 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: kexec, linux-doc, linux-kernel, linux-mm, linux-pci,
	Adithya Jayachandran, Alexander Graf, Alex Williamson,
	Bjorn Helgaas, Chris Li, David Rientjes, Jacob Pan,
	Jason Gunthorpe, Jonathan Corbet, Josh Hilke, Leon Romanovsky,
	Lukas Wunner, Mike Rapoport, Parav Pandit, Pranjal Shrivastava,
	Pratyush Yadav, Saeed Mahameed, Samiullah Khawaja, Shuah Khan,
	Vipin Sharma, William Tu, Yi Liu
In-Reply-To: <ajBgj_aSuzMZG47e@google.com>

On Mon, Jun 15, 2026 at 1:29 PM David Matlack <dmatlack@google.com> wrote:
> On 2026-06-14 01:38 PM, Pasha Tatashin wrote:
> > On Fri, 22 May 2026 20:24:01 +0000, David Matlack <dmatlack@google.com> wrote:

> > > +static struct pci_flb_incoming *pci_liveupdate_flb_get_incoming(void)
> > > +{
> > > +   struct pci_flb_incoming *incoming = NULL;
> > > +   int ret;
> >
> > Maybe make the error return static, and avoid another search through compatible
> > FLBs if it failed before?
>
> Good idea, will do.

Actually I'm not so sure this should be handled here. I would have to
create a statically allocated variable to cache the result, like you
said, and I would also have to invalidate it during FLB finish to
avoid use-after-free. That defeats one of the main benefits of FLB
which is that we don't have to manage global variables.

Can this be handled by LUO instead?

^ permalink raw reply

* Re: [PATCH v6 01/12] PCI: liveupdate: Set up FLB handler for the PCI core
From: Pasha Tatashin @ 2026-06-17 21:44 UTC (permalink / raw)
  To: David Matlack
  Cc: Pasha Tatashin, kexec, linux-doc, linux-kernel, linux-mm,
	linux-pci, Adithya Jayachandran, Alexander Graf, Alex Williamson,
	Bjorn Helgaas, Chris Li, David Rientjes, Jacob Pan,
	Jason Gunthorpe, Jonathan Corbet, Josh Hilke, Leon Romanovsky,
	Lukas Wunner, Mike Rapoport, Parav Pandit, Pranjal Shrivastava,
	Pratyush Yadav, Saeed Mahameed, Samiullah Khawaja, Shuah Khan,
	Vipin Sharma, William Tu, Yi Liu
In-Reply-To: <ajB7EA4tAKqj5XV0@google.com>

On 2026-06-15 22:22:08+00:00, David Matlack wrote:
> On 2026-06-12 05:15 AM, Pasha Tatashin wrote:
> 
> > On Fri, 22 May 2026 20:23:59 +0000, David Matlack <dmatlack@google.com> wrote:
> 
> 
> 
> > > + * PCI device preservation across Live Update is built on top of the Live Update
> > 
> > I prefer to just use the acronyms FLB and LUO, with links to the
> > actual documentation for them.
> > 
> > So, something like this:
> > 
> >   * :ref:`FLB <flb>` Data
> >   * =====================
> >   *
> >   * PCI device preservation across Live Update is built on top of the
> >   * :ref:`LUO <luo>` support for file preservation across kexec. Drivers
> > 
> > And also add _luo and _flb to Documentation/core-api/liveupdate.rst
> > 
> > .. _luo:
> > 
> >  ========================
> >  Live Update Orchestrator
> >  ========================
> > 
> > .. _flb:
> 
> Will do.
> 
> I guess I will need to add another patch to add the link references to
> liveupdate.rst?

Yes, it can be a separate patch, but adding it to this patch is also OK.

> >  LUO File Lifecycle Bound Global Data
> >  ====================================
> > 
> > 
> > Nit, may be:
> 
> Did you have a suggestion here that got lost?

Yeah, I meant:
#define pr_fmt(fmt) "PCI: " KBUILD_BASENAME ": " fmt
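
For reference, a small userspace sketch of how that prefix composes.
This is only an illustration: KBUILD_BASENAME is normally supplied by
kbuild, so the "liveupdate" value here is an assumption, and format_msg()
is a hypothetical stand-in for the pr_*() helpers.

```c
#include <stdio.h>
#include <string.h>

/* Assumed value; in a kernel build kbuild defines this per source file. */
#define KBUILD_BASENAME "liveupdate"

/* Adjacent string literals are concatenated at compile time. */
#define pr_fmt(fmt) "PCI: " KBUILD_BASENAME ": " fmt

/* Hypothetical stand-in for pr_info(): formats into a caller buffer. */
#define format_msg(buf, size, fmt, ...) \
	snprintf((buf), (size), pr_fmt(fmt), __VA_ARGS__)
```

Formatting "preserved %d devices" with the argument 3 would then produce
"PCI: liveupdate: preserved 3 devices".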

> 
> > Please sort alphabetically.
> 
> Will do.
> 
> > I think we want to use kho_block [1] (it is in the liveupdate/next
> > branch) to allow the number of supported devices to be dynamic.
> > 
> > To support this, we would redefine the ABI and tracking structures like 
> > so:
> > 
> > /* include/linux/kho/abi/pci.h */
> > struct pci_ser {
> > 	u64 devices;      /* Phys address of the first block header of kho_block_set */
> > 	u64 nr_devices;   /* Total count of active preserved devices */
> > } __packed;
> > 
> > /* drivers/pci/liveupdate.c */
> > struct pci_flb_outgoing {
> > 	struct pci_ser *ser;            /* Points to the FDT/KHO-allocated ABI struct */
> > 	struct kho_block_set block_set;  /* Controls the active blocks on the fly */
> > };
> > 
> > In __pci_liveupdate_preserve_device(), we would search for
> > and reuse any inactive pci_dev_ser slot first, and only call
> > kho_block_set_grow() to expand if no inactive slots are available.
> >
> > In pci_liveupdate_unpreserve_device(), we would simply
> > mark the pci_dev_ser as inactive.
> 
> Makes sense at a high level. I'll work on switching kho_block for v7 and
> get back to you if I hit any issues.
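
The reuse-then-grow policy described above could look roughly like the
userspace sketch below. It is only an illustration under stated
assumptions: struct dev_slot and struct slot_table are hypothetical
stand-ins for pci_dev_ser and the block set, and realloc() stands in for
kho_block_set_grow(), whose real interface is not shown in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one serialized per-device record. */
struct dev_slot {
	uint32_t id;
	bool active;
};

/* Hypothetical stand-in for the growable set of records. */
struct slot_table {
	struct dev_slot *slots;
	size_t capacity;
	size_t nr_active;
};

/* Returns the slot index used for @id, or -1 if growing failed. */
static long preserve_device(struct slot_table *t, uint32_t id)
{
	size_t i;

	/* First reuse a slot released by an earlier unpreserve. */
	for (i = 0; i < t->capacity; i++)
		if (!t->slots[i].active)
			break;

	/* Grow only when every existing slot is active. */
	if (i == t->capacity) {
		size_t new_cap = t->capacity ? t->capacity * 2 : 4;
		struct dev_slot *grown;

		grown = realloc(t->slots, new_cap * sizeof(*grown));
		if (!grown)
			return -1;
		/* New slots start out inactive. */
		memset(grown + t->capacity, 0,
		       (new_cap - t->capacity) * sizeof(*grown));
		t->slots = grown;
		t->capacity = new_cap;
	}

	t->slots[i].id = id;
	t->slots[i].active = true;
	t->nr_active++;
	return (long)i;
}

/* Unpreserve only marks the slot inactive; nothing is freed or moved. */
static void unpreserve_device(struct slot_table *t, size_t idx)
{
	if (idx < t->capacity && t->slots[idx].active) {
		t->slots[idx].active = false;
		t->nr_active--;
	}
}
```

The sketch returns indices rather than pointers because realloc() may
move the array; whether the real block set has the same constraint is not
something this thread states.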



^ permalink raw reply

* Re: [PATCH 00/19] init: discoverable root partitions, a.k.a. an omittable "root=" cmdline option
From: Vincent Mailhol @ 2026-06-17 20:56 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jens Axboe, Davidlohr Bueso, Alexander Viro, Jan Kara,
	linux-kernel, linux-block, linux-efi, linux-fsdevel,
	Richard Henderson, Matt Turner, Magnus Lindholm, linux-alpha,
	Vineet Gupta, linux-snps-arc, Russell King, linux-arm-kernel,
	Catalin Marinas, Will Deacon, Huacai Chen, WANG Xuerui, loongarch,
	Thomas Bogendoerfer, linux-mips, James E.J. Bottomley,
	Helge Deller, linux-parisc, Madhavan Srinivasan, Michael Ellerman,
	linuxppc-dev, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	linux-riscv, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	linux-s390, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Jonathan Corbet, Shuah Khan, linux-doc
In-Reply-To: <20260617-irritation-rollen-wirst-7d636cbfec92@brauner>

On 17/06/2026 at 14:41, Christian Brauner wrote:
> On Mon, Jun 15, 2026 at 06:08:56PM +0200, Vincent Mailhol wrote:
>> DPS [1] defines GPT partition type UUIDs for OS partitions and
>> attributes that control whether such partitions should be
>> automatically discovered. The specification states that:
>>
>>   The OS can discover and mount the necessary file systems with a
>>   non-existent or incomplete /etc/fstab file and without the root=
>>   kernel command line option.
>>
>> DPS is already implemented in systemd-gpt-auto-generator [2], which,
>> when embedded in an initrd, indeed allows automatic detection of the
>> root filesystem through its partition type UUID.
>>
>> This series adds this discovery feature directly into the kernel so
>> that people who are not using systemd or not using an initrd can still
>> benefit from it. The implementation follows the same model as
>> systemd-gpt-auto-generator:
> 
> I happen to co-maintain the DPS. It is userspace policy and complex
> userspace policy at that and does not belong into the kernel.
> 
> This also implements a really tiny portion of the spec. It deals with a
> lot more complex concepts such as automatic partitioning during
> installation, verity, LUKS, containers. This is really not intended for
> the kernel at all. I mean, it's great that this spec is being used but I
> do not want this in the kernel just for the sake of auto-discovery.

Implementing only a tiny portion is deliberate. To draw a parallel, it
would be like saying that the root= cmdline option is a tiny portion of
what an fstab can do.

Yes, it does not handle LUKS, containers and so on, in the same way that
it is not possible to boot those things directly from the kernel.

So, I don't think this conflicts with the existing userland
implementations, in the same way that you can add root= to your command
line and still have an initrd next to it.

I did not intend this as a replacement, but as a complement that fills
the gap for kernels booted without an initrd.

> The DPS is completely generic and can be implemented by tooling other
> than systemd (util-linux implements it and so does refind iirc). I think
> not wanting to use or build alternative userspace tooling for this is a
> really weak argument for pushing this into the kernel.

Well, let me explain where I am coming from. From time to time, I mess up
my configuration. When the issue is in a userland config file (e.g. a bad
fstab), recovery is always easy.

But when I mess up the bootloader/firmware configuration (e.g. grub,
u-boot, edk2), the fix is always painful. I have to fight with a shell
that I am not familiar with to figure out what the correct
configuration is.

An initrd would help, but:

 - it is still one more file to look for and pass as a parameter
 - on some machines I do not have one anyway

I think it would be very neat to have a way to boot a kernel with zero
config (that is: no cmdline, no initrd), and I found out that DPS could
achieve that if just a tiny part of it were implemented in the kernel.

For example, in edk2, I would be able to just browse the disk from the
"Boot from file" menu and select a kernel. Currently it panics because
no configuration is attached. With DPS, we could have it boot linux from
that menu. All in a graphical interface, with just up/down arrows and
one enter keypress.

And this is my motivation. This non-LUKS root read-only part of the DPS
is the only piece that makes sense to me in the kernel. It is not that I
don't *want* to implement it in userland, but that doing so doesn't
achieve what would be helpful to me (and, I guess, to others).

I thought I wouldn't be the only one in the world to see value in that,
which is why I posted it.


Yours sincerely,
Vincent Mailhol


^ permalink raw reply

* Re: [PATCH v5 3/6] alloc_tag: add size-based filtering to ioctl
From: Abhishek Bapat @ 2026-06-17 20:55 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Andrew Morton, Kent Overstreet, Hao Ge, Shuah Khan,
	Jonathan Corbet, linux-doc, linux-kernel, linux-mm, Sourav Panda
In-Reply-To: <CAJuCfpFbCKc7FR3tbeCzexCzyfiP+Ab2fJ7Vd1Q76MKTvqC2XA@mail.gmail.com>

On Wed, Jun 17, 2026 at 9:29 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Mon, Jun 15, 2026 at 4:04 PM Abhishek Bapat <abhishekbapat@google.com> wrote:
> >
> > Extend the allocinfo filtering mechanism to allow users to filter tags
> > based on the total number of bytes allocated [min_size, max_size]. The
> > size range is inclusive.
> >
> > Filtering by size involves retrieving allocinfo per-CPU counters, which
> > is an expensive operation. Hence, the performance of size-based
> > filtering will be worse than other filters.
> >
> > Signed-off-by: Abhishek Bapat <abhishekbapat@google.com>
> > Acked-by: Hao Ge <hao.ge@linux.dev>
> > ---
> >  include/uapi/linux/alloc_tag.h |  8 ++++-
> >  lib/alloc_tag.c                | 63 ++++++++++++++++++++++++++++------
> >  2 files changed, 59 insertions(+), 12 deletions(-)
> >
> > diff --git a/include/uapi/linux/alloc_tag.h b/include/uapi/linux/alloc_tag.h
> > index 3b11877955b9..7f5acbb44c14 100644
> > --- a/include/uapi/linux/alloc_tag.h
> > +++ b/include/uapi/linux/alloc_tag.h
> > @@ -45,13 +45,17 @@ enum {
> >         ALLOCINFO_FILTER_FUNCTION,
> >         ALLOCINFO_FILTER_FILENAME,
> >         ALLOCINFO_FILTER_LINENO,
> > -       __ALLOCINFO_FILTER_LAST = ALLOCINFO_FILTER_LINENO
> > +       ALLOCINFO_FILTER_MIN_SIZE,
> > +       ALLOCINFO_FILTER_MAX_SIZE,
> > +       __ALLOCINFO_FILTER_LAST = ALLOCINFO_FILTER_MAX_SIZE
> >  };
> >
> >  #define ALLOCINFO_FILTER_MASK_MODNAME          (1 << ALLOCINFO_FILTER_MODNAME)
> >  #define ALLOCINFO_FILTER_MASK_FUNCTION         (1 << ALLOCINFO_FILTER_FUNCTION)
> >  #define ALLOCINFO_FILTER_MASK_FILENAME         (1 << ALLOCINFO_FILTER_FILENAME)
> >  #define ALLOCINFO_FILTER_MASK_LINENO           (1 << ALLOCINFO_FILTER_LINENO)
> > +#define ALLOCINFO_FILTER_MASK_MIN_SIZE         (1 << ALLOCINFO_FILTER_MIN_SIZE)
> > +#define ALLOCINFO_FILTER_MASK_MAX_SIZE         (1 << ALLOCINFO_FILTER_MAX_SIZE)
> >
> >  #define ALLOCINFO_FILTER_MASKS \
> >         ((1 << (__ALLOCINFO_FILTER_LAST + 1)) - 1)
> > @@ -59,6 +63,8 @@ enum {
> >  struct allocinfo_filter {
> >         __u64 mask; /* bitmask of the filter fields used */
> >         struct allocinfo_tag fields;
> > +       __u64 min_size;
> > +       __u64 max_size;
> >  };
> >
> >  struct allocinfo_get_at {
> > diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > index 5feb61d9fb92..b3d21834b61e 100644
> > --- a/lib/alloc_tag.c
> > +++ b/lib/alloc_tag.c
> > @@ -195,15 +195,26 @@ static int allocinfo_cmp_str(const char *str, const char *template)
> >         return strncmp(allocinfo_str(str), template, ALLOCINFO_STR_SIZE);
> >  }
> >
> > +/* Fetch the per-CPU counters */
> > +static inline struct alloc_tag_counters allocinfo_prefetch_counters(struct codetag *ct)
> > +{
> > +       return alloc_tag_read(ct_to_alloc_tag(ct));
> > +}
> > +
> >  /*
> >   * Populates the UAPI allocinfo_tag_data structure with active runtime
> >   * profiling counters extracted from the given kernel codetag.
> >   */
> >  static void allocinfo_to_params(struct codetag *ct,
> > -                               struct allocinfo_tag_data *data)
> > +                               struct allocinfo_tag_data *data,
> > +                               struct alloc_tag_counters *counters)
> >  {
> > -       struct alloc_tag *tag = ct_to_alloc_tag(ct);
> > -       struct alloc_tag_counters counter = alloc_tag_read(tag);
> > +       struct alloc_tag_counters local_counters;
> > +
> > +       if (!counters) {
> > +               local_counters = allocinfo_prefetch_counters(ct);
> > +               counters = &local_counters;
> > +       }
> >
> >         if (ct->modname)
> >                 allocinfo_copy_str(data->tag.modname, ct->modname);
> > @@ -212,9 +223,9 @@ static void allocinfo_to_params(struct codetag *ct,
> >         allocinfo_copy_str(data->tag.function, ct->function);
> >         allocinfo_copy_str(data->tag.filename, ct->filename);
> >         data->tag.lineno = ct->lineno;
> > -       data->counter.bytes = counter.bytes;
> > -       data->counter.calls = counter.calls;
> > -       data->counter.accurate = !alloc_tag_is_inaccurate(tag);
> > +       data->counter.bytes = counters->bytes;
> > +       data->counter.calls = counters->calls;
> > +       data->counter.accurate = !alloc_tag_is_inaccurate(ct_to_alloc_tag(ct));
> >  }
> >
> >  /*
> > @@ -238,7 +249,9 @@ static int allocinfo_ioctl_get_content_id(struct seq_file *m, void __user *arg)
> >   * Verifies whether a given codetag satisfies the active filtering criteria by
> >   * matching its characteristics against the specified filter.
> >   */
> > -static bool matches_filter(struct codetag *ct, struct allocinfo_filter *filter)
> > +static bool matches_filter(struct codetag *ct, struct allocinfo_filter *filter,
> > +                          struct alloc_tag_counters *counters,
> > +                          bool *fetched_counters)
> >  {
> >         if (!filter || !filter->mask)
> >                 return true;
> > @@ -265,6 +278,19 @@ static bool matches_filter(struct codetag *ct, struct allocinfo_filter *filter)
> >             ct->lineno != filter->fields.lineno)
> >                 return false;
> >
> > +       if (filter->mask & (ALLOCINFO_FILTER_MASK_MIN_SIZE | ALLOCINFO_FILTER_MASK_MAX_SIZE)) {
> > +               if (!*fetched_counters) {
> > +                       *counters = allocinfo_prefetch_counters(ct);
> > +                       *fetched_counters = true;
> > +               }
> > +               if ((filter->mask & ALLOCINFO_FILTER_MASK_MIN_SIZE) &&
> > +                   counters->bytes < filter->min_size)
> > +                       return false;
> > +               if ((filter->mask & ALLOCINFO_FILTER_MASK_MAX_SIZE) &&
> > +                   counters->bytes > filter->max_size)
> > +                       return false;
> > +       }
> > +
> >         return true;
> >  }
> >
> > @@ -278,6 +304,8 @@ static int allocinfo_ioctl_get_at(struct seq_file *m, void __user *arg)
> >         struct codetag *ct;
> >         struct allocinfo_get_at params = {0};
> >         __u64 skip_count;
> > +       struct alloc_tag_counters counters;
> > +       bool fetched_counters;
> >
> >         if (copy_from_user(&params, arg, sizeof(params)))
> >                 return -EFAULT;
> > @@ -285,6 +313,11 @@ static int allocinfo_ioctl_get_at(struct seq_file *m, void __user *arg)
> >         if (params.filter.mask & ~ALLOCINFO_FILTER_MASKS)
> >                 return -EINVAL;
> >
> > +       if ((params.filter.mask & ALLOCINFO_FILTER_MASK_MIN_SIZE) &&
> > +           (params.filter.mask & ALLOCINFO_FILTER_MASK_MAX_SIZE) &&
> > +           params.filter.min_size > params.filter.max_size)
> > +               return -EINVAL;
> > +
> >         priv = m->private;
> >
> >         mutex_lock(&priv->ioctl_lock);
> > @@ -308,7 +341,8 @@ static int allocinfo_ioctl_get_at(struct seq_file *m, void __user *arg)
> >         ct = codetag_next_ct(&priv->ioctl_iter);
> >
> >         while (ct) {
> > -               if (matches_filter(ct, &priv->filter)) {
> > +               fetched_counters = false;
> > +               if (matches_filter(ct, &priv->filter, &counters, &fetched_counters)) {
>
> Do we really need this "fetched_counters" parameter? Here are the
> possible cases:
> 1. If the filter does not include ALLOCINFO_FILTER_MASK_MIN_SIZE |
> ALLOCINFO_FILTER_MASK_MAX_SIZE then counters would not be fetched.
> 2. If the filter includes ALLOCINFO_FILTER_MASK_MIN_SIZE |
> ALLOCINFO_FILTER_MASK_MAX_SIZE and
> 2.1. matches_filter() returns true then we know counters were fetched
> because they had to be validated.
> 2.2. matches_filter() returns false then we don't care if the counters
> were fetched. We do not report that tag anyway.
>
> So, instead of passing fetched_counters to matches_filter() we could do this:
>
> bool filter_by_size = (params.filter.mask &
> (ALLOCINFO_FILTER_MASK_MIN_SIZE | ALLOCINFO_FILTER_MASK_MAX_SIZE)) !=
> 0;
> while (ct) {
>            if (matches_filter(ct, &priv->filter, &counters)) {
> ...
> }
> if (ct) {
>            allocinfo_to_params(ct, &params.data, filter_by_size ?
> &counters : NULL);
> ...
> }
>
> Wouldn't that work?
>

While we can deduce whether counters were fetched outside the
matches_filter function, I think the current implementation is more
intuitive from a readability perspective. I believe it should be kept
as is for that reason. If we extract the logic, we'll first have to
replicate the boolean logic in two places. Second, we'd need to add a
comment explaining the boolean calculation, and the reader might have
a higher cognitive load trying to determine which function populates
the counters. The current implementation makes it easy for the reader
to deduce the original intention. Let me know what you think.
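
To make the two options concrete, here is a minimal userspace sketch of
both. It is not the kernel code: struct tag, struct counters, struct
filter, the F_* masks and read_counters() are hypothetical stand-ins for
the codetag, alloc_tag_counters, allocinfo_filter, the
ALLOCINFO_FILTER_MASK_* bits and alloc_tag_read(). Variant A has the
callee report whether it populated the counters; variant B lets the
caller derive the same fact from the filter mask once a match is known.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct counters { uint64_t bytes; uint64_t calls; };
struct tag { uint64_t bytes; uint64_t calls; };

#define F_MIN_SIZE (1u << 0)
#define F_MAX_SIZE (1u << 1)

struct filter { uint32_t mask; uint64_t min_size; uint64_t max_size; };

/* Counts simulated per-CPU summations so the cost is observable. */
static unsigned int expensive_reads;

/* Stand-in for alloc_tag_read(): pretend this walks per-CPU counters. */
static struct counters read_counters(const struct tag *t)
{
	struct counters c = { .bytes = t->bytes, .calls = t->calls };

	expensive_reads++;
	return c;
}

/* Variant A: the callee records in *fetched whether it filled *c. */
static bool matches_flag(const struct tag *t, const struct filter *f,
			 struct counters *c, bool *fetched)
{
	if (f->mask & (F_MIN_SIZE | F_MAX_SIZE)) {
		if (!*fetched) {
			*c = read_counters(t);
			*fetched = true;
		}
		if ((f->mask & F_MIN_SIZE) && c->bytes < f->min_size)
			return false;
		if ((f->mask & F_MAX_SIZE) && c->bytes > f->max_size)
			return false;
	}
	return true;
}

/*
 * Variant B: after a successful match, the counters are valid exactly
 * when the filter carries a size bit, so the caller can test the mask.
 */
static bool counters_valid_after_match(const struct filter *f)
{
	return (f->mask & (F_MIN_SIZE | F_MAX_SIZE)) != 0;
}
```

In this sketch both variants agree for every matching tag, so the choice
between them is about readability rather than behaviour.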

> >                         if (skip_count == 0)
> >                                 break;
> >                         skip_count--;
> > @@ -317,7 +351,7 @@ static int allocinfo_ioctl_get_at(struct seq_file *m, void __user *arg)
> >         }
> >
> >         if (ct) {
> > -               allocinfo_to_params(ct, &params.data);
> > +               allocinfo_to_params(ct, &params.data, fetched_counters ? &counters : NULL);
> >                 priv->positioned = true;
> >         }
> >
> > @@ -343,6 +377,8 @@ static int allocinfo_ioctl_get_next(struct seq_file *m, void __user *arg)
> >         struct codetag *ct;
> >         struct allocinfo_tag_data params;
> >         int ret = 0;
> > +       struct alloc_tag_counters counters;
> > +       bool fetched_counters;
> >
> >         memset(&params, 0, sizeof(params));
> >         priv = m->private;
> > @@ -356,10 +392,15 @@ static int allocinfo_ioctl_get_next(struct seq_file *m, void __user *arg)
> >         }
> >
> >         ct = codetag_next_ct(&priv->ioctl_iter);
> > -       while (ct && !matches_filter(ct, &priv->filter))
> > +       while (ct) {
> > +               fetched_counters = false;
> > +               if (matches_filter(ct, &priv->filter, &counters, &fetched_counters))
> > +                       break;
> >                 ct = codetag_next_ct(&priv->ioctl_iter);
> > +       }
> > +
> >         if (ct)
> > -               allocinfo_to_params(ct, &params);
> > +               allocinfo_to_params(ct, &params, fetched_counters ? &counters : NULL);
> >
> >         if (!ct) {
> >                 priv->positioned = false;
> > --
> > 2.54.0.1136.gdb2ca164c4-goog
> >

^ permalink raw reply

* Re: [PATCH v19 net-next 00/11] nbl driver for Nebulamatrix NICs
From: Jakub Kicinski @ 2026-06-17 20:46 UTC (permalink / raw)
  To: illusion.wang
  Cc: dimon.zhao, alvin.wang, sam.chen, netdev, andrew+netdev, corbet,
	horms, linux-doc, pabeni, vadim.fedorenko, lukas.bulwahn,
	edumazet, enelsonmoore, skhan, hkallweit1, open list
In-Reply-To: <20260617044702.2439-1-illusion.wang@nebula-matrix.com>

On Wed, 17 Jun 2026 12:46:45 +0800 illusion.wang wrote:
> This patch series represents the first phase. We plan to integrate it in
> two phases: the first phase covers mailbox and chip configuration,
> while the second phase involves net dev configuration.
> Together, they will provide basic PF-based Ethernet port transmission and
> reception capabilities.

## Form letter - net-next-closed

We have already submitted our pull request with net-next material for v7.2,
and therefore net-next is closed for new drivers, features, code refactoring
and optimizations. We are currently accepting bug fixes only.

Please repost when net-next reopens after June 29th.

RFC patches sent for review only are obviously welcome at any time.

See: https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#development-cycle
-- 
pw-bot: defer
pv-bot: closed

^ permalink raw reply

* Re: [PATCH v2 04/11] hugetlbfs,filemap: replace hugetlbfs_read_iter() with generic_file_read_iter()
From: Matthew Wilcox @ 2026-06-17 20:07 UTC (permalink / raw)
  To: Jane Chu
  Cc: akpm, jack, viro, brauner, muchun.song, osalvador, david, hughd,
	baolin.wang, linmiaohe, nao.horiguchi, lorenzo, rppt, peterx,
	corbet, linux-doc, linux-mm, linux-kernel, linux-fsdevel
In-Reply-To: <20260617172534.1740152-5-jane.chu@oracle.com>

On Wed, Jun 17, 2026 at 11:25:25AM -0600, Jane Chu wrote:
> +++ b/mm/filemap.c
> @@ -2672,20 +2672,30 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
>  {
>  	struct file *filp = iocb->ki_filp;
>  	struct address_space *mapping = filp->f_mapping;
> +	bool is_hugetlbfs = is_file_hugepages(filp);
>  	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
>  	pgoff_t last_index;
>  	struct folio *folio;
>  	unsigned int flags;
> +	size_t min_folio_bytes;
>  	int err = 0;
>  
>  	/* "last_index" is the index of the folio beyond the end of the read */
> -	last_index = round_up(iocb->ki_pos + count,
> -			mapping_min_folio_nrbytes(mapping)) >> PAGE_SHIFT;
> +	if (is_hugetlbfs)
> +		min_folio_bytes = huge_page_size(hstate_file(filp));
> +	else
> +		min_folio_bytes = mapping_min_folio_nrbytes(mapping);
> +	last_index = round_up(iocb->ki_pos + count, min_folio_bytes) >> PAGE_SHIFT;

I don't love this.  Is there a way we can get mapping_min_folio_nrbytes()
to give us the right number for hugetlbfs?  I don't see why it wouldn't
be possible ...
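
As a rough illustration of why the minimum folio size matters for this
calculation, the userspace sketch below reproduces the last_index
arithmetic. SKETCH_PAGE_SHIFT (4 KiB base pages) and the helper names are
assumptions made for the example; the kernel uses round_up() and
PAGE_SHIFT.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB base pages */

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Index, in base pages, of the first folio beyond the end of the read. */
static uint64_t last_index(uint64_t pos, uint64_t count,
			   uint64_t min_folio_bytes)
{
	return round_up_pow2(pos + count, min_folio_bytes) >> SKETCH_PAGE_SHIFT;
}
```

With a 4 KiB minimum, a one-byte read at offset 0 gives last_index 1;
with a 2 MiB minimum it gives 512, so the batch lookup covers the whole
huge folio.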

>  	filemap_get_read_batch(mapping, index, last_index - 1, fbatch);
> +
> +	if (is_hugetlbfs)
> +		goto done;

We don't actually need this, do we?  For hugetlbfs, I don't think we
can get 0 folios in the batch, and then we won't find a folio with
readahead set, and they're always uptodate ... so we're just skipping a
few tests with this?


^ permalink raw reply

* Re: [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
From: Babu Moger @ 2026-06-17 19:55 UTC (permalink / raw)
  To: Reinette Chatre, Moger, Babu, corbet, tony.luck, Dave.Martin,
	james.morse, tglx, bp, dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman
In-Reply-To: <413ac3d1-0377-4762-a88f-3d7bfc3a9dff@intel.com>

Hi Reinette,

On 6/17/26 12:33, Reinette Chatre wrote:
> Hi Babu,
> 
> On 6/17/26 8:56 AM, Babu Moger wrote:
>>
>> The display will look like this when the system boots up.
>>
>> # cat info/kernel_mode
>>    inherit_ctrl_and_mon:
>>    global_assign_ctrl_assign_mon_per_cpu:group=uninitialized
>>    global_assign_ctrl_assign_mon_per_cpu:group=uninitialized
>>
>> There will not be any group associated with "inherit_ctrl_and_mon".
>> It is only used to switch from other two modes.
> 
> Just two nitpicks (adding the "[]" to indicate effective mode and fixing the
> copy&paste duplicate mode names) to confirm that I think you actually intended to
> write:

Yes. Thanks for the correction.

> 
> # cat info/kernel_mode
>    [inherit_ctrl_and_mon:]
>    global_assign_ctrl_inherit_mon_per_cpu:group=uninitialized
>    global_assign_ctrl_assign_mon_per_cpu:group=uninitialized
> 
> I would like to propose that the user documentation contains something like
> "the kernel mode is followed by a semi-colon separated list of properties"
> This implementation does not require more than one property associated with a mode
> so this does not need any code changes but adding that flexibility to the user
> interface should help if some future kernel mode needs more than one property.
> What do you think?

Sounds good.
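
For reference, a user space reader of such a line could look roughly like
the sketch below. The layout it assumes ("mode:" followed by
";"-separated properties, with "[...]" marking the effective mode) is
taken from the example above and may still change; struct kmode_line,
MAX_PROPS and parse_kmode_line() are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_PROPS 4

struct kmode_line {
	bool active;              /* line was wrapped in "[...]" */
	char mode[64];            /* mode name before the ':' */
	char props[MAX_PROPS][64]; /* raw "key=value" strings */
	int nprops;
};

/* Parse one line of info/kernel_mode; returns 0 on success, -1 on error. */
static int parse_kmode_line(const char *line, struct kmode_line *out)
{
	char buf[256];
	char *s, *colon, *p;
	size_t len;

	memset(out, 0, sizeof(*out));

	/* Trim leading and trailing whitespace. */
	while (*line == ' ' || *line == '\t')
		line++;
	len = strlen(line);
	while (len && (line[len - 1] == '\n' || line[len - 1] == ' '))
		len--;
	if (len == 0 || len >= sizeof(buf))
		return -1;
	memcpy(buf, line, len);
	buf[len] = '\0';

	/* "[...]" marks the effective mode. */
	s = buf;
	if (s[0] == '[' && s[len - 1] == ']') {
		out->active = true;
		s[len - 1] = '\0';
		s++;
	}

	colon = strchr(s, ':');
	if (!colon || colon == s)
		return -1;
	*colon = '\0';
	if (strlen(s) >= sizeof(out->mode))
		return -1;
	strcpy(out->mode, s);

	/* Split the remainder on ';', skipping empty entries. */
	p = colon + 1;
	while (*p) {
		char *semi = strchr(p, ';');

		if (semi)
			*semi = '\0';
		if (*p) {
			if (out->nprops == MAX_PROPS ||
			    strlen(p) >= sizeof(out->props[0]))
				return -1;
			strcpy(out->props[out->nprops++], p);
		}
		if (!semi)
			break;
		p = semi + 1;
	}
	return 0;
}
```

A parser written this way keeps working if a later mode reports more than
one property, which is the flexibility the documentation wording is meant
to reserve.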

Thanks
Babu

^ permalink raw reply

* Re: [PATCH v3 03/12] fs/resctrl: Add kernel mode (kmode) data structures and arch hook
From: Babu Moger @ 2026-06-17 19:36 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
	tglx, bp, dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman
In-Reply-To: <d994c06c-db1a-4745-aaab-3abb05466a67@intel.com>

Hi Reinette,

On 6/16/26 18:30, Reinette Chatre wrote:
> Hi Babu,
> 
> On 4/30/26 4:24 PM, Babu Moger wrote:
>> Privilege-Level Zero Association (PLZA) allows the user to specify a CLOSID
> 
> Subject prefix makes it clear this is a resctrl fs patch so please take care to
> not mix architecture specific terms with resctrl fs generalized support.
> 
> Something that may help here is to consider all resctrl fs changes to be
> relevant from MPAM perspective. Please do so with all resctrl fs changes in
> this series.

Ok. Agreed.

> 
>> and/or RMID associated with execution in Privilege-Level Zero. Introduce a
>> generic enumeration so that architecture and generic code can agree on the
>> available policies.
>>
>> Introduce enum resctrl_kernel_modes with the following values:
> 
> Please make the enum name singular, "resctrl_kernel_modes" -> "resctrl_kernel_mode"
> Doing so will make its use in code easier to parse.

Ok. Sure.

> 
>>
>>    - INHERIT_CTRL_AND_MON: kernel and user tasks share the same CLOSID and
>>      RMID.  This is the default and matches today's resctrl behaviour.
> 
> CLOSID and RMID are x86 terms where the meaning is not 1:1 with other architectures.
> Since this is a new resctrl fs interface it is expected to be usable by all
> architectures. Making this architecture specific is not appropriate.
> 
> These are the modes that are exposed to user space and user space has no insight
> into CLOSID and RMID (ignoring scenario of debugging). I see no reason for
> resctrl do dictate CLOSID/RMID assignment as part of these modes but instead
> what the modes mean should be explained. If it is helpful then any x86 specific
> details can be added by highlighting it is x86 specific. For example,
> 
>       "Kernel work inherits the allocation and monitoring from the user space task.
>        On x86 this means that kernel work shares the same CLOSID and RMID as
>        the user space task."

Sure.

> 
> 
> 
>>
>>    - GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU: a CLOSID is assigned for kernel
>>      work while the RMID used for monitoring is inherited from the running
>>      user task.  The default scope is all online CPUs and may be narrowed to
>>      a subset via the resctrl group interface.  A CTRL_MON group can be
>>      bound to this mode.
> 
> Is binding a CTRL_MON group optional? Consider, for example:
> 
> 	"A CTRL_MON group is bound to this mode."
> 

Ok. Sure

>>
>>    - GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU: both CLOSID and RMID are
>>      assigned to kernel work.  The default scope is all online CPUs and may
>>      be narrowed per CPU via the resctrl group interface.  A CTRL_MON group
>>      can be bound to this mode.
> 
> It should be possible to bind a MON group also, no?

Yes. We can bind either CTRL_MON or MON group to this mode.

Here is the discussion about it.
https://lore.kernel.org/lkml/1d7c79bf-1e40-4db7-8f66-45f234b6d87e@amd.com/

GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU can be either CTRL_MON or MON group.

GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU can only be CTRL_MON.
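
The binding rules above could be checked along these lines. This is only
a sketch: enum kmode, enum group_type and kmode_group_allowed() are
hypothetical names for illustration, not identifiers from the series.

```c
#include <stdbool.h>

/* Hypothetical short names for the three kernel modes. */
enum kmode {
	KMODE_INHERIT_CTRL_AND_MON,
	KMODE_ASSIGN_CTRL_INHERIT_MON,
	KMODE_ASSIGN_CTRL_ASSIGN_MON,
};

/* Kind of resctrl group a mode is asked to bind to. */
enum group_type {
	GROUP_NONE,
	GROUP_CTRL_MON,
	GROUP_MON,
};

/* Return true if a group of the given type may back the given mode. */
static bool kmode_group_allowed(enum kmode mode, enum group_type type)
{
	switch (mode) {
	case KMODE_INHERIT_CTRL_AND_MON:
		/* No group is associated with the inherit mode. */
		return type == GROUP_NONE;
	case KMODE_ASSIGN_CTRL_INHERIT_MON:
		/* Monitoring is inherited, so only a CTRL_MON group fits. */
		return type == GROUP_CTRL_MON;
	case KMODE_ASSIGN_CTRL_ASSIGN_MON:
		/* Both control and monitoring are assigned. */
		return type == GROUP_CTRL_MON || type == GROUP_MON;
	}
	return false;
}
```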


"A CTRL_MON or MON group is bound to this mode."


> 
>>
>>    - RESCTRL_KMODE_LAST: highest enumerator naming a policy mode.
>>
>>    - RESCTRL_NUM_KERNEL_MODES: number of policy modes; use this to size
>>      static tables indexed by mode.
> 
> The last two can be dropped, this is clear from the patch.

Ok.

> 
>>
>> Also add struct resctrl_kmode_cfg (the snapshot architecture code returns)
>> in include/linux/resctrl_types.h, and declare
>> resctrl_arch_get_kmode_support() in include/linux/resctrl.h so architecture
>> code can advertise the supported modes.
> 
> Above mostly just describes what is clear from the patch. Instead this can summarize
> what the addition does: "Provide callback with which architecture can set the
> kernel modes supported by it". (not exactly what this patch does though, but more below ...)
> 
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v3: Removed resctrl_kmode definition.
>>      Changed the kernel mode definitions to enum resctrl_kernel_modes.
>>      Used BIT() to set/test the features.
>>      Added details to changelog.
>>
>> v2: New patch to handle PLZA interfaces with /sys/fs/resctrl/info/ directory.
>>      https://lore.kernel.org/lkml/2ab556af-095b-422b-9396-f845c6fd0342@intel.com/
>> ---
>>   include/linux/resctrl.h       | 13 ++++++++++
>>   include/linux/resctrl_types.h | 46 +++++++++++++++++++++++++++++++++++
>>   2 files changed, 59 insertions(+)
>>
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index 006e57fd7ca5..ce28418df00f 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -699,6 +699,19 @@ int resctrl_arch_io_alloc_enable(struct rdt_resource *r, bool enable);
>>    */
>>   bool resctrl_arch_get_io_alloc_enabled(struct rdt_resource *r);
>>   
>> +/**
>> + * resctrl_arch_get_kmode_support() - Advertise kernel-mode capabilities
> 
> "Advertise" implies a "broadcast" while the function name is "get" that implies
> retrieval.
> 
> Why does resctrl query the support from the architecture? The typical resctrl initialization
> involves the architecture setting certain capabilities. This simplifies enabling since
> it does not require the addition of this feature to be accompanied with an implementation of
> this call by every architecture.
> 
> Instead, resctrl can just initialize the defaults and an architecture can
> make any adjustments using the optional callback. So, instead of
> resctrl_arch_get_kmode_support(), why not resctrl_set_kmode_support() that is
> implemented in resctrl fs and called by architecture?

Yes. We can do that. We can add resctrl_set_kmode_support() in FS and 
call from architecture.
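
A rough userspace sketch of that direction follows, assuming the helper
takes a bitmask of supported modes. The signature, the
RESCTRL_KMODE_VALID_MASK name, resctrl_kmode_is_supported() and the local
BIT() definition are assumptions for illustration only; the real
interface may differ.

```c
#include <stdint.h>

#define BIT(n) (1u << (n)) /* local stand-in for the kernel macro */

enum resctrl_kernel_mode {
	INHERIT_CTRL_AND_MON,
	GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU,
	GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
	RESCTRL_KMODE_LAST = GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
};

#define RESCTRL_NUM_KERNEL_MODES (RESCTRL_KMODE_LAST + 1)
/* Hypothetical mask of all bits that name a known mode. */
#define RESCTRL_KMODE_VALID_MASK (BIT(RESCTRL_NUM_KERNEL_MODES) - 1)

/* fs-private state: by default only the inherit mode is supported. */
static uint32_t resctrl_kmode_supported = BIT(INHERIT_CTRL_AND_MON);

/*
 * Hypothetical fs-side helper called by architecture code during init.
 * Returns 0 on success, -1 (would be -EINVAL in the kernel) on bad input.
 */
static int resctrl_set_kmode_support(uint32_t modes)
{
	if (modes & ~RESCTRL_KMODE_VALID_MASK)
		return -1;

	/* Enforce that the inherit mode is always available. */
	resctrl_kmode_supported = modes | BIT(INHERIT_CTRL_AND_MON);
	return 0;
}

/* Query used by fs code when a mode is requested by user space. */
static int resctrl_kmode_is_supported(enum resctrl_kernel_mode mode)
{
	return !!(resctrl_kmode_supported & BIT(mode));
}
```

An architecture that never calls the helper keeps the default, so no
per-architecture stub is needed.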

> 
> When considering the x86 implementation of this it seems as though this implementation
> assumes that all architectures will support inherit_ctrl_and_mon but this is not
> enforced anywhere. Having any assumptions enforced/verified will help to make this
> more robust. The fs/arch separation depending on so many architectures
> "doing the right thing" seems risky.

ok.

> 
> 
>> + * @kcfg:	Architecture ORs BIT() flags into @kcfg->kmode for each supported
>> + *		&enum resctrl_kernel_modes value (see &struct resctrl_kmode_cfg).
>> + *
>> + * Used for optional features (for example PLZA on x86) that can assign CLOSID
>> + * and/or RMID to kernel work separately from user tasks.  Generic code compares
>> + * @kcfg->kmode with the effective @kcfg->kmode_cur; when a global-assign mode is
>> + * active, @kcfg->k_rdtgrp identifies the active &struct rdtgroup. The default mode
> 
> Does the architecture need to know these implementation details?

Not required.

> 
>> + * is INHERIT_CTRL_AND_MON and group is default group.
>> + */
>> +void resctrl_arch_get_kmode_support(struct resctrl_kmode_cfg *kcfg);
> 
> Why does architecture need to know the layout of struct resctrl_kmode_cfg? It only needs

Arch does not need to know the resctrl_kmode_cfg.

> to share the modes it supports and need not be concerned with any of the internals - from
> what I can tell the hook to program the kernel mode does not use this structure either and
> this is the only "outside of resctrl fs" usage and it does not seem necessary.

We can just pass the modes to resctrl_set_kmode_support() from arch to 
set it.

> 
>> +
>>   extern unsigned int resctrl_rmid_realloc_threshold;
>>   extern unsigned int resctrl_rmid_realloc_limit;
>>   
>> diff --git a/include/linux/resctrl_types.h b/include/linux/resctrl_types.h
>> index a5f56faa18d2..3aba07764b99 100644
>> --- a/include/linux/resctrl_types.h
>> +++ b/include/linux/resctrl_types.h
> 
> Please keep in mind that resctrl_types.h is reserved for those types that an architecture
> needs to use in its asm/resctrl.h ... it does not look like any of the types added here qualify.

I will move it to include/linux/resctrl.h as both ARCH and FS need to 
know about these modes.

> 
>> @@ -68,4 +68,50 @@ enum resctrl_event_id {
>>   #define QOS_NUM_L3_MBM_EVENTS	(QOS_L3_MBM_LOCAL_EVENT_ID - QOS_L3_MBM_TOTAL_EVENT_ID + 1)
>>   #define MBM_STATE_IDX(evt)	((evt) - QOS_L3_MBM_TOTAL_EVENT_ID)
>>   
>> +/**
>> + * enum resctrl_kernel_modes - Kernel versus user CLOSID/RMID policy
> 
> What does "versus user" mean? Can this be dropped?

ok.

> 
>> + *
>> + * Enumeration values are contiguous indices from 0 through
>> + * @RESCTRL_KMODE_LAST inclusive.
> 
> Above sentence is not necessary.

ok.

> 
>> Global-assign modes treat all online CPUs as
>> + * in scope by default; a subset of CPUs may be selected by using resctrl
>> + * group's interface.
>> + *
>> + * @INHERIT_CTRL_AND_MON:
>> + *	User and kernel tasks use the same CLOSID and RMID.
> 
> Similar comment as earlier. Since this is generic resctrl fs interface it needs to
> be applicable to all architectures. For example (same suggestion as earlier),
>    "Kernel work inherits the allocation and monitoring of the user space task.

ok.

> 
>> + * @GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU:
>> + *	A CLOSID may be assigned for kernel work while RMID selection for
> 
> "may be assigned" - this is not optional, right? How about "A control group is assigned ..."

ok.

> 
> 
>> + *	monitoring follows the same inheritance rules as for user contexts.
>> + *	Default scope is all online CPUs: subset of CPUs may be selected by
>> + *	using resctrl group's interface.
>> + * @GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU:
>> + *	A single resource group (CLOSID and RMID together) may be assigned to
> 
> "may be" -> "is" ?

ok.


>> + *	kernel work. Default scope is all online CPUs: subset of CPUs may be
>> + *	selected by using resctrl group's interface.
>> + * @RESCTRL_KMODE_LAST:
> 
> Documenting @RESCTRL_KMODE_LAST is not necessary.
> 
>> + *	Highest enumerator that names a policy mode. Use RESCTRL_NUM_KERNEL_MODES
>> + *	to size static tables indexed by mode.
> 
> No need to document this.

ok.

> 
>> + */
>> +enum resctrl_kernel_modes {
>> +	INHERIT_CTRL_AND_MON,
>> +	GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU,
>> +	GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
>> +	RESCTRL_KMODE_LAST = GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
>> +};
>> +
>> +#define RESCTRL_NUM_KERNEL_MODES (RESCTRL_KMODE_LAST + 1)
>> +
>> +/**
>> + * struct resctrl_kmode_cfg - Kernel-mode policy snapshot from architecture
> 
> Only @kmode is initialized from the architecture. The rest is managed by resctrl fs.
> I do not see why architecture needs to know the structure details.

Correct. Will move this to FS.

> 
>> + * @kmode:	Hardware- or policy-supported modes: each enumerator from
>> + *		&enum resctrl_kernel_modes is represented by BIT(mode index).
>> + * @kmode_cur:	Effective mode(s) in the same BIT(index) form as @kmode.
> 
> "mode(s)" ... this is plural implying more than one mode can be active at a time?

no.

> Should this not be just one mode and can thus have type "enum resctrl_kernel_mode" to make
> this obvious?

Yes.

> 
>> + * @k_rdtgrp:	Resource group backing global-assign modes when applicable;
>> + *		initialized to the default group at boot.
> 
> Why is this initialized to default group at boot? I believe inherit_ctrl_and_mon is
> the default mode and it does not have a group so should this not be NULL by default?

Yes. It should be NULL at boot. Will change it.

> 
>> + */
>> +struct resctrl_kmode_cfg {
>> +	u32 kmode;
>> +	u32 kmode_cur;
>> +	struct rdtgroup *k_rdtgrp;
> 
> Please align struct members in tabular fashion.
> 
> Not specific to this patch: After so many contributions to resctrl I am very surprised how
> this series does not respect Documentation/process/maintainer-tip.rst in many ways. For example,
> later patches at some point just stops writing changelogs in imperative tone and just
> documents what the code does, patches document locking requirements instead of using code
> like lockdep_assert_held(), variables are not declared in reverse fir, changelogs refer to
> other patches in series. Following Documentation/process/maintainer-tip.rst should be
> very familiar by now.

My bad. Yes. Will focus on process in next revision.

Thanks
Babu

^ permalink raw reply

* Re: [PATCH v16 00/10] arm64/riscv: Add support for crashkernel CMA reservation
From: Mike Rapoport @ 2026-06-17 18:48 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, pasha.tatashin,
	pratyush, ruirui.yang, rdunlap, peterz, feng.tang, dapeng1.mi,
	kees, elver, kuba, lirongqing, ebiggers, paulmck, leitao, coxu,
	Liam.Howlett, ryan.roberts, osandov, jbohac, cfsworks,
	tangyouling, sourabhjain, ritesh.list, adityag, liaoyuanhong,
	seanjc, fuqiang.wang, ardb, chenjiahao16, guoren, x86, linux-doc,
	linux-kernel, linux-arm-kernel, loongarch, linuxppc-dev,
	linux-riscv, devicetree, kexec
In-Reply-To: <20260608073459.3119290-1-ruanjinjie@huawei.com>

Hi Jinjie,

On Mon, Jun 08, 2026 at 03:34:49PM +0800, Jinjie Ruan wrote:
> The crash memory allocation, and the exclude of crashk_res, crashk_low_res
> and crashk_cma memory are almost identical across different architectures,
> This patch set handle them in crash core in a general way, which eliminate
> a lot of duplication code.
> 
> And add support for crashkernel CMA reservation for arm64 and riscv.
> 
> This patch set is rebased on v7.1-rc1.

Please rebase this set on v7.2-rc1 once that's out.

I'm going to queue it in the liveupdate tree then to expose to the wider
testing.

Meanwhile it would be great to chase riscv and x86 maintainers for acks :)

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH v2 00/11] hugetlb: Use PAGE granularity index in exported i/f and adopt the common read_iter
From: Mike Rapoport @ 2026-06-17 18:28 UTC (permalink / raw)
  To: Jane Chu
  Cc: akpm, willy, jack, viro, brauner, muchun.song, osalvador, david,
	hughd, baolin.wang, linmiaohe, nao.horiguchi, lorenzo, peterx,
	corbet, linux-doc, linux-mm, linux-kernel, linux-fsdevel
In-Reply-To: <20260617172534.1740152-1-jane.chu@oracle.com>

Hi Jane,

On Wed, Jun 17, 2026 at 11:25:21AM -0600, Jane Chu wrote:
> changes in v2:
>  - new patches 1-4: add hwpoison handling to filemap_read(),
>    thus replace hugetlbfs_read_iter() with generic_file_read_iter(),
>    suggested by Matthew [2];
>  - new patch 5: convert hugetlb fault handler's vmf->pgoff to PAGE_SIZE
>    granularity like the rest of mm fault handling convention, suggested
>    by Matthew [2];
>  - patch 6: fixed a bug in v1 pointed out by Usama Arif, also by syzbot;
>  - patch 8: did not pick the Acked-by from Oscar (for 5/6 in v1) due to
>    updates to the patch;
>  - patch 11: add VM_WARN_ON in hugetlb_unreserve_pages(), per Oscar;

It seems that cow, hugetlb, GUP and HMM selftests trigger these WARN_ONs:

https://github.com/linux-mm/linux-mm/actions/runs/27707843062/job/81960927740
   
> v1:
> This series stems from a discussion with David. [1]
> The series makes a small cleanup to a few hugetlb interfaces used
> outside the subsystem by standardizing them on base-page indices.
> Hopefully this makes the interface semantics a bit more coherent with
> the rest of mm, while the internal hugetlb code continue to use hugepage
> indices where that remains the more natural fit.
> 
> [1] https://lore.kernel.org/linux-mm/9ec9edd1-0f4c-4da2-ae78-0e7b251a9e25@kernel.org/
> [2] https://lore.kernel.org/linux-mm/aeZwAz6PcdlqSnJ2@casper.infradead.org/

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH v4 3/5] rpmsg: virtio_rpmsg_bus: get buffer size from config space
From: Shah, Tanmay @ 2026-06-17 17:41 UTC (permalink / raw)
  To: Arnaud POULIQUEN, Tanmay Shah, andersson, mathieu.poirier, corbet,
	skhan
  Cc: linux-remoteproc, linux-doc, linux-kernel
In-Reply-To: <5fba8065-c0e9-4514-863b-8c7c91fb79de@foss.st.com>



On 6/17/2026 4:15 AM, Arnaud POULIQUEN wrote:
> Hi Tanmay,
> 
> On 6/15/26 22:20, Tanmay Shah wrote:
>> 512 bytes isn't always suitable for all case, let firmware
>> maker decide the best value from resource table.
>> enable by VIRTIO_RPMSG_F_BUFSZ feature bit.
>>
>> Signed-off-by: Tanmay Shah <tanmay.shah@amd.com>
>> ---
>>
>> Changes in v4: squash to virtio rpmsg config patch
>>    - Introduce new patch to modify rpmsg.rst documentation
>>    - check version is always 1.
>>    - check size field is same as size of struct virtio_rpmsg_config
>>    - introduce alignment field
>>    - check alignment field is power of 2
>>    - check tx and rx buf size is aligned with alignment passed in the
>>      structure
>>
>> Changes in v3:
>>    - change version field from u16 to u8
>>    - introduce size field in the rpmsg_virtio_config structure
>>    - check version field is set to any non-zero value.
>>    - check size field is not 0.
>>    - Remove field for private config, as not needed for now.
>>    - add documentation of rpmsg_virtio_config structure
>>
>>   drivers/rpmsg/virtio_rpmsg_bus.c   | 129 ++++++++++++++++++++++++-----
>>   include/linux/rpmsg/virtio_rpmsg.h |  50 +++++++++++
>>   2 files changed, 160 insertions(+), 19 deletions(-)
>>   create mode 100644 include/linux/rpmsg/virtio_rpmsg.h
>>
>> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/
>> virtio_rpmsg_bus.c
>> index 99df1ae07055..a59925f870a4 100644
>> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
>> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
>> @@ -15,11 +15,13 @@
>>   #include <linux/idr.h>
>>   #include <linux/jiffies.h>
>>   #include <linux/kernel.h>
>> +#include <linux/log2.h>
>>   #include <linux/module.h>
>>   #include <linux/mutex.h>
>>   #include <linux/rpmsg.h>
>>   #include <linux/rpmsg/byteorder.h>
>>   #include <linux/rpmsg/ns.h>
>> +#include <linux/rpmsg/virtio_rpmsg.h>
>>   #include <linux/scatterlist.h>
>>   #include <linux/slab.h>
>>   #include <linux/sched.h>
>> @@ -39,7 +41,8 @@
>>    * @tx_bufs:    kernel address of tx buffers
>>    * @num_rx_buf: total number of rx buffers
>>    * @num_tx_buf: total number of tx buffers
>> - * @buf_size:   size of one rx or tx buffer
>> + * @rx_buf_size: size of one rx buffer
>> + * @tx_buf_size: size of one tx buffer
>>    * @last_tx_buf: index of last tx buffer used
>>    * @bufs_dma:    dma base addr of the buffers
>>    * @tx_lock:    protects svq and tx_bufs, to allow concurrent senders.
>> @@ -59,7 +62,8 @@ struct virtproc_info {
>>       void *rx_bufs, *tx_bufs;
>>       unsigned int num_rx_buf;
>>       unsigned int num_tx_buf;
>> -    unsigned int buf_size;
>> +    unsigned int rx_buf_size;
>> +    unsigned int tx_buf_size;
>>       int last_tx_buf;
>>       dma_addr_t bufs_dma;
>>       struct mutex tx_lock;
>> @@ -68,9 +72,6 @@ struct virtproc_info {
>>       wait_queue_head_t sendq;
>>   };
>>   -/* The feature bitmap for virtio rpmsg */
>> -#define VIRTIO_RPMSG_F_NS    0 /* RP supports name service
>> notifications */
>> -
>>   /**
>>    * struct rpmsg_hdr - common header for all rpmsg messages
>>    * @src: source address
>> @@ -128,7 +129,7 @@ struct virtio_rpmsg_channel {
>>    * processor.
>>    */
>>   #define MAX_RPMSG_NUM_BUFS    (256)
>> -#define MAX_RPMSG_BUF_SIZE    (512)
>> +#define DEFAULT_RPMSG_BUF_SIZE    (512)
>>     /*
>>    * Local addresses are dynamically allocated on-demand.
>> @@ -444,7 +445,7 @@ static void *get_a_tx_buf(struct virtproc_info *vrp)
>>         /* either pick the next unused tx buffer */
>>       if (vrp->last_tx_buf < vrp->num_tx_buf)
>> -        ret = vrp->tx_bufs + vrp->buf_size * vrp->last_tx_buf++;
>> +        ret = vrp->tx_bufs + vrp->tx_buf_size * vrp->last_tx_buf++;
>>       /* or recycle a used one */
>>       else
>>           ret = virtqueue_get_buf(vrp->svq, &len);
>> @@ -514,7 +515,7 @@ static int rpmsg_send_offchannel_raw(struct
>> rpmsg_device *rpdev,
>>        * messaging), or to improve the buffer allocator, to support
>>        * variable-length buffer sizes.
>>        */
>> -    if (len > vrp->buf_size - sizeof(struct rpmsg_hdr)) {
>> +    if (len > vrp->tx_buf_size - sizeof(struct rpmsg_hdr)) {
>>           dev_err(dev, "message is too big (%d)\n", len);
>>           return -EMSGSIZE;
>>       }
>> @@ -647,7 +648,7 @@ static ssize_t virtio_rpmsg_get_mtu(struct
>> rpmsg_endpoint *ept)
>>       struct rpmsg_device *rpdev = ept->rpdev;
>>       struct virtio_rpmsg_channel *vch = to_virtio_rpmsg_channel(rpdev);
>>   -    return vch->vrp->buf_size - sizeof(struct rpmsg_hdr);
>> +    return vch->vrp->tx_buf_size - sizeof(struct rpmsg_hdr);
>>   }
>>     static int rpmsg_recv_single(struct virtproc_info *vrp, struct
>> device *dev,
>> @@ -673,7 +674,7 @@ static int rpmsg_recv_single(struct virtproc_info
>> *vrp, struct device *dev,
>>        * We currently use fixed-sized buffers, so trivially sanitize
>>        * the reported payload length.
>>        */
>> -    if (len > vrp->buf_size ||
>> +    if (len > vrp->rx_buf_size ||
>>           msg_len > (len - sizeof(struct rpmsg_hdr))) {
>>           dev_warn(dev, "inbound msg too big: (%d, %d)\n", len, msg_len);
>>           return -EINVAL;
>> @@ -706,7 +707,7 @@ static int rpmsg_recv_single(struct virtproc_info
>> *vrp, struct device *dev,
>>           dev_warn_ratelimited(dev, "msg received with no recipient\n");
>>         /* publish the real size of the buffer */
>> -    rpmsg_sg_init(&sg, msg, vrp->buf_size);
>> +    rpmsg_sg_init(&sg, msg, vrp->rx_buf_size);
>>         /* add the buffer back to the remote processor's virtqueue */
>>       err = virtqueue_add_inbuf(vrp->rvq, &sg, 1, msg, GFP_KERNEL);
>> @@ -820,10 +821,13 @@ static int rpmsg_probe(struct virtio_device *vdev)
>>       struct virtproc_info *vrp;
>>       struct virtio_rpmsg_channel *vch = NULL;
>>       struct rpmsg_device *rpdev_ns, *rpdev_ctrl;
>> +    u16 rpmsg_buf_align = 0;
>>       void *bufs_va;
>>       int err = 0, i;
>>       size_t total_buf_space;
>>       bool notify;
>> +    u8 version;
>> +    u16 size;
>>         vrp = kzalloc_obj(*vrp);
>>       if (!vrp)
>> @@ -855,9 +859,90 @@ static int rpmsg_probe(struct virtio_device *vdev)
>>       else
>>           vrp->num_tx_buf = MAX_RPMSG_NUM_BUFS;
>>   -    vrp->buf_size = MAX_RPMSG_BUF_SIZE;
>> +    /*
>> +     * If VIRTIO_RPMSG_F_BUFSZ feature is supported, then configure buf
>> +     * size from virtio device config space from the resource table.
>> +     * If the feature is not supported, then assign default buf size.
>> +     */
>> +    if (virtio_has_feature(vdev, VIRTIO_RPMSG_F_BUFSZ)) {
>> +        virtio_cread(vdev, struct virtio_rpmsg_config,
>> +                 version, &version);
>> +
>> +        /* for now we support only v1 */
>> +        if (version != RPMSG_VDEV_CONFIG_V1) {
>> +            dev_err(&vdev->dev,
>> +                "unsupported vdev config version %u\n", version);
>> +            err = -EINVAL;
>> +            goto vqs_del;
>> +        }
>> +
>> +        /* size of the config space must match */
>> +        virtio_cread(vdev, struct virtio_rpmsg_config,
>> +                 size, &size);
>> +        if (size != sizeof(struct virtio_rpmsg_config)) {
>> +            dev_err(&vdev->dev, "invalid size of vdev config %u\n",
>> +                size);
>> +            err = -EINVAL;
>> +            goto vqs_del;
>> +        }
>>   -    total_buf_space = (vrp->num_rx_buf + vrp->num_tx_buf) * vrp-
>> >buf_size;
>> +        /*
>> +         * Optional alignment applied to each buffer size and to the TX
>> +         * buffer base address (e.g. to align buffers on a cache line).
>> +         * It must be a power of two; zero means no extra alignment.
>> +         */
>> +        virtio_cread(vdev, struct virtio_rpmsg_config,
>> +                 rpmsg_buf_align, &rpmsg_buf_align);
>> +        if (rpmsg_buf_align && !is_power_of_2(rpmsg_buf_align)) {
>> +            dev_err(&vdev->dev,
>> +                "bad vdev config: rpmsg_buf_align %u is not a power
>> of two\n",
>> +                rpmsg_buf_align);
>> +            err = -EINVAL;
>> +            goto vqs_del;
>> +        }
>> +
>> +        /* note: tx and rx are defined from remote view */
>> +        virtio_cread(vdev, struct virtio_rpmsg_config,
>> +                 txbuf_size, &vrp->rx_buf_size);
>> +        virtio_cread(vdev, struct virtio_rpmsg_config,
>> +                 rxbuf_size, &vrp->tx_buf_size);
>> +
>> +        /* The buffers must hold at least the rpmsg header */
>> +        if (vrp->rx_buf_size < sizeof(struct rpmsg_hdr) ||
>> +            vrp->tx_buf_size < sizeof(struct rpmsg_hdr)) {
>> +            dev_err(&vdev->dev,
>> +                "bad vdev config: rx buf sz = %u, tx buf sz = %u\n",
>> +                vrp->rx_buf_size, vrp->tx_buf_size);
>> +            err = -EINVAL;
>> +            goto vqs_del;
>> +        }
>> +
>> +        /*
>> +         * The buffer size must be aligned to the provided alignment for
>> +         * so that the start address of tx bufs can be aligned.
>> +         */
> 
> 'tx' to remove as  it also concerns Rx buffers
> 

Ack.

> 
> What about removing this check to manage alignment during buffer
> allocation?
> 
> For example, if the alignment is on a 64-bit address and the tx_buffer
> and rx_buffer sizes are 40 bytes, 48 bytes can be allocated in memory
> for each buffer, and the virtio descriptor can be filled with aligned
> addresses.
> 
> In other words, the rpmsg_buf_align field contains the alignment
> constraint from the remote processor. If the Linux kernel wants to
> impose another alignment constraint, it must test or update
> rpmsg_buf_align, but it must not impose alignment on the buffer size.
> 
> 

I don't understand this part. `rpmsg_buf_align` is the alignment of a
single buffer's size only. The Linux kernel just checks that the size of a
single rx buffer and of a single tx buffer is aligned to
`rpmsg_buf_align`, as the firmware has claimed.

For reference the openamp-system-reference PR:
https://github.com/OpenAMP/openamp-system-reference/pull/106/changes

	.vdev_config = {
		.version = 1,
		.reserved = 0,
		.size = (uint16_t)(sizeof(struct rpmsg_virtio_config) - sizeof(bool)),
		.alignment = RPMSG_BUF_ALIGN,
		.reserved1 = 0,
		/* Tx for host */
		.h2r_buf_size = metal_align_up(4096, RPMSG_BUF_ALIGN),
		/* Rx for host */
		.r2h_buf_size = metal_align_up(4096, RPMSG_BUF_ALIGN),
	},

IIUC, the Linux kernel is not really supposed to modify
`rpmsg_buf_align`. It only uses it to check that the firmware has assigned
a correct size to each rx and tx buffer.


When the Linux kernel uses the dma_alloc_coherent() API, it aligns the
total buffer size to the page size. That is different from the single tx
buf size and the single rx buf size. The alignment of the total buf size
to the page size is unrelated to the `rpmsg_buf_align` field.

Please let me know if I am missing something or didn't understand your
comment. I prefer that `rpmsg_buf_align` should be only modified by the
firmware and not the linux kernel.
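
To make the two options concrete, here is a minimal user-space sketch. It
is not the driver code: is_pow2(), is_aligned(), align_up() and
buf_cfg_ok() are hypothetical stand-ins for the kernel's is_power_of_2()
and IS_ALIGNED(), and the header-size check from the patch is left out.
buf_cfg_ok() models the "reject unaligned sizes" behaviour of the patch,
while align_up() models the alternative of rounding the per-buffer stride
up at allocation time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space stand-in for the kernel's is_power_of_2(). */
static bool is_pow2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* User-space stand-in for IS_ALIGNED(); a must be a power of two. */
static bool is_aligned(uint32_t x, uint32_t a)
{
	return (x & (a - 1)) == 0;
}

/* Round x up to the next multiple of a; a must be a power of two. */
static uint32_t align_up(uint32_t x, uint32_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper modelling the checks discussed above:
 * align == 0 means "no extra alignment"; otherwise align must be a
 * power of two and both buffer sizes must be multiples of it.
 */
static bool buf_cfg_ok(uint16_t align, uint32_t rx_sz, uint32_t tx_sz)
{
	if (align == 0)
		return true;
	if (!is_pow2(align))
		return false;
	return is_aligned(rx_sz, align) && is_aligned(tx_sz, align);
}
```

With illustrative numbers (not taken from the thread), buf_cfg_ok(64, 40,
4096) fails the check, whereas align_up(40, 16) yields a 48-byte stride
that could be used for allocation if the size check were dropped.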



>> +        if (rpmsg_buf_align &&
>> +            (!IS_ALIGNED(vrp->rx_buf_size, rpmsg_buf_align) ||
>> +             !IS_ALIGNED(vrp->tx_buf_size, rpmsg_buf_align))) {
>> +            dev_err(&vdev->dev,
>> +                "bad vdev config: buf sizes (rx %u, tx %u) not
>> aligned to %u\n",
>> +                vrp->rx_buf_size, vrp->tx_buf_size,
>> +                rpmsg_buf_align);
>> +            err = -EINVAL;
>> +            goto vqs_del;
>> +        }
>> +
>> +        dev_dbg(&vdev->dev,
>> +            "vdev config: ver=%u, align=0x%x, rx sz = 0x%x, tx sz =
>> 0x%x\n",
>> +            version, rpmsg_buf_align, vrp->rx_buf_size,
>> +            vrp->tx_buf_size);
>> +    } else {
>> +        vrp->rx_buf_size = DEFAULT_RPMSG_BUF_SIZE;
>> +        vrp->tx_buf_size = DEFAULT_RPMSG_BUF_SIZE;
>> +    }
>> +
>> +    total_buf_space = (vrp->num_rx_buf * vrp->rx_buf_size) +
>> +              (vrp->num_tx_buf * vrp->tx_buf_size);
>>         /* allocate coherent memory for the buffers */
>>       bufs_va = dma_alloc_coherent(vdev->dev.parent,
>> @@ -874,15 +959,20 @@ static int rpmsg_probe(struct virtio_device *vdev)
>>       /* first part of the buffers is dedicated for RX */
>>       vrp->rx_bufs = bufs_va;
>>   -    /* and second part is dedicated for TX */
>> -    vrp->tx_bufs = bufs_va + vrp->num_rx_buf * vrp->buf_size;
>> +    /*
>> +     * Here buf_va is aligned to a page. Also rx buf size is aligned
>> with
>> +     * cache line alignment provided by the firmware, so tx buf's start
>> +     * address is guranteed to be aligned with the alignment provided by
>> +     * the firmware.
>> +     */
>> +    vrp->tx_bufs = bufs_va + (vrp->num_rx_buf * vrp->rx_buf_size);
>>         /* set up the receive buffers */
>>       for (i = 0; i < vrp->num_rx_buf; i++) {
>>           struct scatterlist sg;
>> -        void *cpu_addr = vrp->rx_bufs + i * vrp->buf_size;
>> +        void *cpu_addr = vrp->rx_bufs + i * vrp->rx_buf_size;
>>   -        rpmsg_sg_init(&sg, cpu_addr, vrp->buf_size);
>> +        rpmsg_sg_init(&sg, cpu_addr, vrp->rx_buf_size);
>>             err = virtqueue_add_inbuf(vrp->rvq, &sg, 1, cpu_addr,
>>                         GFP_KERNEL);
>> @@ -965,8 +1055,8 @@ static int rpmsg_remove_device(struct device
>> *dev, void *data)
>>   static void rpmsg_remove(struct virtio_device *vdev)
>>   {
>>       struct virtproc_info *vrp = vdev->priv;
>> -    unsigned int num_bufs = vrp->num_rx_buf + vrp->num_tx_buf;
>> -    size_t total_buf_space = num_bufs * vrp->buf_size;
>> +    size_t total_buf_space = (vrp->num_rx_buf * vrp->rx_buf_size) +
>> +                 (vrp->num_tx_buf * vrp->tx_buf_size);
>>       int ret;
>>         virtio_reset_device(vdev);
>> @@ -992,6 +1082,7 @@ static struct virtio_device_id id_table[] = {
>>     static unsigned int features[] = {
>>       VIRTIO_RPMSG_F_NS,
>> +    VIRTIO_RPMSG_F_BUFSZ,
>>   };
>>     static struct virtio_driver virtio_ipc_driver = {
>> diff --git a/include/linux/rpmsg/virtio_rpmsg.h b/include/linux/rpmsg/
>> virtio_rpmsg.h
>> new file mode 100644
>> index 000000000000..7e14da68fd17
>> --- /dev/null
>> +++ b/include/linux/rpmsg/virtio_rpmsg.h
>> @@ -0,0 +1,50 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * Copyright (C) Pinecone Inc. 2019
>> + * Copyright (C) Xiang Xiao <xiaoxiang@pinecone.net>
>> + * Copyright (C) Advanced Micro Devices, Inc. 2026
>> + */
>> +
>> +#ifndef _LINUX_VIRTIO_RPMSG_H
>> +#define _LINUX_VIRTIO_RPMSG_H
>> +
>> +#include <linux/types.h>
>> +#include <linux/virtio_types.h>
>> +
>> +/* The feature bitmap for virtio rpmsg */
>> +#define VIRTIO_RPMSG_F_NS    0 /* RP supports name service
>> notifications */
>> +#define VIRTIO_RPMSG_F_BUFSZ    1 /* RP get buffer size from config
>> space */
>> +
>> +/* Version of struct virtio_rpmsg_config understood by this driver */
>> +#define RPMSG_VDEV_CONFIG_V1    1
>> +
>> +/**
>> + * struct virtio_rpmsg_config - config space for rpmsg virtio device
>> + *
>> + * @version:    version of this structure, currently
>> %RPMSG_VDEV_CONFIG_V1.
>> + * @reserved:    reserved for padding, must be zero.
>> + * @size:    size of this structure in bytes.
>> + * @rpmsg_buf_align:    required alignment in bytes for each buffer.
>> Must be a
>> + *        power of two so that both the buffer sizes and the TX buffer
>> + *        base address can be aligned (e.g. to a cache line).
>> + * @reserved1:    reserved for padding, must be zero. Keeps the
>> following 32-bit
>> + *        fields naturally aligned.
>> + * @txbuf_size:    Tx buf size from remote's view. For Linux this is
>> rx buf size.
>> + * @rxbuf_size:    Rx buf size from remote's view. For Linux this is
>> tx buf size.
>> + *
>> + * This is the configuration structure shared by the device and the
>> driver,
>> + * read when %VIRTIO_RPMSG_F_BUFSZ is negotiated. The fields are laid
>> out so
>> + * the structure is naturally 32-bit aligned.
>> + */
>> +struct virtio_rpmsg_config {
>> +    u8 version;
>> +    u8 reserved;
> 
> Why about defining the version type to u16 to avoid the reserved field?
> 
>> +    __virtio16 size;
>> +    __virtio16 rpmsg_buf_align;
>> +    __virtio16 reserved1;
> 
> Seems useless if __packed prevents the compiler from inserting extra
> padding
> bytes between fields,
> 
>> +    /* The tx/rx individual buffer size (if VIRTIO_RPMSG_F_BUFSZ) */
>> +    __virtio32 txbuf_size;
>> +    __virtio32 rxbuf_size;
>> +} __packed;
> 
> proposal
> 
> +struct virtio_rpmsg_config {
> +    __virtio16 version;
> +    __virtio16 size;
> +    /* The tx/rx individual buffer size (if VIRTIO_RPMSG_F_BUFSZ) */
> +    __virtio32 txbuf_size;
> +    __virtio32 rxbuf_size;
> +    __virtio16 rpmsg_buf_align;
> +} __packed;
> +
> 

I am okay with the above proposal with minor difference:

My proposal:

+struct virtio_rpmsg_config {
+	u8 version;
+	__virtio16 size;
+	__virtio16 rpmsg_buf_align;
+	/* The tx/rx individual buffer size (if VIRTIO_RPMSG_F_BUFSZ) */
+	__virtio32 txbuf_size;
+	__virtio32 rxbuf_size;
+} __packed;

I just want to keep the version field 8-bit, as we will probably never use
the upper byte of that field if we make it 16-bit. The rest is okay. If the
structure is packed, then the reserved bytes are not needed.

Please let me know your view.
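
For comparison, a user-space sketch of the three layouts discussed in this
thread. The struct names cfg_v4, cfg_arnaud and cfg_tanmay are made up for
illustration, plain uintN_t types replace __virtio16/__virtio32, and the
sizes and offsets assume GCC or Clang with __attribute__((packed)).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout from the v4 patch: explicit reserved fields pad to natural alignment. */
struct cfg_v4 {
	uint8_t  version;
	uint8_t  reserved;
	uint16_t size;
	uint16_t rpmsg_buf_align;
	uint16_t reserved1;
	uint32_t txbuf_size;
	uint32_t rxbuf_size;
} __attribute__((packed));

/* Arnaud's proposal: 16-bit version, alignment field moved to the end. */
struct cfg_arnaud {
	uint16_t version;
	uint16_t size;
	uint32_t txbuf_size;
	uint32_t rxbuf_size;
	uint16_t rpmsg_buf_align;
} __attribute__((packed));

/* Tanmay's proposal: 8-bit version, no reserved fields. */
struct cfg_tanmay {
	uint8_t  version;
	uint16_t size;
	uint16_t rpmsg_buf_align;
	uint32_t txbuf_size;
	uint32_t rxbuf_size;
} __attribute__((packed));

/* Compile-time checks of the resulting sizes and a few field offsets. */
_Static_assert(sizeof(struct cfg_v4) == 16, "v4 layout is 16 bytes");
_Static_assert(sizeof(struct cfg_arnaud) == 14, "16-bit version layout is 14 bytes");
_Static_assert(sizeof(struct cfg_tanmay) == 13, "8-bit version layout is 13 bytes");
_Static_assert(offsetof(struct cfg_v4, txbuf_size) == 8, "v4 txbuf_size offset");
_Static_assert(offsetof(struct cfg_arnaud, txbuf_size) == 4, "arnaud txbuf_size offset");
_Static_assert(offsetof(struct cfg_tanmay, txbuf_size) == 5, "tanmay txbuf_size offset");
```

Under these assumptions, the layout with an 8-bit version and no reserved
byte places every multi-byte field at an odd offset, while the other two
keep all fields naturally aligned.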

Thanks,
Tanmay


> Regards,
> Arnaud
> 
>> +
>> +#endif /* _LINUX_VIRTIO_RPMSG_H */
> 


^ permalink raw reply

* Re: [PATCH v6 10/10] RAS: add firmware-first CPER provider
From: Julian Braha @ 2026-06-17 17:40 UTC (permalink / raw)
  To: Ahmed Tiba, Rafael J. Wysocki, Tony Luck, Borislav Petkov,
	Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue, Len Brown,
	Saket Dumbre, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
	Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Jonathan Corbet,
	Shuah Khan
  Cc: linux-kernel, linux-acpi, acpica-devel, linux-cxl, devicetree,
	linux-edac, linux-doc, Dmitry.Lamerov
In-Reply-To: <20260617-topics-ahmtib01-ras_ffh_arm_internal_review-v6-10-91f725174aa0@arm.com>

Hi again Ahmed,

On 6/17/26 14:54, Ahmed Tiba wrote:
> +config RAS_CPER_ESOURCE
> +	bool "Firmware-first CPER error source block provider"
> +	select GHES_CPER_HELPERS
> +	help
> +	  Enable support for firmware-first Common Platform Error Record
> +	  (CPER) error source block providers. The current in-tree user is
> +	  described by the arm,ras-cper DeviceTree binding. The driver
> +	  reuses the existing GHES CPER helpers so the error processing
> +	  matches the ACPI code paths, but it can be built even when ACPI is
> +	  disabled.

Yep, sure enough, this patch causes a build error when you enable
RAS_CPER_ESOURCE without enabling ACPI:

drivers/firmware/efi/cper-x86.c: In function ‘cper_print_proc_ia’:
drivers/firmware/efi/cper-x86.c:352:21: error: implicit declaration of
function ‘arch_apei_report_x86_error’ [-Wimplicit-function-declaration]
  352 |                     arch_apei_report_x86_error(ctx_info,
proc->lapic_id)) {
      |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~

- Julian Braha

^ permalink raw reply

* Re: [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
From: Reinette Chatre @ 2026-06-17 17:33 UTC (permalink / raw)
  To: Babu Moger, Moger, Babu, corbet, tony.luck, Dave.Martin,
	james.morse, tglx, bp, dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman
In-Reply-To: <572cb801-c52c-4e36-8365-a309e2db6106@amd.com>

Hi Babu,

On 6/17/26 8:56 AM, Babu Moger wrote:
> 
> The display will look like this when the system boots up.
> 
> # cat info/kernel_mode
>   inherit_ctrl_and_mon:
>   global_assign_ctrl_assign_mon_per_cpu:group=uninitialized
>   global_assign_ctrl_assign_mon_per_cpu:group=uninitialized
> 
> There will not be any group associated with "inherit_ctrl_and_mon".
> It is only used to switch from other two modes.

Just two nitpicks (adding the "[]" to indicate effective mode and fixing the
copy&paste duplicate mode names) to confirm that I think you actually intended to
write:

# cat info/kernel_mode
  [inherit_ctrl_and_mon:]
  global_assign_ctrl_inherit_mon_per_cpu:group=uninitialized
  global_assign_ctrl_assign_mon_per_cpu:group=uninitialized

I would like to propose that the user documentation contains something like
"the kernel mode is followed by a semicolon-separated list of properties".
This implementation does not require more than one property associated with a mode
so this does not need any code changes but adding that flexibility to the user
interface should help if some future kernel mode needs more than one property.
What do you think?
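
As a rough illustration of what such a format would mean for a user-space
consumer, here is a minimal parser sketch. Everything in it is an
assumption made for illustration: the struct kmode_line type, the
parse_kmode_line() helper, the size limits, and the exact grammar (an
optional "[...]" wrapper marking the effective mode, then "mode:", then
zero or more ';'-separated "key=value" properties). It is not part of
resctrl or of any existing tool.

```c
#define _POSIX_C_SOURCE 200809L	/* for strtok_r() */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_PROPS 8

struct kmode_line {
	char mode[64];
	bool effective;			/* line was wrapped in "[...]" */
	int nprops;
	char props[MAX_PROPS][64];	/* raw "key=value" strings */
};

/* Parse one line of the assumed format; 0 on success, -1 if malformed. */
static int parse_kmode_line(const char *line, struct kmode_line *out)
{
	char buf[256];
	char *s, *colon, *tok, *save;
	size_t len;

	memset(out, 0, sizeof(*out));

	/* Skip leading whitespace, then work on a private mutable copy. */
	while (*line == ' ' || *line == '\t')
		line++;
	len = strlen(line);
	if (len == 0 || len >= sizeof(buf))
		return -1;
	memcpy(buf, line, len + 1);

	/* Trim trailing newline and spaces. */
	while (len > 0 && (buf[len - 1] == '\n' || buf[len - 1] == ' '))
		buf[--len] = '\0';

	/* An effective mode is wrapped in square brackets. */
	s = buf;
	if (s[0] == '[') {
		if (len < 2 || s[len - 1] != ']')
			return -1;
		s[len - 1] = '\0';
		s++;
		out->effective = true;
	}

	/* The mode name is everything before the first ':'. */
	colon = strchr(s, ':');
	if (!colon || colon == s)
		return -1;
	*colon = '\0';
	if (strlen(s) >= sizeof(out->mode))
		return -1;
	strcpy(out->mode, s);

	/* The remainder is a ';'-separated property list, possibly empty. */
	for (tok = strtok_r(colon + 1, ";", &save); tok;
	     tok = strtok_r(NULL, ";", &save)) {
		if (out->nprops == MAX_PROPS ||
		    strlen(tok) >= sizeof(out->props[0]))
			return -1;
		strcpy(out->props[out->nprops++], tok);
	}
	return 0;
}
```

A parser written this way keeps working if a future mode gains a second
property, which is the flexibility the documentation wording is meant to
reserve.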

Reinette


^ permalink raw reply

* [PATCH v2 10/11] hugetlb: drop vma_hugecache_offset() in favor of linear_page_index()
From: Jane Chu @ 2026-06-17 17:25 UTC (permalink / raw)
  To: akpm
  Cc: willy, jack, viro, brauner, muchun.song, osalvador, david, hughd,
	baolin.wang, linmiaohe, nao.horiguchi, lorenzo, rppt, peterx,
	corbet, linux-doc, linux-mm, linux-kernel, linux-fsdevel
In-Reply-To: <20260617172534.1740152-1-jane.chu@oracle.com>

vma_hugecache_offset() converts a hugetlb VMA address into a mapping
offset in hugepage units. While the helper is small, its name is not very
clear, and the resulting code is harder to follow than using the common MM
helper directly.

Use linear_page_index() instead, with an explicit conversion from
PAGE_SIZE units to hugepage units at each call site, and remove
vma_hugecache_offset().

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
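Note (illustration only, not part of the commit): a small user-space
sketch of why the open-coded conversion gives the same index as the
removed helper. The struct vma_sketch type, the *_sketch function names
and the fixed PAGE_SHIFT of 12 are assumptions made for this example;
the arithmetic of linear_page_index_sketch() is intended to mirror the
kernel's linear_page_index(). The two results agree whenever vm_pgoff is
a multiple of the huge page size in base pages, which is expected for
hugetlb mappings.

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB base pages */

/* Minimal stand-in for the two vm_area_struct fields used here. */
struct vma_sketch {
	unsigned long vm_start;	/* first address of the mapping */
	unsigned long vm_pgoff;	/* file offset in PAGE_SIZE units */
};

/* Base-page index of addr within the mapped file. */
static unsigned long linear_page_index_sketch(const struct vma_sketch *vma,
					      unsigned long addr)
{
	return ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
}

/* Arithmetic of the removed helper: hugepage index computed directly. */
static unsigned long old_hugecache_offset(unsigned int order,
					  const struct vma_sketch *vma,
					  unsigned long addr)
{
	return ((addr - vma->vm_start) >> (PAGE_SHIFT + order)) +
	       (vma->vm_pgoff >> order);
}

/* Replacement: base-page index first, then convert to hugepage units. */
static unsigned long new_huge_index(unsigned int order,
				    const struct vma_sketch *vma,
				    unsigned long addr)
{
	return linear_page_index_sketch(vma, addr) >> order;
}
```

For example, with order 9 (2 MiB huge pages on 4 KiB base pages), a
vm_pgoff of 1024 and an address a little past the third huge page of the
VMA, both functions return 5.
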
 mm/hugetlb.c | 21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b41e7b8df094..a677ea774143 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1001,17 +1001,6 @@ static long region_count(struct resv_map *resv, long f, long t)
 	return chg;
 }
 
-/*
- * Convert the address within this vma to the page offset within
- * the mapping, huge page units here.
- */
-static pgoff_t vma_hugecache_offset(struct hstate *h,
-			struct vm_area_struct *vma, unsigned long address)
-{
-	return ((address - vma->vm_start) >> huge_page_shift(h)) +
-			(vma->vm_pgoff >> huge_page_order(h));
-}
-
 /*
  * Flags for MAP_PRIVATE reservations.  These are stored in the bottom
  * bits of the reservation map pointer, which are always clear due to
@@ -2437,7 +2426,9 @@ static long __vma_reservation_common(struct hstate *h,
 	if (!resv)
 		return 1;
 
-	idx = vma_hugecache_offset(h, vma, addr);
+	idx = linear_page_index(vma, addr);
+	idx >>= huge_page_order(h);
+
 	switch (mode) {
 	case VMA_NEEDS_RESV:
 		ret = region_chg(resv, idx, idx + 1, &dummy_out_regions_needed);
@@ -4693,8 +4684,10 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma)
 	if (!resv || !is_vma_resv_set(vma, HPAGE_RESV_OWNER))
 		return;
 
-	start = vma_hugecache_offset(h, vma, vma->vm_start);
-	end = vma_hugecache_offset(h, vma, vma->vm_end);
+	start = linear_page_index(vma, vma->vm_start);
+	start >>= huge_page_order(h);
+	end = linear_page_index(vma, vma->vm_end);
+	end >>= huge_page_order(h);
 
 	reserve = (end - start) - region_count(resv, start, end);
 	hugetlb_cgroup_uncharge_counter(resv, start, end);
-- 
2.43.5


^ permalink raw reply related

* [PATCH v2 11/11] hugetlb: make hugetlb_[un]reserve_pages() to take PAGE granularity index
From: Jane Chu @ 2026-06-17 17:25 UTC (permalink / raw)
  To: akpm
  Cc: willy, jack, viro, brauner, muchun.song, osalvador, david, hughd,
	baolin.wang, linmiaohe, nao.horiguchi, lorenzo, rppt, peterx,
	corbet, linux-doc, linux-mm, linux-kernel, linux-fsdevel
In-Reply-To: <20260617172534.1740152-1-jane.chu@oracle.com>

hugetlb_reserve_pages() and hugetlb_unreserve_pages() have two callers, and
one of them is outside hugetlb. Make both functions take a PAGE_SIZE
granularity index, to be consistent with the rest of MM.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
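Note (illustration only, not part of the commit): a user-space sketch of
the index conversion on both sides of the new interface. The names
struct page_range, huge_idx_to_page_range() and page_range_to_huge() are
hypothetical. Unlike the patch, which only warns via VM_WARN_ON() and
then continues, this sketch rejects an unaligned or reversed range, so
the alignment rule is easy to test.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Base pages per huge page; order 9 means 2 MiB on 4 KiB base pages. */
static unsigned long pages_per_huge_page_sketch(unsigned int order)
{
	return 1UL << order;
}

struct page_range {
	unsigned long from;	/* first base-page index, inclusive */
	unsigned long to;	/* last base-page index, exclusive */
};

/* Caller side: one huge page index becomes a base-page range [from, to). */
static struct page_range huge_idx_to_page_range(unsigned long hidx,
						unsigned int order)
{
	struct page_range r;

	r.from = hidx << order;
	r.to = r.from + pages_per_huge_page_sketch(order);
	return r;
}

/*
 * Callee side: check huge page alignment of both ends, then convert
 * back to huge page units. Returns false for an unaligned or reversed
 * range.
 */
static bool page_range_to_huge(unsigned long from_idx, unsigned long to_idx,
			       unsigned int order,
			       unsigned long *from, unsigned long *to)
{
	unsigned long mask = pages_per_huge_page_sketch(order) - 1;

	if ((from_idx & mask) || (to_idx & mask) || from_idx > to_idx)
		return false;
	*from = from_idx >> order;
	*to = to_idx >> order;
	return true;
}
```

One property of a strict check like this: a sentinel end value such as
LONG_MAX is not a multiple of the huge page size in base pages, so
page_range_to_huge(0, LONG_MAX, 9, ...) returns false in this sketch.
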
 Documentation/mm/hugetlbfs_reserv.rst | 19 ++++++++++---------
 fs/hugetlbfs/inode.c                  | 25 +++++++++++--------------
 mm/hugetlb.c                          | 23 ++++++++++++++++++-----
 mm/memfd.c                            | 20 ++++++--------------
 4 files changed, 45 insertions(+), 42 deletions(-)

diff --git a/Documentation/mm/hugetlbfs_reserv.rst b/Documentation/mm/hugetlbfs_reserv.rst
index a49115db18c7..880e9ccd5b57 100644
--- a/Documentation/mm/hugetlbfs_reserv.rst
+++ b/Documentation/mm/hugetlbfs_reserv.rst
@@ -112,11 +112,12 @@ flag was specified in either the shmget() or mmap() call.  If NORESERVE
 was specified, then this routine returns immediately as no reservations
 are desired.
 
-The arguments 'from' and 'to' are huge page indices into the mapping or
-underlying file.  For shmget(), 'from' is always 0 and 'to' corresponds to
-the length of the segment/mapping.  For mmap(), the offset argument could
-be used to specify the offset into the underlying file.  In such a case,
-the 'from' and 'to' arguments have been adjusted by this offset.
+The arguments 'from' and 'to' are base page indices into the mapping or
+underlying file that must be huge page aligned.  For shmget(),
+'from' is always 0 and 'to' corresponds to the length of the segment/mapping.
+For mmap(), the offset argument could be used to specify the offset into
+the underlying file.  In such a case, the 'from' and 'to' arguments have been
+adjusted by this offset.
 
 One of the big differences between PRIVATE and SHARED mappings is the way
 in which reservations are represented in the reservation map.
@@ -136,10 +137,10 @@ to indicate this VMA owns the reservations.
 
 The reservation map is consulted to determine how many huge page reservations
 are needed for the current mapping/segment.  For private mappings, this is
-always the value (to - from).  However, for shared mappings it is possible that
-some reservations may already exist within the range (to - from).  See the
-section :ref:`Reservation Map Modifications <resv_map_modifications>`
-for details on how this is accomplished.
+always the number of huge pages covered by the range [from, to).
+However, for shared mappings it is possible that some reservations may already
+exist within the range [from, to).  See the section :ref:`Reservation Map
+Modifications <resv_map_modifications>` for details on how this is accomplished.
 
 The mapping may be associated with a subpool.  If so, the subpool is consulted
 to ensure there is sufficient space for the mapping.  It is possible that the
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 0b49a79efb08..fe1ebfd604dc 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -150,10 +150,8 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	if (inode->i_flags & S_PRIVATE)
 		vma_flags_set(&vma_flags, VMA_NORESERVE_BIT);
 
-	if (hugetlb_reserve_pages(inode,
-				vma->vm_pgoff >> huge_page_order(h),
-				len >> huge_page_shift(h), vma,
-				vma_flags) < 0)
+	if (hugetlb_reserve_pages(inode, vma->vm_pgoff, len >> PAGE_SHIFT,
+				  vma, vma_flags) < 0)
 		goto out;
 
 	ret = 0;
@@ -389,7 +387,7 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
  */
 static void remove_inode_single_folio(struct hstate *h, struct inode *inode,
 		struct address_space *mapping, struct folio *folio,
-		pgoff_t index, bool truncate_op)
+		pgoff_t idx, bool truncate_op)
 {
 	/*
 	 * If folio is mapped, it was faulted in after being
@@ -401,7 +399,7 @@ static void remove_inode_single_folio(struct hstate *h, struct inode *inode,
 	 */
 	folio_lock(folio);
 	if (unlikely(folio_mapped(folio)))
-		hugetlb_unmap_file_folio(h, mapping, folio, index);
+		hugetlb_unmap_file_folio(h, mapping, folio, idx);
 
 	/*
 	 * We must remove the folio from page cache before removing
@@ -413,8 +411,10 @@ static void remove_inode_single_folio(struct hstate *h, struct inode *inode,
 	VM_BUG_ON_FOLIO(folio_test_hugetlb_restore_reserve(folio), folio);
 	hugetlb_delete_from_page_cache(folio);
 	if (!truncate_op) {
-		if (unlikely(hugetlb_unreserve_pages(inode, index,
-							index + 1, 1)))
+		pgoff_t index = idx << huge_page_order(h);
+		pgoff_t next = index + pages_per_huge_page(h);
+
+		if (unlikely(hugetlb_unreserve_pages(inode, index, next, 1)))
 			hugetlb_fix_reserve_counts(inode);
 	}
 
@@ -476,9 +476,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 	}
 
 	if (truncate_op)
-		(void)hugetlb_unreserve_pages(inode,
-				lstart >> huge_page_shift(h),
-				LONG_MAX, freed);
+		(void)hugetlb_unreserve_pages(inode, lstart >> PAGE_SHIFT,
+					      LONG_MAX, freed);
 }
 
 static void hugetlbfs_evict_inode(struct inode *inode)
@@ -1429,9 +1428,7 @@ struct file *hugetlb_file_setup(const char *name, size_t size,
 	inode->i_size = size;
 	clear_nlink(inode);
 
-	if (hugetlb_reserve_pages(inode, 0,
-			size >> huge_page_shift(hstate_inode(inode)), NULL,
-			acctflag) < 0)
+	if (hugetlb_reserve_pages(inode, 0, size >> PAGE_SHIFT, NULL, acctflag) < 0)
 		file = ERR_PTR(-ENOMEM);
 	else
 		file = alloc_file_pseudo(inode, mnt, name, O_RDWR,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a677ea774143..302f9cf9ef6b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6528,7 +6528,7 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
  */
 
 long hugetlb_reserve_pages(struct inode *inode,
-		long from, long to,
+		long from_idx, long to_idx,
 		struct vm_area_struct *vma,
 		vma_flags_t vma_flags)
 {
@@ -6538,14 +6538,21 @@ long hugetlb_reserve_pages(struct inode *inode,
 	struct resv_map *resv_map;
 	struct hugetlb_cgroup *h_cg = NULL;
 	long gbl_reserve, regions_needed = 0;
+	long from, to;
 	int err;
 
+	VM_WARN_ON(!IS_ALIGNED(from_idx, 1UL << huge_page_order(h)));
+	VM_WARN_ON(!IS_ALIGNED(to_idx,   1UL << huge_page_order(h)));
+
 	/* This should never happen */
-	if (from > to) {
+	if (from_idx > to_idx) {
 		VM_WARN(1, "%s called with a negative range\n", __func__);
 		return -EINVAL;
 	}
 
+	from = from_idx >> huge_page_order(h);
+	to = to_idx >> huge_page_order(h);
+
 	/*
 	 * vma specific semaphore used for pmd sharing and fault/truncation
 	 * synchronization
@@ -6715,14 +6722,20 @@ long hugetlb_reserve_pages(struct inode *inode,
 	return err;
 }
 
-long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
-								long freed)
+long hugetlb_unreserve_pages(struct inode *inode, long start_idx,
+			     long end_idx, long freed)
 {
 	struct hstate *h = hstate_inode(inode);
 	struct resv_map *resv_map = inode_resv_map(inode);
 	long chg = 0;
 	struct hugepage_subpool *spool = subpool_inode(inode);
-	long gbl_reserve;
+	long gbl_reserve, start, end;
+
+	VM_WARN_ON(!IS_ALIGNED(start_idx, 1UL << huge_page_order(h)));
+	VM_WARN_ON(!IS_ALIGNED(end_idx,   1UL << huge_page_order(h)));
+
+	start = start_idx >> huge_page_order(h);
+	end = end_idx >> huge_page_order(h);
 
 	/*
 	 * Since this routine can be called in the evict inode path for all
diff --git a/mm/memfd.c b/mm/memfd.c
index 0b5e8f111b39..24fefb1d2761 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -79,22 +79,19 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t index)
 		 */
 		struct inode *inode = file_inode(memfd);
 		struct hstate *h = hstate_file(memfd);
-		pgoff_t idx;
+		pgoff_t next;
 		int err = -ENOMEM;
 		long nr_resv;
 
 		gfp_mask = htlb_alloc_mask(h);
 		gfp_mask &= ~(__GFP_HIGHMEM | __GFP_MOVABLE);
-		idx = index >> huge_page_order(h);
+		next = index + pages_per_huge_page(h);
 
-		nr_resv = hugetlb_reserve_pages(inode, idx, idx + 1, NULL, EMPTY_VMA_FLAGS);
+		nr_resv = hugetlb_reserve_pages(inode, index, next, NULL, EMPTY_VMA_FLAGS);
 		if (nr_resv < 0)
 			return ERR_PTR(nr_resv);
 
-		folio = alloc_hugetlb_folio_reserve(h,
-						    numa_node_id(),
-						    NULL,
-						    gfp_mask);
+		folio = alloc_hugetlb_folio_reserve(h, numa_node_id(), NULL, gfp_mask);
 		if (folio) {
 			u32 hash;
 
@@ -119,13 +116,8 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t index)
 			 */
 			hash = hugetlb_fault_mutex_hash(memfd->f_mapping, index);
 			mutex_lock(&hugetlb_fault_mutex_table[hash]);
-
-			err = hugetlb_add_to_page_cache(folio,
-							memfd->f_mapping,
-							index);
-
+			err = hugetlb_add_to_page_cache(folio, memfd->f_mapping, index);
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-
 			if (err) {
 				folio_put(folio);
 				goto err_unresv;
@@ -137,7 +129,7 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t index)
 		}
 err_unresv:
 		if (nr_resv > 0)
-			hugetlb_unreserve_pages(inode, idx, idx + 1, 0);
+			hugetlb_unreserve_pages(inode, index, next, 0);
 		return ERR_PTR(err);
 	}
 #endif
-- 
2.43.5


^ permalink raw reply related

