Linux Documentation


* Re: [PATCH v2] doc: Add CPU Isolation documentation
From: Steven Rostedt @ 2026-04-01 17:08 UTC
  To: Frederic Weisbecker
  Cc: Randy Dunlap, LKML, Anna-Maria Behnsen, Gabriele Monaco,
	Ingo Molnar, Jonathan Corbet, Marcelo Tosatti, Marco Crivellari,
	Michal Hocko, Paul E . McKenney, Peter Zijlstra, Phil Auld,
	Thomas Gleixner, Valentin Schneider, Vlastimil Babka, Waiman Long,
	linux-doc, Sebastian Andrzej Siewior, Bagas Sanjaya
In-Reply-To: <ac1HV1HLErp8GkZ6@localhost.localdomain>

On Wed, 1 Apr 2026 18:27:03 +0200
Frederic Weisbecker <frederic@kernel.org> wrote:

> > > +"CPU Isolation" means leaving a CPU exclusive to a given workload
> > > +without any undesired code interference from the kernel.
> > > +
> > > +Those interferences, commonly pointed out as "noise", can be triggered  
> > 
> > nit:                                            "noise,"  
> 
> Thanks! I have applied all your suggestions, except this one for now because I don't
> really understand the typographical rule behind it. Any hint?

So this looks to be an American English thing (placing commas within the
quote), but from what I read, British English places the comma outside the
quote.

Here's one case where I'd much rather go the British English way. This also
means it's only incorrect to Americans ;-)

-- Steve


* Re: [PATCH net-next v3 2/3] dpll: add frequency monitoring callback ops
From: Vadim Fedorenko @ 2026-04-01 16:37 UTC
  To: Ivan Vecera, netdev
  Cc: Arkadiusz Kubalewski, David S. Miller, Donald Hunter,
	Eric Dumazet, Jakub Kicinski, Jiri Pirko, Jonathan Corbet,
	Michal Schmidt, Paolo Abeni, Petr Oros, Prathosh Satish,
	Shuah Khan, Simon Horman, linux-doc, linux-kernel
In-Reply-To: <CE3CDF40-CA7B-43A5-9DBD-A04FA37F4E57@redhat.com>

On 01/04/2026 17:29, Ivan Vecera wrote:
> Hi Vadim,
> 
> On 1 April 2026 16:47:21 CEST, Vadim Fedorenko <vadim.fedorenko@linux.dev> wrote:
>> On 01/04/2026 10:12, Ivan Vecera wrote:
>>> Add new callback operations for a dpll device:
>>> - freq_monitor_get(..) - to obtain current state of frequency monitor
>>>     feature from dpll device,
>>> - freq_monitor_set(..) - to allow feature configuration.
>>>
>>> Add new callback operation for a dpll pin:
>>> - measured_freq_get(..) - to obtain the measured frequency in mHz.
>>>
>>> Obtain the feature state value using the get callback and provide it to
>>> the user if the device driver implements callbacks. The measured_freq_get
>>> pin callback is only invoked when the frequency monitor is enabled.
>>> The freq_monitor_get device callback is required when measured_freq_get
>>> is provided by the driver.
>>>
>>> Execute the set callback upon user requests.
>>>
>>> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
>>> Signed-off-by: Ivan Vecera <ivecera@redhat.com>
>>> ---
>>> Changes v2 -> v3:
>>> - Made freq_monitor_get required when measured_freq_get is present (Jakub)
>>>
>>> Changes v1 -> v2:
>>> - Renamed actual-frequency to measured-frequency (Vadim)
>>> ---
>>>    drivers/dpll/dpll_netlink.c | 92 +++++++++++++++++++++++++++++++++++++
>>>    include/linux/dpll.h        | 10 ++++
>>>    2 files changed, 102 insertions(+)
>>>
>>> diff --git a/drivers/dpll/dpll_netlink.c b/drivers/dpll/dpll_netlink.c
>>> index 83cbd64abf5a4..576d0cd074bd4 100644
>>> --- a/drivers/dpll/dpll_netlink.c
>>> +++ b/drivers/dpll/dpll_netlink.c
>>> @@ -175,6 +175,26 @@ dpll_msg_add_phase_offset_monitor(struct sk_buff *msg, struct dpll_device *dpll,
>>>    	return 0;
>>>    }
>>>    +static int
>>> +dpll_msg_add_freq_monitor(struct sk_buff *msg, struct dpll_device *dpll,
>>> +			  struct netlink_ext_ack *extack)
>>> +{
>>> +	const struct dpll_device_ops *ops = dpll_device_ops(dpll);
>>> +	enum dpll_feature_state state;
>>> +	int ret;
>>> +
>>> +	if (ops->freq_monitor_set && ops->freq_monitor_get) {
>>> +		ret = ops->freq_monitor_get(dpll, dpll_priv(dpll),
>>> +					    &state, extack);
>>> +		if (ret)
>>> +			return ret;
>>> +		if (nla_put_u32(msg, DPLL_A_FREQUENCY_MONITOR, state))
>>> +			return -EMSGSIZE;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>    static int
>>>    dpll_msg_add_phase_offset_avg_factor(struct sk_buff *msg,
>>>    				     struct dpll_device *dpll,
>>> @@ -400,6 +420,40 @@ static int dpll_msg_add_ffo(struct sk_buff *msg, struct dpll_pin *pin,
>>>    			    ffo);
>>>    }
>>>    +static int dpll_msg_add_measured_freq(struct sk_buff *msg, struct dpll_pin *pin,
>>> +				      struct dpll_pin_ref *ref,
>>> +				      struct netlink_ext_ack *extack)
>>> +{
>>> +	const struct dpll_device_ops *dev_ops = dpll_device_ops(ref->dpll);
>>> +	const struct dpll_pin_ops *ops = dpll_pin_ops(ref);
>>> +	struct dpll_device *dpll = ref->dpll;
>>> +	enum dpll_feature_state state;
>>> +	u64 measured_freq;
>>> +	int ret;
>>> +
>>> +	if (!ops->measured_freq_get)
>>> +		return 0;
>>> +	if (WARN_ON(!dev_ops->freq_monitor_get))
>>> +		return -EINVAL;
>>
>> I think the pin registration function has to be adjusted to not allow
>> the measured_freq_get() callback if the device doesn't have the
>> freq_monitor_get() callback (or both freq_monitor_{s,g}et). Then this
>> defensive part can be removed completely.
> 
> OK, makes sense... I will move that check to the pin registration function...
> 
> Q: with WARN_ON or without?

Well, we have to provide a reason for blocking device registration
somehow, and the only way to do this is via "WARN_ON"...
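
For illustration only, here is a minimal userspace sketch of what such a
registration-time check could look like. The names demo_device_ops,
demo_pin_ops, demo_pin_register and the stub callbacks are hypothetical
stand-ins invented for this example, not the real dpll API, and the sketch
returns -EINVAL silently where the kernel version would presumably also
warn:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the dpll ops structures; not the kernel API. */
struct demo_device_ops {
	int (*freq_monitor_get)(void *priv, int *state);
	int (*freq_monitor_set)(void *priv, int state);
};

struct demo_pin_ops {
	int (*measured_freq_get)(void *priv, unsigned long long *freq_millihz);
};

/* Stub callbacks so that ops tables can be populated in an example. */
int demo_freq_monitor_get(void *priv, int *state)
{
	(void)priv;
	*state = 1;	/* pretend the monitor is enabled */
	return 0;
}

int demo_measured_freq_get(void *priv, unsigned long long *freq_millihz)
{
	(void)priv;
	*freq_millihz = 10000000000ULL;	/* 10 MHz expressed in mHz */
	return 0;
}

/*
 * Validate the callback combination once, at registration time.
 * Returns 0 when the combination is consistent, -EINVAL otherwise.
 */
int demo_pin_register(const struct demo_device_ops *dev_ops,
		      const struct demo_pin_ops *pin_ops)
{
	if (!dev_ops || !pin_ops)
		return -EINVAL;

	/*
	 * A pin may only report a measured frequency if its device can
	 * report whether the frequency monitor is enabled.
	 */
	if (pin_ops->measured_freq_get && !dev_ops->freq_monitor_get)
		return -EINVAL;

	return 0;
}
```

With the combination rejected up front, a reporting path like
dpll_msg_add_measured_freq() could rely on freq_monitor_get being present
whenever measured_freq_get is, which is what makes the defensive check
there redundant.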

> 
> Thanks
> Ivan
> 



* Re: [PATCH net-next v3 2/3] dpll: add frequency monitoring callback ops
From: Ivan Vecera @ 2026-04-01 16:29 UTC
  To: Vadim Fedorenko, netdev
  Cc: Arkadiusz Kubalewski, David S. Miller, Donald Hunter,
	Eric Dumazet, Jakub Kicinski, Jiri Pirko, Jonathan Corbet,
	Michal Schmidt, Paolo Abeni, Petr Oros, Prathosh Satish,
	Shuah Khan, Simon Horman, linux-doc, linux-kernel
In-Reply-To: <ccb93d19-19a9-4dd9-8ac7-e0d41dbb884d@linux.dev>

Hi Vadim,

On 1 April 2026 16:47:21 CEST, Vadim Fedorenko <vadim.fedorenko@linux.dev> wrote:
>On 01/04/2026 10:12, Ivan Vecera wrote:
>> Add new callback operations for a dpll device:
>> - freq_monitor_get(..) - to obtain current state of frequency monitor
>>    feature from dpll device,
>> - freq_monitor_set(..) - to allow feature configuration.
>> 
>> Add new callback operation for a dpll pin:
>> - measured_freq_get(..) - to obtain the measured frequency in mHz.
>> 
>> Obtain the feature state value using the get callback and provide it to
>> the user if the device driver implements callbacks. The measured_freq_get
>> pin callback is only invoked when the frequency monitor is enabled.
>> The freq_monitor_get device callback is required when measured_freq_get
>> is provided by the driver.
>> 
>> Execute the set callback upon user requests.
>> 
>> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
>> Signed-off-by: Ivan Vecera <ivecera@redhat.com>
>> ---
>> Changes v2 -> v3:
>> - Made freq_monitor_get required when measured_freq_get is present (Jakub)
>> 
>> Changes v1 -> v2:
>> - Renamed actual-frequency to measured-frequency (Vadim)
>> ---
>>   drivers/dpll/dpll_netlink.c | 92 +++++++++++++++++++++++++++++++++++++
>>   include/linux/dpll.h        | 10 ++++
>>   2 files changed, 102 insertions(+)
>> 
>> diff --git a/drivers/dpll/dpll_netlink.c b/drivers/dpll/dpll_netlink.c
>> index 83cbd64abf5a4..576d0cd074bd4 100644
>> --- a/drivers/dpll/dpll_netlink.c
>> +++ b/drivers/dpll/dpll_netlink.c
>> @@ -175,6 +175,26 @@ dpll_msg_add_phase_offset_monitor(struct sk_buff *msg, struct dpll_device *dpll,
>>   	return 0;
>>   }
>>   +static int
>> +dpll_msg_add_freq_monitor(struct sk_buff *msg, struct dpll_device *dpll,
>> +			  struct netlink_ext_ack *extack)
>> +{
>> +	const struct dpll_device_ops *ops = dpll_device_ops(dpll);
>> +	enum dpll_feature_state state;
>> +	int ret;
>> +
>> +	if (ops->freq_monitor_set && ops->freq_monitor_get) {
>> +		ret = ops->freq_monitor_get(dpll, dpll_priv(dpll),
>> +					    &state, extack);
>> +		if (ret)
>> +			return ret;
>> +		if (nla_put_u32(msg, DPLL_A_FREQUENCY_MONITOR, state))
>> +			return -EMSGSIZE;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>   static int
>>   dpll_msg_add_phase_offset_avg_factor(struct sk_buff *msg,
>>   				     struct dpll_device *dpll,
>> @@ -400,6 +420,40 @@ static int dpll_msg_add_ffo(struct sk_buff *msg, struct dpll_pin *pin,
>>   			    ffo);
>>   }
>>   +static int dpll_msg_add_measured_freq(struct sk_buff *msg, struct dpll_pin *pin,
>> +				      struct dpll_pin_ref *ref,
>> +				      struct netlink_ext_ack *extack)
>> +{
>> +	const struct dpll_device_ops *dev_ops = dpll_device_ops(ref->dpll);
>> +	const struct dpll_pin_ops *ops = dpll_pin_ops(ref);
>> +	struct dpll_device *dpll = ref->dpll;
>> +	enum dpll_feature_state state;
>> +	u64 measured_freq;
>> +	int ret;
>> +
>> +	if (!ops->measured_freq_get)
>> +		return 0;
>> +	if (WARN_ON(!dev_ops->freq_monitor_get))
>> +		return -EINVAL;
>
>I think the pin registration function has to be adjusted to not allow
>the measured_freq_get() callback if the device doesn't have the
>freq_monitor_get() callback (or both freq_monitor_{s,g}et). Then this
>defensive part can be removed completely.

OK, makes sense... I will move that check to the pin registration function...

Q: with WARN_ON or without?

Thanks 
Ivan



* Re: [PATCH v2] doc: Add CPU Isolation documentation
From: Frederic Weisbecker @ 2026-04-01 16:27 UTC
  To: Randy Dunlap
  Cc: LKML, Anna-Maria Behnsen, Gabriele Monaco, Ingo Molnar,
	Jonathan Corbet, Marcelo Tosatti, Marco Crivellari, Michal Hocko,
	Paul E . McKenney, Peter Zijlstra, Phil Auld, Steven Rostedt,
	Thomas Gleixner, Valentin Schneider, Vlastimil Babka, Waiman Long,
	linux-doc, Sebastian Andrzej Siewior, Bagas Sanjaya
In-Reply-To: <6d113021-6208-4dcc-a209-a2317d680e3f@infradead.org>

On Thu, Mar 26, 2026 at 02:42:32PM -0700, Randy Dunlap wrote:
> (Just some small comments -- take them or not.)
> 
> On 3/26/26 7:00 AM, Frederic Weisbecker wrote:
> > nohz_full was introduced in v3.10 in 2013, which means this
> > documentation is 13 years overdue.
> > 
> > Fortunately Paul wrote a part of the needed documentation a while ago,
> > especially concerning nohz_full in Documentation/timers/no_hz.rst and
> > also about per-CPU kthreads in
> > Documentation/admin-guide/kernel-per-CPU-kthreads.rst
> > 
> > Introduce a new page that gives an overview of CPU isolation in general.
> > 
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > ---
> > v2:
> >    - Fix links and code blocks (Bagas and Sebastian)
> >    - Isolation is not only about userspace, rephrase accordingly (Valentin)
> >    - Paste BIOS issues suggestion from Valentin
> >    - Include the whole rtla suite (Valentin)
> >    - Rephrase a few details (Waiman)
> >    - Talk about RCU induced overhead rather than slower RCU (Sebastian)
> > 
> >  Documentation/admin-guide/cpu-isolation.rst | 357 ++++++++++++++++++++
> >  Documentation/admin-guide/index.rst         |   1 +
> >  2 files changed, 358 insertions(+)
> >  create mode 100644 Documentation/admin-guide/cpu-isolation.rst
> > 
> > diff --git a/Documentation/admin-guide/cpu-isolation.rst b/Documentation/admin-guide/cpu-isolation.rst
> > new file mode 100644
> > index 000000000000..886dec79b056
> > --- /dev/null
> > +++ b/Documentation/admin-guide/cpu-isolation.rst
> > @@ -0,0 +1,357 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +=============
> > +CPU Isolation
> > +=============
> > +
> > +Introduction
> > +============
> > +
> > +"CPU Isolation" means leaving a CPU exclusive to a given workload
> > +without any undesired code interference from the kernel.
> > +
> > +Those interferences, commonly pointed out as "noise", can be triggered
> 
> nit:                                            "noise,"

Thanks! I have applied all your suggestions, except this one for now because I don't
really understand the typographical rule behind it. Any hint?

-- 
Frederic Weisbecker
SUSE Labs


* Re: [PATCH v6 3/4] RISC-V: KVM: Detect and expose supported HGATP G-stage modes
From: Anup Patel @ 2026-04-01 16:05 UTC
  To: fangyu.yu
  Cc: pbonzini, corbet, atish.patra, pjw, palmer, aou, alex, skhan,
	guoren, radim.krcmar, andrew.jones, linux-doc, kvm, kvm-riscv,
	linux-riscv, linux-kernel
In-Reply-To: <20260330122601.22140-4-fangyu.yu@linux.alibaba.com>

On Mon, Mar 30, 2026 at 5:56 PM <fangyu.yu@linux.alibaba.com> wrote:
>
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> Extend kvm_riscv_gstage_mode_detect() to probe all HGATP.MODE values
> supported by the host and record them in a bitmask. Keep tracking the
> maximum supported G-stage page table level for existing internal users.
>
> Also provide lightweight helpers to retrieve the supported-mode bitmask
> and validate a requested HGATP.MODE against it.
>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
> ---
>  arch/riscv/include/asm/kvm_gstage.h | 11 ++++++++
>  arch/riscv/kvm/gstage.c             | 43 +++++++++++++++--------------
>  2 files changed, 34 insertions(+), 20 deletions(-)
>
> diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
> index 70d9d483365e..bbf8f45c6563 100644
> --- a/arch/riscv/include/asm/kvm_gstage.h
> +++ b/arch/riscv/include/asm/kvm_gstage.h
> @@ -31,6 +31,7 @@ struct kvm_gstage_mapping {
>  #endif
>
>  extern unsigned long kvm_riscv_gstage_max_pgd_levels;
> +extern u32 kvm_riscv_gstage_supported_mode_mask;
>
>  #define kvm_riscv_gstage_pgd_xbits     2
>  #define kvm_riscv_gstage_pgd_size      (1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gstage_pgd_xbits))
> @@ -102,4 +103,14 @@ static inline void kvm_riscv_gstage_init(struct kvm_gstage *gstage, struct kvm *
>         gstage->pgd_levels = kvm->arch.pgd_levels;
>  }
>
> +static inline u32 kvm_riscv_get_hgatp_mode_mask(void)
> +{
> +       return kvm_riscv_gstage_supported_mode_mask;
> +}
> +
> +static inline bool kvm_riscv_hgatp_mode_is_valid(unsigned long mode)
> +{
> +       return kvm_riscv_gstage_supported_mode_mask & BIT(mode);
> +}
> +
>  #endif
> diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
> index 7c4c34bc191b..459041255c14 100644
> --- a/arch/riscv/kvm/gstage.c
> +++ b/arch/riscv/kvm/gstage.c
> @@ -16,6 +16,8 @@ unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 3;
>  #else
>  unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 2;
>  #endif
> +/* Bitmask of supported HGATP.MODE encodings (BIT(HGATP_MODE_*)). */
> +u32 kvm_riscv_gstage_supported_mode_mask __ro_after_init;
>
>  #define gstage_pte_leaf(__ptep)        \
>         (pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC))
> @@ -315,42 +317,43 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
>         }
>  }
>
> +static bool __init kvm_riscv_hgatp_mode_supported(unsigned long mode)
> +{
> +       csr_write(CSR_HGATP, mode << HGATP_MODE_SHIFT);
> +       return ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == mode);
> +}
> +
>  void __init kvm_riscv_gstage_mode_detect(void)
>  {
> +       kvm_riscv_gstage_supported_mode_mask = 0;
> +       kvm_riscv_gstage_max_pgd_levels = 0;
> +
>  #ifdef CONFIG_64BIT
> -       /* Try Sv57x4 G-stage mode */
> -       csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT);
> -       if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV57X4) {
> -               kvm_riscv_gstage_max_pgd_levels = 5;
> -               goto done;
> +       /* Try Sv39x4 G-stage mode */
> +       if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV39X4)) {
> +               kvm_riscv_gstage_supported_mode_mask |= BIT(HGATP_MODE_SV39X4);
> +               kvm_riscv_gstage_max_pgd_levels = 3;
>         }
>
>         /* Try Sv48x4 G-stage mode */
> -       csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT);
> -       if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV48X4) {
> +       if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV48X4)) {
> +               kvm_riscv_gstage_supported_mode_mask |= BIT(HGATP_MODE_SV48X4);
>                 kvm_riscv_gstage_max_pgd_levels = 4;
> -               goto done;

Keep the original approach. Until then, NACK to this series.

Regards,
Anup

>         }
>
> -       /* Try Sv39x4 G-stage mode */
> -       csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT);
> -       if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV39X4) {
> -               kvm_riscv_gstage_max_pgd_levels = 3;
> -               goto done;
> +       /* Try Sv57x4 G-stage mode */
> +       if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV57X4)) {
> +               kvm_riscv_gstage_supported_mode_mask |= BIT(HGATP_MODE_SV57X4);
> +               kvm_riscv_gstage_max_pgd_levels = 5;
>         }
>  #else /* CONFIG_32BIT */
>         /* Try Sv32x4 G-stage mode */
> -       csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
> -       if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV32X4) {
> +       if (kvm_riscv_hgatp_mode_supported(HGATP_MODE_SV32X4)) {
> +               kvm_riscv_gstage_supported_mode_mask |= BIT(HGATP_MODE_SV32X4);
>                 kvm_riscv_gstage_max_pgd_levels = 2;
> -               goto done;
>         }
>  #endif
>
> -       /* KVM depends on !HGATP_MODE_OFF */
> -       kvm_riscv_gstage_max_pgd_levels = 0;
> -
> -done:
>         csr_write(CSR_HGATP, 0);
>         kvm_riscv_local_hfence_gvma_all();
>  }
> --
> 2.50.1
>
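
For readers following the thread, the probing scheme in the quoted patch
(write each candidate MODE to hgatp, read it back, and record the ones that
stick in a bitmask) can be modeled in userspace roughly as below. This is a
sketch under stated assumptions, not the kernel code: struct demo_hart, the
simulated CSR write and every demo_* name are hypothetical, the MODE
encodings are the RV64 values from the RISC-V privileged specification, and
the "unsupported write leaves the register unchanged" behaviour is a
simplification of real WARL semantics.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* hgatp.MODE encodings for RV64 (RISC-V privileged specification). */
#define DEMO_MODE_OFF     0u
#define DEMO_MODE_SV39X4  8u
#define DEMO_MODE_SV48X4  9u
#define DEMO_MODE_SV57X4  10u
#define DEMO_MODE_SHIFT   60

/* Hypothetical simulated hart: bit n of 'implemented' means MODE n works. */
struct demo_hart {
	uint32_t implemented;
	uint64_t hgatp;
};

struct demo_detect {
	uint32_t mode_mask;		/* bit n set => MODE n was accepted */
	unsigned int max_levels;	/* levels of the largest accepted mode */
};

/* Simplified WARL model: an unimplemented MODE leaves hgatp unchanged. */
void demo_csr_write(struct demo_hart *h, uint64_t val)
{
	uint64_t mode = val >> DEMO_MODE_SHIFT;

	if (mode == DEMO_MODE_OFF || (h->implemented & (UINT32_C(1) << mode)))
		h->hgatp = val;
}

/* Write the mode, then check whether it reads back unchanged. */
bool demo_mode_supported(struct demo_hart *h, uint64_t mode)
{
	demo_csr_write(h, mode << DEMO_MODE_SHIFT);
	return (h->hgatp >> DEMO_MODE_SHIFT) == mode;
}

/* Probe in ascending order so max_levels ends up at the largest mode. */
struct demo_detect demo_mode_detect(struct demo_hart *h)
{
	static const struct {
		uint64_t mode;
		unsigned int levels;
	} probes[] = {
		{ DEMO_MODE_SV39X4, 3 },
		{ DEMO_MODE_SV48X4, 4 },
		{ DEMO_MODE_SV57X4, 5 },
	};
	struct demo_detect d = { 0, 0 };
	size_t i;

	for (i = 0; i < sizeof(probes) / sizeof(probes[0]); i++) {
		if (demo_mode_supported(h, probes[i].mode)) {
			d.mode_mask |= UINT32_C(1) << probes[i].mode;
			d.max_levels = probes[i].levels;
		}
	}

	demo_csr_write(h, 0);	/* leave translation off, as the patch does */
	return d;
}

/* Validate a requested mode against the recorded mask. */
bool demo_mode_is_valid(const struct demo_detect *d, uint64_t mode)
{
	return mode < 32 && (d->mode_mask & (UINT32_C(1) << mode)) != 0;
}
```

Unlike the helper in the quoted patch, the sketch range-checks the mode
before shifting, since shifting a 32-bit value by 32 or more is undefined
in C. Whether probing every mode is preferable to stopping at the first
match is exactly the point under review above; the sketch takes no side.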


* Re: [PATCH v6 2/4] RISC-V: KVM: Cache gstage pgd_levels in struct kvm_gstage
From: Anup Patel @ 2026-04-01 16:03 UTC
  To: fangyu.yu
  Cc: pbonzini, corbet, atish.patra, pjw, palmer, aou, alex, skhan,
	guoren, radim.krcmar, andrew.jones, linux-doc, kvm, kvm-riscv,
	linux-riscv, linux-kernel
In-Reply-To: <20260330122601.22140-3-fangyu.yu@linux.alibaba.com>

On Mon, Mar 30, 2026 at 5:56 PM <fangyu.yu@linux.alibaba.com> wrote:
>
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> Gstage page-table helpers frequently chase gstage->kvm->arch to
> fetch pgd_levels. This adds noise and repeats the same dereference
> chain in hot paths.
>
> Add pgd_levels to struct kvm_gstage and initialize it from kvm->arch
> when setting up a gstage instance. Introduce kvm_riscv_gstage_init()
> to centralize initialization and switch gstage code to use
> gstage->pgd_levels.
>
> Suggested-by: Anup Patel <anup@brainfault.org>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>

LGTM.

Reviewed-by: Anup Patel <anup@brainfault.org>

Thanks,
Anup

> ---
>  arch/riscv/include/asm/kvm_gstage.h | 10 ++++++
>  arch/riscv/kvm/gstage.c             | 10 +++---
>  arch/riscv/kvm/mmu.c                | 50 ++++++-----------------------
>  3 files changed, 25 insertions(+), 45 deletions(-)
>
> diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
> index 5aa58d1f692a..70d9d483365e 100644
> --- a/arch/riscv/include/asm/kvm_gstage.h
> +++ b/arch/riscv/include/asm/kvm_gstage.h
> @@ -15,6 +15,7 @@ struct kvm_gstage {
>  #define KVM_GSTAGE_FLAGS_LOCAL         BIT(0)
>         unsigned long vmid;
>         pgd_t *pgd;
> +       unsigned long pgd_levels;
>  };
>
>  struct kvm_gstage_mapping {
> @@ -92,4 +93,13 @@ static inline unsigned long kvm_riscv_gstage_mode(unsigned long pgd_levels)
>         }
>  }
>
> +static inline void kvm_riscv_gstage_init(struct kvm_gstage *gstage, struct kvm *kvm)
> +{
> +       gstage->kvm = kvm;
> +       gstage->flags = 0;
> +       gstage->vmid = READ_ONCE(kvm->arch.vmid.vmid);
> +       gstage->pgd = kvm->arch.pgd;
> +       gstage->pgd_levels = kvm->arch.pgd_levels;
> +}
> +
>  #endif
> diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
> index 4beb9322fe76..7c4c34bc191b 100644
> --- a/arch/riscv/kvm/gstage.c
> +++ b/arch/riscv/kvm/gstage.c
> @@ -26,7 +26,7 @@ static inline unsigned long gstage_pte_index(struct kvm_gstage *gstage,
>         unsigned long mask;
>         unsigned long shift = HGATP_PAGE_SHIFT + (kvm_riscv_gstage_index_bits * level);
>
> -       if (level == gstage->kvm->arch.pgd_levels - 1)
> +       if (level == gstage->pgd_levels - 1)
>                 mask = (PTRS_PER_PTE * (1UL << kvm_riscv_gstage_pgd_xbits)) - 1;
>         else
>                 mask = PTRS_PER_PTE - 1;
> @@ -45,7 +45,7 @@ static int gstage_page_size_to_level(struct kvm_gstage *gstage, unsigned long pa
>         u32 i;
>         unsigned long psz = 1UL << 12;
>
> -       for (i = 0; i < gstage->kvm->arch.pgd_levels; i++) {
> +       for (i = 0; i < gstage->pgd_levels; i++) {
>                 if (page_size == (psz << (i * kvm_riscv_gstage_index_bits))) {
>                         *out_level = i;
>                         return 0;
> @@ -58,7 +58,7 @@ static int gstage_page_size_to_level(struct kvm_gstage *gstage, unsigned long pa
>  static int gstage_level_to_page_order(struct kvm_gstage *gstage, u32 level,
>                                       unsigned long *out_pgorder)
>  {
> -       if (gstage->kvm->arch.pgd_levels < level)
> +       if (gstage->pgd_levels < level)
>                 return -EINVAL;
>
>         *out_pgorder = 12 + (level * kvm_riscv_gstage_index_bits);
> @@ -83,7 +83,7 @@ bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr,
>                                pte_t **ptepp, u32 *ptep_level)
>  {
>         pte_t *ptep;
> -       u32 current_level = gstage->kvm->arch.pgd_levels - 1;
> +       u32 current_level = gstage->pgd_levels - 1;
>
>         *ptep_level = current_level;
>         ptep = (pte_t *)gstage->pgd;
> @@ -127,7 +127,7 @@ int kvm_riscv_gstage_set_pte(struct kvm_gstage *gstage,
>                              struct kvm_mmu_memory_cache *pcache,
>                              const struct kvm_gstage_mapping *map)
>  {
> -       u32 current_level = gstage->kvm->arch.pgd_levels - 1;
> +       u32 current_level = gstage->pgd_levels - 1;
>         pte_t *next_ptep = (pte_t *)gstage->pgd;
>         pte_t *ptep = &next_ptep[gstage_pte_index(gstage, map->addr, current_level)];
>
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index fbcdd75cb9af..2d3def024270 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -24,10 +24,7 @@ static void mmu_wp_memory_region(struct kvm *kvm, int slot)
>         phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
>         struct kvm_gstage gstage;
>
> -       gstage.kvm = kvm;
> -       gstage.flags = 0;
> -       gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
> -       gstage.pgd = kvm->arch.pgd;
> +       kvm_riscv_gstage_init(&gstage, kvm);
>
>         spin_lock(&kvm->mmu_lock);
>         kvm_riscv_gstage_wp_range(&gstage, start, end);
> @@ -49,10 +46,7 @@ int kvm_riscv_mmu_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t hpa,
>         struct kvm_gstage_mapping map;
>         struct kvm_gstage gstage;
>
> -       gstage.kvm = kvm;
> -       gstage.flags = 0;
> -       gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
> -       gstage.pgd = kvm->arch.pgd;
> +       kvm_riscv_gstage_init(&gstage, kvm);
>
>         end = (gpa + size + PAGE_SIZE - 1) & PAGE_MASK;
>         pfn = __phys_to_pfn(hpa);
> @@ -89,10 +83,7 @@ void kvm_riscv_mmu_iounmap(struct kvm *kvm, gpa_t gpa, unsigned long size)
>  {
>         struct kvm_gstage gstage;
>
> -       gstage.kvm = kvm;
> -       gstage.flags = 0;
> -       gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
> -       gstage.pgd = kvm->arch.pgd;
> +       kvm_riscv_gstage_init(&gstage, kvm);
>
>         spin_lock(&kvm->mmu_lock);
>         kvm_riscv_gstage_unmap_range(&gstage, gpa, size, false);
> @@ -109,10 +100,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>         phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
>         struct kvm_gstage gstage;
>
> -       gstage.kvm = kvm;
> -       gstage.flags = 0;
> -       gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
> -       gstage.pgd = kvm->arch.pgd;
> +       kvm_riscv_gstage_init(&gstage, kvm);
>
>         kvm_riscv_gstage_wp_range(&gstage, start, end);
>  }
> @@ -141,10 +129,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>         phys_addr_t size = slot->npages << PAGE_SHIFT;
>         struct kvm_gstage gstage;
>
> -       gstage.kvm = kvm;
> -       gstage.flags = 0;
> -       gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
> -       gstage.pgd = kvm->arch.pgd;
> +       kvm_riscv_gstage_init(&gstage, kvm);
>
>         spin_lock(&kvm->mmu_lock);
>         kvm_riscv_gstage_unmap_range(&gstage, gpa, size, false);
> @@ -250,10 +235,7 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
>         if (!kvm->arch.pgd)
>                 return false;
>
> -       gstage.kvm = kvm;
> -       gstage.flags = 0;
> -       gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
> -       gstage.pgd = kvm->arch.pgd;
> +       kvm_riscv_gstage_init(&gstage, kvm);
>         mmu_locked = spin_trylock(&kvm->mmu_lock);
>         kvm_riscv_gstage_unmap_range(&gstage, range->start << PAGE_SHIFT,
>                                      (range->end - range->start) << PAGE_SHIFT,
> @@ -275,10 +257,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
>
>         WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
>
> -       gstage.kvm = kvm;
> -       gstage.flags = 0;
> -       gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
> -       gstage.pgd = kvm->arch.pgd;
> +       kvm_riscv_gstage_init(&gstage, kvm);
>         if (!kvm_riscv_gstage_get_leaf(&gstage, range->start << PAGE_SHIFT,
>                                        &ptep, &ptep_level))
>                 return false;
> @@ -298,10 +277,7 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
>
>         WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
>
> -       gstage.kvm = kvm;
> -       gstage.flags = 0;
> -       gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
> -       gstage.pgd = kvm->arch.pgd;
> +       kvm_riscv_gstage_init(&gstage, kvm);
>         if (!kvm_riscv_gstage_get_leaf(&gstage, range->start << PAGE_SHIFT,
>                                        &ptep, &ptep_level))
>                 return false;
> @@ -463,10 +439,7 @@ int kvm_riscv_mmu_map(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
>         struct kvm_gstage gstage;
>         struct page *page;
>
> -       gstage.kvm = kvm;
> -       gstage.flags = 0;
> -       gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
> -       gstage.pgd = kvm->arch.pgd;
> +       kvm_riscv_gstage_init(&gstage, kvm);
>
>         /* Setup initial state of output mapping */
>         memset(out_map, 0, sizeof(*out_map));
> @@ -587,10 +560,7 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm)
>
>         spin_lock(&kvm->mmu_lock);
>         if (kvm->arch.pgd) {
> -               gstage.kvm = kvm;
> -               gstage.flags = 0;
> -               gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
> -               gstage.pgd = kvm->arch.pgd;
> +               kvm_riscv_gstage_init(&gstage, kvm);
>                 kvm_riscv_gstage_unmap_range(&gstage, 0UL,
>                         kvm_riscv_gstage_gpa_size(kvm->arch.pgd_levels), false);
>                 pgd = READ_ONCE(kvm->arch.pgd);
> --
> 2.50.1
>
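
As a rough illustration of the change quoted above, the userspace sketch
below caches the level count in the gstage object at init time and uses it
when computing a page-table index. All demo_* names are hypothetical; the
constants (12-bit page shift, 9 index bits per level on a 64-bit host, 2
extra root bits) and the final "shift then mask" return are assumptions
based on the usual Sv39x4-style layout, since the quoted hunk does not show
the end of gstage_pte_index().

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT   12	/* 4 KiB pages */
#define DEMO_INDEX_BITS   9	/* 512 entries per table (64-bit host) */
#define DEMO_PGD_XBITS    2	/* root table is 4x wider ("x4") */
#define DEMO_PTRS_PER_PTE (1UL << DEMO_INDEX_BITS)

/* Hypothetical stand-in for the per-VM state. */
struct demo_vm {
	unsigned long pgd_levels;
};

/* Hypothetical stand-in for struct kvm_gstage. */
struct demo_gstage {
	struct demo_vm *vm;
	unsigned long pgd_levels;	/* cached copy of vm->pgd_levels */
};

/* Single place where a gstage object is set up from its VM. */
void demo_gstage_init(struct demo_gstage *gstage, struct demo_vm *vm)
{
	gstage->vm = vm;
	gstage->pgd_levels = vm->pgd_levels;
}

/*
 * Index into the table at 'level' for guest physical address 'addr'.
 * The root level (pgd_levels - 1) has DEMO_PGD_XBITS extra index bits.
 */
unsigned long demo_pte_index(const struct demo_gstage *gstage,
			     uint64_t addr, uint32_t level)
{
	unsigned long mask;
	unsigned long shift = DEMO_PAGE_SHIFT + DEMO_INDEX_BITS * level;

	if (level == gstage->pgd_levels - 1)
		mask = (DEMO_PTRS_PER_PTE << DEMO_PGD_XBITS) - 1;
	else
		mask = DEMO_PTRS_PER_PTE - 1;

	return (unsigned long)(addr >> shift) & mask;
}
```

The cached field is only as fresh as the last demo_gstage_init() call. In
the quoted patch the gstage objects are short-lived stack variables, which
is presumably why a plain copy is sufficient there.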


* Re: [PATCH v6 1/4] RISC-V: KVM: Support runtime configuration for per-VM's HGATP mode
From: Anup Patel @ 2026-04-01 16:02 UTC
  To: fangyu.yu
  Cc: pbonzini, corbet, atish.patra, pjw, palmer, aou, alex, skhan,
	guoren, radim.krcmar, andrew.jones, linux-doc, kvm, kvm-riscv,
	linux-riscv, linux-kernel
In-Reply-To: <20260330122601.22140-2-fangyu.yu@linux.alibaba.com>

On Mon, Mar 30, 2026 at 5:56 PM <fangyu.yu@linux.alibaba.com> wrote:
>
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> Introduce one per-VM architecture-specific field to support runtime
> configuration of the G-stage page table format:
>
> - kvm->arch.pgd_levels: the corresponding number of page table levels
>   for the selected mode.
>
> This field replaces the previous global variables
> kvm_riscv_gstage_mode and kvm_riscv_gstage_pgd_levels, enabling different
> virtual machines to independently select their G-stage page table format
> instead of being forced to share the maximum mode detected by the kernel
> at boot time.
>
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>

LGTM.

Reviewed-by: Anup Patel <anup@brainfault.org>

Thanks,
Anup
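
As a worked example of what the per-VM pgd_levels value implies, the patch
quoted below computes the guest physical address width as the page shift
plus pgd_levels times the per-level index bits plus the extra root bits.
The userspace sketch below reproduces that arithmetic; the demo_* names are
hypothetical, the page shift of 12 is an assumption, and the index width is
passed in as a parameter (9 on 64-bit hosts is assumed, while 10 for the
32-bit case appears in the quoted header).

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* assumed value of HGATP_PAGE_SHIFT */
#define DEMO_PGD_XBITS  2	/* kvm_riscv_gstage_pgd_xbits in the patch */

/* Number of guest physical address bits for a given level count. */
unsigned long demo_gpa_bits(unsigned long pgd_levels, unsigned long index_bits)
{
	return DEMO_PAGE_SHIFT + pgd_levels * index_bits + DEMO_PGD_XBITS;
}

/* Size in bytes of the guest physical address space. */
uint64_t demo_gpa_size(unsigned long pgd_levels, unsigned long index_bits)
{
	return UINT64_C(1) << demo_gpa_bits(pgd_levels, index_bits);
}
```

Under these assumptions, 3 levels (Sv39x4) give 12 + 3 * 9 + 2 = 41 bits,
4 levels (Sv48x4) give 50 bits, 5 levels (Sv57x4) give 59 bits, and the
32-bit case with 2 levels of 10 bits (Sv32x4) gives 34 bits.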

> ---
>  arch/riscv/include/asm/kvm_gstage.h | 37 ++++++++++++----
>  arch/riscv/include/asm/kvm_host.h   |  1 +
>  arch/riscv/kvm/gstage.c             | 65 ++++++++++++++---------------
>  arch/riscv/kvm/main.c               | 12 +++---
>  arch/riscv/kvm/mmu.c                | 20 +++++----
>  arch/riscv/kvm/vm.c                 |  2 +-
>  arch/riscv/kvm/vmid.c               |  3 +-
>  7 files changed, 83 insertions(+), 57 deletions(-)
>
> diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
> index 595e2183173e..5aa58d1f692a 100644
> --- a/arch/riscv/include/asm/kvm_gstage.h
> +++ b/arch/riscv/include/asm/kvm_gstage.h
> @@ -29,16 +29,22 @@ struct kvm_gstage_mapping {
>  #define kvm_riscv_gstage_index_bits    10
>  #endif
>
> -extern unsigned long kvm_riscv_gstage_mode;
> -extern unsigned long kvm_riscv_gstage_pgd_levels;
> +extern unsigned long kvm_riscv_gstage_max_pgd_levels;
>
>  #define kvm_riscv_gstage_pgd_xbits     2
>  #define kvm_riscv_gstage_pgd_size      (1UL << (HGATP_PAGE_SHIFT + kvm_riscv_gstage_pgd_xbits))
> -#define kvm_riscv_gstage_gpa_bits      (HGATP_PAGE_SHIFT + \
> -                                        (kvm_riscv_gstage_pgd_levels * \
> -                                         kvm_riscv_gstage_index_bits) + \
> -                                        kvm_riscv_gstage_pgd_xbits)
> -#define kvm_riscv_gstage_gpa_size      ((gpa_t)(1ULL << kvm_riscv_gstage_gpa_bits))
> +
> +static inline unsigned long kvm_riscv_gstage_gpa_bits(unsigned long pgd_levels)
> +{
> +       return (HGATP_PAGE_SHIFT +
> +               pgd_levels * kvm_riscv_gstage_index_bits +
> +               kvm_riscv_gstage_pgd_xbits);
> +}
> +
> +static inline gpa_t kvm_riscv_gstage_gpa_size(unsigned long pgd_levels)
> +{
> +       return BIT_ULL(kvm_riscv_gstage_gpa_bits(pgd_levels));
> +}
>
>  bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr,
>                                pte_t **ptepp, u32 *ptep_level);
> @@ -69,4 +75,21 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
>
>  void kvm_riscv_gstage_mode_detect(void);
>
> +static inline unsigned long kvm_riscv_gstage_mode(unsigned long pgd_levels)
> +{
> +       switch (pgd_levels) {
> +       case 2:
> +               return HGATP_MODE_SV32X4;
> +       case 3:
> +               return HGATP_MODE_SV39X4;
> +       case 4:
> +               return HGATP_MODE_SV48X4;
> +       case 5:
> +               return HGATP_MODE_SV57X4;
> +       default:
> +               WARN_ON_ONCE(1);
> +               return HGATP_MODE_OFF;
> +       }
> +}
> +
>  #endif
> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> index 24585304c02b..478f699e9dec 100644
> --- a/arch/riscv/include/asm/kvm_host.h
> +++ b/arch/riscv/include/asm/kvm_host.h
> @@ -94,6 +94,7 @@ struct kvm_arch {
>         /* G-stage page table */
>         pgd_t *pgd;
>         phys_addr_t pgd_phys;
> +       unsigned long pgd_levels;
>
>         /* Guest Timer */
>         struct kvm_guest_timer timer;
> diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
> index b67d60d722c2..4beb9322fe76 100644
> --- a/arch/riscv/kvm/gstage.c
> +++ b/arch/riscv/kvm/gstage.c
> @@ -12,22 +12,21 @@
>  #include <asm/kvm_gstage.h>
>
>  #ifdef CONFIG_64BIT
> -unsigned long kvm_riscv_gstage_mode __ro_after_init = HGATP_MODE_SV39X4;
> -unsigned long kvm_riscv_gstage_pgd_levels __ro_after_init = 3;
> +unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 3;
>  #else
> -unsigned long kvm_riscv_gstage_mode __ro_after_init = HGATP_MODE_SV32X4;
> -unsigned long kvm_riscv_gstage_pgd_levels __ro_after_init = 2;
> +unsigned long kvm_riscv_gstage_max_pgd_levels __ro_after_init = 2;
>  #endif
>
>  #define gstage_pte_leaf(__ptep)        \
>         (pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC))
>
> -static inline unsigned long gstage_pte_index(gpa_t addr, u32 level)
> +static inline unsigned long gstage_pte_index(struct kvm_gstage *gstage,
> +                                            gpa_t addr, u32 level)
>  {
>         unsigned long mask;
>         unsigned long shift = HGATP_PAGE_SHIFT + (kvm_riscv_gstage_index_bits * level);
>
> -       if (level == (kvm_riscv_gstage_pgd_levels - 1))
> +       if (level == gstage->kvm->arch.pgd_levels - 1)
>                 mask = (PTRS_PER_PTE * (1UL << kvm_riscv_gstage_pgd_xbits)) - 1;
>         else
>                 mask = PTRS_PER_PTE - 1;
> @@ -40,12 +39,13 @@ static inline unsigned long gstage_pte_page_vaddr(pte_t pte)
>         return (unsigned long)pfn_to_virt(__page_val_to_pfn(pte_val(pte)));
>  }
>
> -static int gstage_page_size_to_level(unsigned long page_size, u32 *out_level)
> +static int gstage_page_size_to_level(struct kvm_gstage *gstage, unsigned long page_size,
> +                                    u32 *out_level)
>  {
>         u32 i;
>         unsigned long psz = 1UL << 12;
>
> -       for (i = 0; i < kvm_riscv_gstage_pgd_levels; i++) {
> +       for (i = 0; i < gstage->kvm->arch.pgd_levels; i++) {
>                 if (page_size == (psz << (i * kvm_riscv_gstage_index_bits))) {
>                         *out_level = i;
>                         return 0;
> @@ -55,21 +55,23 @@ static int gstage_page_size_to_level(unsigned long page_size, u32 *out_level)
>         return -EINVAL;
>  }
>
> -static int gstage_level_to_page_order(u32 level, unsigned long *out_pgorder)
> +static int gstage_level_to_page_order(struct kvm_gstage *gstage, u32 level,
> +                                     unsigned long *out_pgorder)
>  {
> -       if (kvm_riscv_gstage_pgd_levels < level)
> +       if (gstage->kvm->arch.pgd_levels < level)
>                 return -EINVAL;
>
>         *out_pgorder = 12 + (level * kvm_riscv_gstage_index_bits);
>         return 0;
>  }
>
> -static int gstage_level_to_page_size(u32 level, unsigned long *out_pgsize)
> +static int gstage_level_to_page_size(struct kvm_gstage *gstage, u32 level,
> +                                    unsigned long *out_pgsize)
>  {
>         int rc;
>         unsigned long page_order = PAGE_SHIFT;
>
> -       rc = gstage_level_to_page_order(level, &page_order);
> +       rc = gstage_level_to_page_order(gstage, level, &page_order);
>         if (rc)
>                 return rc;
>
> @@ -81,11 +83,11 @@ bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr,
>                                pte_t **ptepp, u32 *ptep_level)
>  {
>         pte_t *ptep;
> -       u32 current_level = kvm_riscv_gstage_pgd_levels - 1;
> +       u32 current_level = gstage->kvm->arch.pgd_levels - 1;
>
>         *ptep_level = current_level;
>         ptep = (pte_t *)gstage->pgd;
> -       ptep = &ptep[gstage_pte_index(addr, current_level)];
> +       ptep = &ptep[gstage_pte_index(gstage, addr, current_level)];
>         while (ptep && pte_val(ptep_get(ptep))) {
>                 if (gstage_pte_leaf(ptep)) {
>                         *ptep_level = current_level;
> @@ -97,7 +99,7 @@ bool kvm_riscv_gstage_get_leaf(struct kvm_gstage *gstage, gpa_t addr,
>                         current_level--;
>                         *ptep_level = current_level;
>                         ptep = (pte_t *)gstage_pte_page_vaddr(ptep_get(ptep));
> -                       ptep = &ptep[gstage_pte_index(addr, current_level)];
> +                       ptep = &ptep[gstage_pte_index(gstage, addr, current_level)];
>                 } else {
>                         ptep = NULL;
>                 }
> @@ -110,7 +112,7 @@ static void gstage_tlb_flush(struct kvm_gstage *gstage, u32 level, gpa_t addr)
>  {
>         unsigned long order = PAGE_SHIFT;
>
> -       if (gstage_level_to_page_order(level, &order))
> +       if (gstage_level_to_page_order(gstage, level, &order))
>                 return;
>         addr &= ~(BIT(order) - 1);
>
> @@ -125,9 +127,9 @@ int kvm_riscv_gstage_set_pte(struct kvm_gstage *gstage,
>                              struct kvm_mmu_memory_cache *pcache,
>                              const struct kvm_gstage_mapping *map)
>  {
> -       u32 current_level = kvm_riscv_gstage_pgd_levels - 1;
> +       u32 current_level = gstage->kvm->arch.pgd_levels - 1;
>         pte_t *next_ptep = (pte_t *)gstage->pgd;
> -       pte_t *ptep = &next_ptep[gstage_pte_index(map->addr, current_level)];
> +       pte_t *ptep = &next_ptep[gstage_pte_index(gstage, map->addr, current_level)];
>
>         if (current_level < map->level)
>                 return -EINVAL;
> @@ -151,7 +153,7 @@ int kvm_riscv_gstage_set_pte(struct kvm_gstage *gstage,
>                 }
>
>                 current_level--;
> -               ptep = &next_ptep[gstage_pte_index(map->addr, current_level)];
> +               ptep = &next_ptep[gstage_pte_index(gstage, map->addr, current_level)];
>         }
>
>         if (pte_val(*ptep) != pte_val(map->pte)) {
> @@ -175,7 +177,7 @@ int kvm_riscv_gstage_map_page(struct kvm_gstage *gstage,
>         out_map->addr = gpa;
>         out_map->level = 0;
>
> -       ret = gstage_page_size_to_level(page_size, &out_map->level);
> +       ret = gstage_page_size_to_level(gstage, page_size, &out_map->level);
>         if (ret)
>                 return ret;
>
> @@ -217,7 +219,7 @@ void kvm_riscv_gstage_op_pte(struct kvm_gstage *gstage, gpa_t addr,
>         u32 next_ptep_level;
>         unsigned long next_page_size, page_size;
>
> -       ret = gstage_level_to_page_size(ptep_level, &page_size);
> +       ret = gstage_level_to_page_size(gstage, ptep_level, &page_size);
>         if (ret)
>                 return;
>
> @@ -229,7 +231,7 @@ void kvm_riscv_gstage_op_pte(struct kvm_gstage *gstage, gpa_t addr,
>         if (ptep_level && !gstage_pte_leaf(ptep)) {
>                 next_ptep = (pte_t *)gstage_pte_page_vaddr(ptep_get(ptep));
>                 next_ptep_level = ptep_level - 1;
> -               ret = gstage_level_to_page_size(next_ptep_level, &next_page_size);
> +               ret = gstage_level_to_page_size(gstage, next_ptep_level, &next_page_size);
>                 if (ret)
>                         return;
>
> @@ -263,7 +265,7 @@ void kvm_riscv_gstage_unmap_range(struct kvm_gstage *gstage,
>
>         while (addr < end) {
>                 found_leaf = kvm_riscv_gstage_get_leaf(gstage, addr, &ptep, &ptep_level);
> -               ret = gstage_level_to_page_size(ptep_level, &page_size);
> +               ret = gstage_level_to_page_size(gstage, ptep_level, &page_size);
>                 if (ret)
>                         break;
>
> @@ -297,7 +299,7 @@ void kvm_riscv_gstage_wp_range(struct kvm_gstage *gstage, gpa_t start, gpa_t end
>
>         while (addr < end) {
>                 found_leaf = kvm_riscv_gstage_get_leaf(gstage, addr, &ptep, &ptep_level);
> -               ret = gstage_level_to_page_size(ptep_level, &page_size);
> +               ret = gstage_level_to_page_size(gstage, ptep_level, &page_size);
>                 if (ret)
>                         break;
>
> @@ -319,39 +321,34 @@ void __init kvm_riscv_gstage_mode_detect(void)
>         /* Try Sv57x4 G-stage mode */
>         csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT);
>         if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV57X4) {
> -               kvm_riscv_gstage_mode = HGATP_MODE_SV57X4;
> -               kvm_riscv_gstage_pgd_levels = 5;
> +               kvm_riscv_gstage_max_pgd_levels = 5;
>                 goto done;
>         }
>
>         /* Try Sv48x4 G-stage mode */
>         csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT);
>         if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV48X4) {
> -               kvm_riscv_gstage_mode = HGATP_MODE_SV48X4;
> -               kvm_riscv_gstage_pgd_levels = 4;
> +               kvm_riscv_gstage_max_pgd_levels = 4;
>                 goto done;
>         }
>
>         /* Try Sv39x4 G-stage mode */
>         csr_write(CSR_HGATP, HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT);
>         if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV39X4) {
> -               kvm_riscv_gstage_mode = HGATP_MODE_SV39X4;
> -               kvm_riscv_gstage_pgd_levels = 3;
> +               kvm_riscv_gstage_max_pgd_levels = 3;
>                 goto done;
>         }
>  #else /* CONFIG_32BIT */
>         /* Try Sv32x4 G-stage mode */
>         csr_write(CSR_HGATP, HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
>         if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV32X4) {
> -               kvm_riscv_gstage_mode = HGATP_MODE_SV32X4;
> -               kvm_riscv_gstage_pgd_levels = 2;
> +               kvm_riscv_gstage_max_pgd_levels = 2;
>                 goto done;
>         }
>  #endif
>
>         /* KVM depends on !HGATP_MODE_OFF */
> -       kvm_riscv_gstage_mode = HGATP_MODE_OFF;
> -       kvm_riscv_gstage_pgd_levels = 0;
> +       kvm_riscv_gstage_max_pgd_levels = 0;
>
>  done:
>         csr_write(CSR_HGATP, 0);
> diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
> index 0f3fe3986fc0..90ee0a032b9a 100644
> --- a/arch/riscv/kvm/main.c
> +++ b/arch/riscv/kvm/main.c
> @@ -105,17 +105,17 @@ static int __init riscv_kvm_init(void)
>                 return rc;
>
>         kvm_riscv_gstage_mode_detect();
> -       switch (kvm_riscv_gstage_mode) {
> -       case HGATP_MODE_SV32X4:
> +       switch (kvm_riscv_gstage_max_pgd_levels) {
> +       case 2:
>                 str = "Sv32x4";
>                 break;
> -       case HGATP_MODE_SV39X4:
> +       case 3:
>                 str = "Sv39x4";
>                 break;
> -       case HGATP_MODE_SV48X4:
> +       case 4:
>                 str = "Sv48x4";
>                 break;
> -       case HGATP_MODE_SV57X4:
> +       case 5:
>                 str = "Sv57x4";
>                 break;
>         default:
> @@ -164,7 +164,7 @@ static int __init riscv_kvm_init(void)
>                          (rc) ? slist : "no features");
>         }
>
> -       kvm_info("using %s G-stage page table format\n", str);
> +       kvm_info("highest G-stage page table mode is %s\n", str);
>
>         kvm_info("VMID %ld bits available\n", kvm_riscv_gstage_vmid_bits());
>
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 088d33ba90ed..fbcdd75cb9af 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -67,7 +67,7 @@ int kvm_riscv_mmu_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t hpa,
>                 if (!writable)
>                         map.pte = pte_wrprotect(map.pte);
>
> -               ret = kvm_mmu_topup_memory_cache(&pcache, kvm_riscv_gstage_pgd_levels);
> +               ret = kvm_mmu_topup_memory_cache(&pcache, kvm->arch.pgd_levels);
>                 if (ret)
>                         goto out;
>
> @@ -186,7 +186,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>          * space addressable by the KVM guest GPA space.
>          */
>         if ((new->base_gfn + new->npages) >=
> -           (kvm_riscv_gstage_gpa_size >> PAGE_SHIFT))
> +            kvm_riscv_gstage_gpa_size(kvm->arch.pgd_levels) >> PAGE_SHIFT)
>                 return -EFAULT;
>
>         hva = new->userspace_addr;
> @@ -472,7 +472,7 @@ int kvm_riscv_mmu_map(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
>         memset(out_map, 0, sizeof(*out_map));
>
>         /* We need minimum second+third level pages */
> -       ret = kvm_mmu_topup_memory_cache(pcache, kvm_riscv_gstage_pgd_levels);
> +       ret = kvm_mmu_topup_memory_cache(pcache, kvm->arch.pgd_levels);
>         if (ret) {
>                 kvm_err("Failed to topup G-stage cache\n");
>                 return ret;
> @@ -575,6 +575,7 @@ int kvm_riscv_mmu_alloc_pgd(struct kvm *kvm)
>                 return -ENOMEM;
>         kvm->arch.pgd = page_to_virt(pgd_page);
>         kvm->arch.pgd_phys = page_to_phys(pgd_page);
> +       kvm->arch.pgd_levels = kvm_riscv_gstage_max_pgd_levels;
>
>         return 0;
>  }
> @@ -590,10 +591,12 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm)
>                 gstage.flags = 0;
>                 gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
>                 gstage.pgd = kvm->arch.pgd;
> -               kvm_riscv_gstage_unmap_range(&gstage, 0UL, kvm_riscv_gstage_gpa_size, false);
> +               kvm_riscv_gstage_unmap_range(&gstage, 0UL,
> +                       kvm_riscv_gstage_gpa_size(kvm->arch.pgd_levels), false);
>                 pgd = READ_ONCE(kvm->arch.pgd);
>                 kvm->arch.pgd = NULL;
>                 kvm->arch.pgd_phys = 0;
> +               kvm->arch.pgd_levels = 0;
>         }
>         spin_unlock(&kvm->mmu_lock);
>
> @@ -603,11 +606,12 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm)
>
>  void kvm_riscv_mmu_update_hgatp(struct kvm_vcpu *vcpu)
>  {
> -       unsigned long hgatp = kvm_riscv_gstage_mode << HGATP_MODE_SHIFT;
> -       struct kvm_arch *k = &vcpu->kvm->arch;
> +       struct kvm_arch *ka = &vcpu->kvm->arch;
> +       unsigned long hgatp = kvm_riscv_gstage_mode(ka->pgd_levels)
> +                             << HGATP_MODE_SHIFT;
>
> -       hgatp |= (READ_ONCE(k->vmid.vmid) << HGATP_VMID_SHIFT) & HGATP_VMID;
> -       hgatp |= (k->pgd_phys >> PAGE_SHIFT) & HGATP_PPN;
> +       hgatp |= (READ_ONCE(ka->vmid.vmid) << HGATP_VMID_SHIFT) & HGATP_VMID;
> +       hgatp |= (ka->pgd_phys >> PAGE_SHIFT) & HGATP_PPN;
>
>         ncsr_write(CSR_HGATP, hgatp);
>
> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> index 13c63ae1a78b..4d82a886102c 100644
> --- a/arch/riscv/kvm/vm.c
> +++ b/arch/riscv/kvm/vm.c
> @@ -199,7 +199,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>                 r = KVM_USER_MEM_SLOTS;
>                 break;
>         case KVM_CAP_VM_GPA_BITS:
> -               r = kvm_riscv_gstage_gpa_bits;
> +               r = kvm_riscv_gstage_gpa_bits(kvm->arch.pgd_levels);
>                 break;
>         default:
>                 r = 0;
> diff --git a/arch/riscv/kvm/vmid.c b/arch/riscv/kvm/vmid.c
> index cf34d448289d..c15bdb1dd8be 100644
> --- a/arch/riscv/kvm/vmid.c
> +++ b/arch/riscv/kvm/vmid.c
> @@ -26,7 +26,8 @@ static DEFINE_SPINLOCK(vmid_lock);
>  void __init kvm_riscv_gstage_vmid_detect(void)
>  {
>         /* Figure-out number of VMID bits in HW */
> -       csr_write(CSR_HGATP, (kvm_riscv_gstage_mode << HGATP_MODE_SHIFT) | HGATP_VMID);
> +       csr_write(CSR_HGATP, (kvm_riscv_gstage_mode(kvm_riscv_gstage_max_pgd_levels) <<
> +                             HGATP_MODE_SHIFT) | HGATP_VMID);
>         vmid_bits = csr_read(CSR_HGATP);
>         vmid_bits = (vmid_bits & HGATP_VMID) >> HGATP_VMID_SHIFT;
>         vmid_bits = fls_long(vmid_bits);
> --
> 2.50.1
>
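
For anyone checking the arithmetic in the new kvm_riscv_gstage_gpa_bits()
and kvm_riscv_gstage_gpa_size() helpers quoted above, here is a small
userspace sketch of the same formula. It is not kernel code. The constant
values (12 for the page shift, 9 or 10 index bits for 64-bit or 32-bit, 2
extra PGD bits) are assumptions taken from my reading of the RISC-V KVM
headers, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; the names are illustrative, not the kernel's. */
#define SKETCH_PAGE_SHIFT 12u   /* 4 KiB base pages */
#define SKETCH_PGD_XBITS  2u    /* the "x4" root table widening */

/* GPA width = page offset + one index field per level + extra root bits. */
static unsigned sketch_gpa_bits(unsigned pgd_levels, unsigned index_bits)
{
    return SKETCH_PAGE_SHIFT + pgd_levels * index_bits + SKETCH_PGD_XBITS;
}

/* Size of the guest physical address space in bytes. */
static uint64_t sketch_gpa_size(unsigned pgd_levels, unsigned index_bits)
{
    unsigned bits = sketch_gpa_bits(pgd_levels, index_bits);

    /* Shifting a 64-bit value by 64 or more is undefined in C. */
    assert(bits < 64);
    return UINT64_C(1) << bits;
}
```

With 9 index bits, 3, 4 and 5 levels give 41, 50 and 59 GPA bits, and with
10 index bits, 2 levels give 34, which matches the Sv39x4, Sv48x4, Sv57x4
and Sv32x4 widths respectively.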


* Re: [PATCH v2] doc: Add CPU Isolation documentation
From: Frederic Weisbecker @ 2026-04-01 15:47 UTC
  To: Waiman Long
  Cc: LKML, Anna-Maria Behnsen, Gabriele Monaco, Ingo Molnar,
	Jonathan Corbet, Marcelo Tosatti, Marco Crivellari, Michal Hocko,
	Paul E . McKenney, Peter Zijlstra, Phil Auld, Steven Rostedt,
	Thomas Gleixner, Valentin Schneider, Vlastimil Babka, linux-doc,
	Sebastian Andrzej Siewior, Bagas Sanjaya
In-Reply-To: <90a6512f-5b6b-4781-87f3-4580ac426c37@redhat.com>

On Thu, Mar 26, 2026 at 03:17:48PM -0400, Waiman Long wrote:
> On 3/26/26 10:00 AM, Frederic Weisbecker wrote:
> > nohz_full was introduced in v3.10 in 2013, which means this
> > documentation is overdue for 13 years.
> > 
> > Fortunately Paul wrote a part of the needed documentation a while ago,
> > especially concerning nohz_full in Documentation/timers/no_hz.rst and
> > also about per-CPU kthreads in
> > Documentation/admin-guide/kernel-per-CPU-kthreads.rst
> > 
> > Introduce a new page that gives an overview of CPU isolation in general.
> > 
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > ---
> > v2:
> >     - Fix links and code blocks (Bagas and Sebastian)
> >     - Isolation is not only about userspace, rephrase accordingly (Valentin)
> >     - Paste BIOS issues suggestion from Valentin
> >     - Include the whole rtla suite (Valentin)
> >     - Rephrase a few details (Waiman)
> >     - Talk about RCU induced overhead rather than slower RCU (Sebastian)
> > 
> >   Documentation/admin-guide/cpu-isolation.rst | 357 ++++++++++++++++++++
> >   Documentation/admin-guide/index.rst         |   1 +
> >   2 files changed, 358 insertions(+)
> >   create mode 100644 Documentation/admin-guide/cpu-isolation.rst
> > 
> > diff --git a/Documentation/admin-guide/cpu-isolation.rst b/Documentation/admin-guide/cpu-isolation.rst
> > new file mode 100644
> > index 000000000000..886dec79b056
> > --- /dev/null
> > +++ b/Documentation/admin-guide/cpu-isolation.rst
> > @@ -0,0 +1,357 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +=============
> > +CPU Isolation
> > +=============
> > +
> > +Introduction
> > +============
> > +
> > +"CPU Isolation" means leaving a CPU exclusive to a given workload
> > +without any undesired code interference from the kernel.
> > +
> > +Those interferences, commonly pointed out as "noise", can be triggered
> > +by asynchronous events (interrupts, timers, scheduler preemption by
> > +workqueues and kthreads, ...) or synchronous events (syscalls and page
> > +faults).
> > +
> > +Such noise usually goes unnoticed. After all synchronous events are a
> > +component of the requested kernel service. And asynchronous events are
> > +either sufficiently well distributed by the scheduler when executed
> > +as tasks or reasonably fast when executed as interrupt. The timer
> > +interrupt can even execute 1024 times per seconds without a significant
> > +and measurable impact most of the time.
> > +
> > +However some rare and extreme workloads can be quite sensitive to
> > +those kinds of noise. This is the case, for example, with high
> > +bandwidth network processing that can't afford losing a single packet
> > +or very low latency network processing. Typically those usecases
> > +involve DPDK, bypassing the kernel networking stack and performing
> > +direct access to the networking device from userscace.
> 
> As also pointed out by Sashiko, there is a typo "userscace" -> "userspace".
> There are also typos reported in
> 
> https://sashiko.dev/#/patchset/20260326140055.41555-1-frederic%40kernel.org

Thanks!

What do you think about these lines of Sashiko's review:

"""
Does this script violate the cgroup v2 "no internal process" constraint?
By enabling the cpuset controller on the test directory's
cgroup.subtree_control file, the cgroup cannot also contain processes.
"""

That is confusing me...

> 
> > +
> > +In order to run a CPU without or with limited kernel noise, the
> > +related housekeeping work needs to be either shutdown, migrated or
> > +offloaded.
> > +
> > +Housekeeping
> > +============
> > +
> > +In the CPU isolation terminology, housekeeping is the work, often
> > +asynchronous, that the kernel needs to process in order to maintain
> > +all its services. It matches the noises and disturbances enumerated
> > +above except when at least one CPU is isolated. Then housekeeping may
> > +make use of further coping mechanisms if CPU-tied work must be
> > +offloaded.
> > +
> > +Housekeeping CPUs are the non-isolated CPUs where the kernel noise
> > +is moved away from isolated CPUs.
> > +
> > +The isolation can be implemented in several ways depending on the
> > +nature of the noise:
> > +
> > +- Unbound work, where "unbound" means not tied to any CPU, can be
> > +  simply migrated away from isolated CPUs to housekeeping CPUs.
> > +  This is the case of unbound workqueues, kthreads and timers.
> > +
> > +- Bound work, where "bound" means tied to a specific CPU, usually
> > +  can't be moved away as-is by nature. Either:
> > +
> > +	- The work must switch to a locked implementation. Eg: This is
> > +	  the case of RCU with CONFIG_RCU_NOCB_CPU.
> > +
> > +	- The related feature must be shutdown and considered
> > +	  incompatible with isolated CPUs. Eg: Lockup watchdog,
> > +	  unreliable clocksources, etc...
> > +
> > +	- An elaborated and heavyweight coping mechanism stands as a
> > +	  replacement. Eg: the timer tick is shutdown on nohz_full but
> 
> "shutdown" should be 2 words as "shutdown" isn't a verb. Should we add CPU
> after "nohz_full" to make it more clear?

Right.

> 
> 
> > +	  with the constraint of running a single task on the CPU. A
> > +	  significant cost penalty is added on kernel entry/exit and
> > +	  a residual 1Hz scheduler tick is offloaded to housekeeping
> > +	  CPUs.
> > +
> > --- a/Documentation/admin-guide/index.rst
> > +++ b/Documentation/admin-guide/index.rst
> > @@ -94,6 +94,7 @@ likely to be of interest on almost any system.
> >      cgroup-v2
> >      cgroup-v1/index
> > +   cpu-isolation
> >      cpu-load
> >      mm/index
> >      module-signing
> 
> Other than the minor nits mentioned above,
> 
> Acked-by: Waiman Long <longman@redhat.com>

Thanks!

> 

-- 
Frederic Weisbecker
SUSE Labs


* [PATCH v6] hwmon: add driver for ARCTIC Fan Controller
From: Aureo Serrano de Souza @ 2026-04-01 15:39 UTC
  To: linux-hwmon
  Cc: linux, linux, corbet, skhan, linux-doc, linux-kernel,
	Aureo Serrano de Souza

Add hwmon driver for the ARCTIC Fan Controller, a USB HID device
(VID 0x3904, PID 0xF001) with 10 fan channels. Exposes fan speed in
RPM (read-only) and PWM duty cycle (0-255, read/write) via sysfs.

The device pushes IN reports at ~1 Hz containing RPM readings. PWM is
set via OUT reports; the device applies the new duty cycle and sends
back a 2-byte ACK (Report ID 0x02). The driver waits up to 1 s for
the ACK using a completion. Measured device latency: max ~563 ms over
500 iterations. PWM control is manual-only: the device never changes
duty cycle autonomously.

raw_event() may run in hardirq context, so fan_rpm[] is protected by
a spinlock with irq-save. pwm_duty[] is also protected by this spinlock
because reset_resume() clears it outside the hwmon core lock. The OUT
report buffer is built and write_pending is armed under the same lock so
that no reset_resume() can race with the pwm_duty[] snapshot. priv->buf
is exclusively accessed by write(), which the hwmon core serializes.

Signed-off-by: Aureo Serrano de Souza <aureo.serrano@arctic.de>
---
Thanks to Guenter Roeck and Thomas Weißschuh for the reviews.

Changes since v5:
- arctic_fan_probe(): switch from devm_hwmon_device_register_with_info()
  to hwmon_device_register_with_info(); store the returned pointer in
  priv->hwmon_dev for explicit teardown in remove()
- arctic_fan_remove(): call hwmon_device_unregister(priv->hwmon_dev)
  before hid_device_io_stop/hid_hw_close/hid_hw_stop; this closes the
  use-after-free window where a concurrent sysfs write could call
  hid_hw_output_report() on an already-stopped device; matches the
  removal pattern used by nzxt-smart2 and aquacomputer_d5next
- arctic_fan_write(): expand write_pending comment to document the
  residual theoretical late-ACK race (unfixable without a correlation
  ID in the device ACK report) and its practical impossibility (observed
  max ACK latency ~563 ms, timeout 1 s; a delay > 1 s indicates a
  non-functional device)
- arctic_fan_reset_resume(), arctic_fan_read(), arctic_fan_write():
  extend in_report_lock coverage to pwm_duty[]; reset_resume() clears
  pwm_duty[] outside the hwmon core lock, so all paths that read or
  write pwm_duty[] now hold in_report_lock to prevent a data race
  during resume
- arctic_fan_write(): build the OUT report buffer inside in_report_lock
  so reset_resume() cannot clear pwm_duty[] between the pwm_duty[]
  snapshot and the buffer write; this makes the lock coverage complete

Changes since v4:
- arctic_fan_write(): switch to wait_for_completion_timeout() (non-
  interruptible); eliminates the signal-interrupted write case of the
  late-ACK race that write_pending could not fully prevent
- arctic_fan_write(): guard pwm_duty[channel] commit with
  ack_status == 0 check; a device error ACK (status 0x01) no longer
  silently poisons the cached duty used in future OUT reports
- arctic_fan_probe()/remove(): replace devm_add_action_or_reset() +
  no-op remove() with explicit hid_device_io_stop/hid_hw_close/
  hid_hw_stop in remove(); devm_add_action_or_reset() was called after
  hdev->driver = NULL, causing a NULL deref in hid_hw_close() on unbind
- add reset_resume callback: device resets PWM to hardware defaults on
  power loss during suspend; driver now clears cached pwm_duty[] on
  reset-resume so stale pre-suspend values are not re-sent as if valid
- Documentation/hwmon/arctic_fan_controller.rst: document suspend/
  resume behaviour and the updated pwm[1-10] read semantics

Changes since v3:
- buf[]: upgrade from __aligned(8) to ____cacheline_aligned so the
  DMA buffer occupies its own cache line, preventing false sharing with
  adjacent fan_rpm[]/pwm_duty[] fields on non-coherent architectures
- arctic_fan_write(): add write_pending flag (protected by
  in_report_lock) so raw_event() delivers ACKs only while a write is
  in flight
- arctic_fan_write(): commit pwm_duty[channel] only after the device
  ACKs the command; a failed or timed-out write no longer leaves a
  stale value in the cached duty state
- arctic_fan_probe(): start IO (hid_device_io_start) before registering
  with hwmon; previously a sysfs write arriving between hwmon
  registration and io_start could send an OUT report whose ACK would be
  discarded by the HID core, causing a spurious timeout
- Documentation/hwmon/arctic_fan_controller.rst: document that cached
  PWM values start at 0 (hardware state unknown at probe) and that each
  OUT report carries all 10 channel values

Changes since v2:
- buf[]: add __aligned(8) for DMA safety
- ARCTIC_ACK_TIMEOUT_MS: restore 1000 ms; note observed max ~563 ms
- arctic_fan_parse_report(): replace hwmon_lock/hwmon_unlock with
  spin_lock_irqsave; hwmon_lock() may sleep and is unsafe when
  raw_event() runs in hardirq/softirq context
- arctic_fan_raw_event(): use spin_lock_irqsave for ACK path
- arctic_fan_write(): use spin_lock_irqsave for completion reinit
- arctic_fan_write(): clamp val to [0, 255] before u8 cast
- remove priv->hwmon_dev (no longer needed)

Changes since v1:
- Use hid_dbg() instead of module_param debug flag
- Move hid_device_id table adjacent to hid_driver struct
- Use get_unaligned_le16() for RPM parsing
- Remove impossible bounds/NULL checks; remove retry loop
- Add hid_is_usb() guard
- Do not update pwm_duty from IN reports (device is manual-only)
- Add completion/ACK mechanism for OUT report acknowledgement
- Add Documentation/hwmon/arctic_fan_controller.rst and MAINTAINERS
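
For reviewers who want to picture the report layout, the following is a
userspace sketch of the IN report decoding (32-byte report, ten
little-endian 16-bit RPM values at bytes 11-30) and of the 0-255 to 0-100%
duty scaling. It is not the driver code. The helper names are hypothetical,
the exact length check is an assumption, and the round-to-nearest scaling
is an assumption as well, since the rounding used by the driver is not
stated in this description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_REPORT_LEN 32    /* full IN report size */
#define SKETCH_RPM_OFFSET 11    /* bytes 11-30: 10 x uint16 LE */
#define SKETCH_NUM_FANS   10

/* Read a little-endian u16 byte by byte, so alignment and host
 * endianness do not matter (the kernel uses get_unaligned_le16()). */
static uint16_t sketch_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is not a full-size report. */
static int sketch_parse_rpm(const uint8_t *buf, size_t len,
                            uint16_t rpm[SKETCH_NUM_FANS])
{
    assert(buf && rpm);
    if (len != SKETCH_REPORT_LEN)
        return -1;
    for (size_t i = 0; i < SKETCH_NUM_FANS; i++)
        rpm[i] = sketch_le16(buf + SKETCH_RPM_OFFSET + 2 * i);
    return 0;
}

/* Clamp a sysfs value to [0, 255], then scale to 0-100 with
 * round-to-nearest (assumed rounding, see above). */
static uint8_t sketch_pwm_to_percent(long val)
{
    if (val < 0)
        val = 0;
    if (val > 255)
        val = 255;
    return (uint8_t)((val * 100 + 127) / 255);
}
```

The last RPM value starts at byte 11 + 2 * 9 = 29 and ends at byte 30, so
all ten values fit inside the 32-byte report.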

diff --git a/Documentation/hwmon/arctic_fan_controller.rst b/Documentation/hwmon/arctic_fan_controller.rst
new file mode 100644
index 0000000000..b5be88ae46
--- /dev/null
+++ b/Documentation/hwmon/arctic_fan_controller.rst
@@ -0,0 +1,56 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+Kernel driver arctic_fan_controller
+=====================================
+
+Supported devices:
+
+* ARCTIC Fan Controller (USB HID, VID 0x3904, PID 0xF001)
+
+Author: Aureo Serrano de Souza <aureo.serrano@arctic.de>
+
+Description
+-----------
+
+This driver provides hwmon support for the ARCTIC Fan Controller, a USB
+Custom HID device with 10 fan channels. The device sends IN reports about
+once per second containing current RPM values (bytes 11-30, 10 x uint16 LE).
+Fan speed control is manual-only: the device does not change PWM
+autonomously; it only applies a new duty cycle when it receives an OUT
+report from the host.
+
+After the device applies an OUT report, it sends back a 2-byte ACK IN
+report (Report ID 0x02, byte 1 = 0x00 on success) confirming the command
+was applied.
+
+Usage notes
+-----------
+
+Since it is a USB device, hotplug is supported. The device is autodetected.
+
+The device does not support GET_REPORT, so the driver cannot read back the
+current hardware PWM state at probe time. The cached PWM values (readable
+via pwm[1-10]) start at 0 and reflect only values that have been
+successfully written. Because each OUT report carries all 10 channel values,
+writing a single channel also sends the cached values for all other channels.
+Users should set all channels to the desired values before relying on the
+cached state.
+
+On system suspend, the device may lose power and reset its PWM channels to
+hardware defaults. The driver clears its cached duty values on resume so
+that reads reflect the unknown hardware state rather than stale pre-suspend
+values. Userspace is responsible for re-applying the desired duty cycles
+after resume.
+
+Sysfs entries
+-------------
+
+================ ==============================================================
+fan[1-10]_input  Fan speed in RPM (read-only). Updated from IN reports at ~1 Hz.
+pwm[1-10]        PWM duty cycle (0-255). Write: sends an OUT report setting the
+                 duty cycle (scaled from 0-255 to 0-100% for the device);
+                 the cached value is updated only after the device ACKs the
+                 command with a success status. Read: returns the last
+                 successfully written value; initialized to 0 at driver load
+                 and after resume (hardware state unknown).
+================ ==============================================================
diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
index b2ca8513cf..c34713040e 100644
--- a/Documentation/hwmon/index.rst
+++ b/Documentation/hwmon/index.rst
@@ -42,6 +42,7 @@ Hardware Monitoring Kernel Drivers
    aht10
    amc6821
    aquacomputer_d5next
+   arctic_fan_controller
    asb100
    asc7621
    aspeed-g6-pwm-tach
diff --git a/MAINTAINERS b/MAINTAINERS
index 96ea84948d..ec3112bd41 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2053,6 +2053,13 @@ S:	Maintained
 F:	drivers/net/arcnet/
 F:	include/uapi/linux/if_arcnet.h

+ARCTIC FAN CONTROLLER DRIVER
+M:	Aureo Serrano de Souza <aureo.serrano@arctic.de>
+L:	linux-hwmon@vger.kernel.org
+S:	Maintained
+F:	Documentation/hwmon/arctic_fan_controller.rst
+F:	drivers/hwmon/arctic_fan_controller.c
+
 ARM AND ARM64 SoC SUB-ARCHITECTURES (COMMON PARTS)
 M:	Arnd Bergmann <arnd@arndb.de>
 M:	Krzysztof Kozlowski <krzk@kernel.org>
diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index 328867242c..6c90a8dd40 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -388,6 +388,18 @@ config SENSORS_APPLESMC
 	  Say Y here if you have an applicable laptop and want to experience
 	  the awesome power of applesmc.

+config SENSORS_ARCTIC_FAN_CONTROLLER
+	tristate "ARCTIC Fan Controller"
+	depends on USB_HID
+	help
+	  If you say yes here you get support for the ARCTIC Fan Controller,
+	  a USB HID device (VID 0x3904, PID 0xF001) with 10 fan channels.
+	  The driver exposes fan speed (RPM) and PWM control via the hwmon
+	  sysfs interface.
+
+	  This driver can also be built as a module. If so, the module
+	  will be called arctic_fan_controller.
+
 config SENSORS_ARM_SCMI
 	tristate "ARM SCMI Sensors"
 	depends on ARM_SCMI_PROTOCOL
diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
index 5833c807c6..ef831c3375 100644
--- a/drivers/hwmon/Makefile
+++ b/drivers/hwmon/Makefile
@@ -49,6 +49,7 @@ obj-$(CONFIG_SENSORS_ADT7475)	+= adt7475.o
 obj-$(CONFIG_SENSORS_AHT10)	+= aht10.o
 obj-$(CONFIG_SENSORS_APPLESMC)	+= applesmc.o
 obj-$(CONFIG_SENSORS_AQUACOMPUTER_D5NEXT) += aquacomputer_d5next.o
+obj-$(CONFIG_SENSORS_ARCTIC_FAN_CONTROLLER)	+= arctic_fan_controller.o
 obj-$(CONFIG_SENSORS_ARM_SCMI)	+= scmi-hwmon.o
 obj-$(CONFIG_SENSORS_ARM_SCPI)	+= scpi-hwmon.o
 obj-$(CONFIG_SENSORS_AS370)	+= as370-hwmon.o
diff --git a/drivers/hwmon/arctic_fan_controller.c b/drivers/hwmon/arctic_fan_controller.c
new file mode 100644
index 0000000000..2bfb003f01
--- /dev/null
+++ b/drivers/hwmon/arctic_fan_controller.c
@@ -0,0 +1,370 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Linux hwmon driver for ARCTIC Fan Controller
+ *
+ * USB Custom HID device with 10 fan channels.
+ * Exposes fan RPM (input) and PWM (0-255) via hwmon. Device pushes IN reports
+ * at ~1 Hz; no GET_REPORT. OUT reports set PWM duty (bytes 1-10, 0-100%).
+ * PWM is manual-only: the device does not change duty autonomously, only
+ * when it receives an OUT report from the host.
+ */
+
+#include <linux/cache.h>
+#include <linux/completion.h>
+#include <linux/err.h>
+#include <linux/hid.h>
+#include <linux/hwmon.h>
+#include <linux/jiffies.h>
+#include <linux/minmax.h>
+#include <linux/module.h>
+#include <linux/spinlock.h>
+#include <linux/string.h>
+#include <linux/unaligned.h>
+
+#define ARCTIC_VID			0x3904
+#define ARCTIC_PID			0xF001
+#define ARCTIC_NUM_FANS			10
+#define ARCTIC_OUTPUT_REPORT_ID		0x01
+#define ARCTIC_REPORT_LEN		32
+#define ARCTIC_RPM_OFFSET		11	/* bytes 11-30: 10 x uint16 LE */
+/* ACK report: device sends Report ID 0x02, 2 bytes (ID + status) after applying OUT report */
+#define ARCTIC_ACK_REPORT_ID		0x02
+#define ARCTIC_ACK_REPORT_LEN		2
+/*
+ * Time to wait for ACK report after send.
+ * Measured over 500 iterations: max ~563 ms. Keep 1 s as margin.
+ */
+#define ARCTIC_ACK_TIMEOUT_MS		1000
+
+struct arctic_fan_data {
+	struct hid_device *hdev;
+	struct device *hwmon_dev;	/* stored for explicit unregister in remove() */
+	spinlock_t in_report_lock;	/* protects fan_rpm, ack_status, write_pending, pwm_duty */
+	struct completion in_report_received; /* ACK (ID 0x02) received in raw_event */
+	int ack_status;			/* 0 = OK, negative errno on device error */
+	bool write_pending;		/* true while an OUT report ACK is in flight */
+	u32 fan_rpm[ARCTIC_NUM_FANS];
+	u8 pwm_duty[ARCTIC_NUM_FANS];	/* 0-255 matching sysfs range; converted to 0-100 on send */
+	/*
+	 * OUT report buffer. Cache-line aligned so it occupies its own cache
+	 * line, preventing DMA cache-coherency issues with adjacent fields
+	 * (fan_rpm[], pwm_duty[]) on non-coherent architectures.
+	 * Embedded in the devm_kzalloc'd struct so it is heap-allocated and
+	 * passes usb_hcd_map_urb_for_dma(). Serialized by the hwmon core.
+	 */
+	u8 buf[ARCTIC_REPORT_LEN] ____cacheline_aligned;
+};
+
+/*
+ * Parse RPM values from the periodic status report (10 x uint16 LE at rpm_off).
+ * pwm_duty is not updated from the report: the device is manual-only, so the
+ * host cache is the authoritative source for PWM.
+ * Called from raw_event which may run in IRQ context; must not sleep.
+ */
+static void arctic_fan_parse_report(struct arctic_fan_data *priv, u8 *buf,
+				    int len, int rpm_off)
+{
+	unsigned long flags;
+	int i;
+
+	if (len < rpm_off + ARCTIC_NUM_FANS * 2)
+		return;
+
+	spin_lock_irqsave(&priv->in_report_lock, flags);
+	for (i = 0; i < ARCTIC_NUM_FANS; i++)
+		priv->fan_rpm[i] = get_unaligned_le16(&buf[rpm_off + i * 2]);
+	spin_unlock_irqrestore(&priv->in_report_lock, flags);
+}
+
+/*
+ * raw_event: IN reports.
+ *
+ * Status report: Report ID 0x01, 32 bytes:
+ *   byte 0 = report ID, bytes 1-10 = PWM 0-100%, bytes 11-30 = 10 x RPM uint16 LE.
+ *   Device pushes these at ~1 Hz; no GET_REPORT.
+ *
+ * ACK report: Report ID 0x02, 2 bytes:
+ *   byte 0 = 0x02, byte 1 = status (0x00 = OK, 0x01 = ERROR).
+ *   Sent once after accepting and applying an OUT report (ID 0x01).
+ */
+static int arctic_fan_raw_event(struct hid_device *hdev,
+				struct hid_report *report, u8 *data, int size)
+{
+	struct arctic_fan_data *priv = hid_get_drvdata(hdev);
+	unsigned long flags;
+
+	hid_dbg(hdev, "arctic_fan: raw_event id=%u size=%d\n", report->id, size);
+
+	if (report->id == ARCTIC_ACK_REPORT_ID && size == ARCTIC_ACK_REPORT_LEN) {
+		spin_lock_irqsave(&priv->in_report_lock, flags);
+		/*
+		 * Only deliver if a write is in flight. A late ACK from a
+		 * timed-out write that arrives while no write is pending is
+		 * dropped instead of completing a later write's wait.
+		 */
+		if (priv->write_pending) {
+			priv->ack_status = data[1] == 0x00 ? 0 : -EIO;
+			complete(&priv->in_report_received);
+		}
+		spin_unlock_irqrestore(&priv->in_report_lock, flags);
+		return 0;
+	}
+
+	if (report->id != ARCTIC_OUTPUT_REPORT_ID || size != ARCTIC_REPORT_LEN) {
+		hid_dbg(hdev, "arctic_fan: raw_event id=%u size=%d ignored\n",
+			report->id, size);
+		return 0;
+	}
+
+	arctic_fan_parse_report(priv, data, size, ARCTIC_RPM_OFFSET);
+	return 0;
+}
+
+static umode_t arctic_fan_is_visible(const void *data,
+				     enum hwmon_sensor_types type,
+				     u32 attr, int channel)
+{
+	if (type == hwmon_fan && attr == hwmon_fan_input)
+		return 0444;
+	if (type == hwmon_pwm && attr == hwmon_pwm_input)
+		return 0644;
+	return 0;
+}
+
+static int arctic_fan_read(struct device *dev, enum hwmon_sensor_types type,
+			   u32 attr, int channel, long *val)
+{
+	struct arctic_fan_data *priv = dev_get_drvdata(dev);
+	unsigned long flags;
+
+	if (type == hwmon_fan && attr == hwmon_fan_input) {
+		spin_lock_irqsave(&priv->in_report_lock, flags);
+		*val = priv->fan_rpm[channel];
+		spin_unlock_irqrestore(&priv->in_report_lock, flags);
+		return 0;
+	}
+	if (type == hwmon_pwm && attr == hwmon_pwm_input) {
+		spin_lock_irqsave(&priv->in_report_lock, flags);
+		*val = priv->pwm_duty[channel];
+		spin_unlock_irqrestore(&priv->in_report_lock, flags);
+		return 0;
+	}
+	return -EINVAL;
+}
+
+static int arctic_fan_write(struct device *dev, enum hwmon_sensor_types type,
+			    u32 attr, int channel, long val)
+{
+	struct arctic_fan_data *priv = dev_get_drvdata(dev);
+	u8 new_duty = (u8)clamp_val(val, 0, 255);
+	unsigned long flags;
+	unsigned long t;
+	int i, ret;
+
+	/*
+	 * Build the buffer and arm write_pending under in_report_lock so that
+	 * reset_resume() cannot clear pwm_duty[] between the pwm_duty[] read
+	 * and the buffer write, and raw_event() cannot deliver a stale ACK
+	 * from a previous write into this write's completion.
+	 *
+	 * priv->buf is heap-allocated (embedded in the devm_kzalloc'd struct),
+	 * satisfying usb_hcd_map_urb_for_dma(). Exclusively accessed by
+	 * write() which the hwmon core serializes.
+	 *
+	 * pwm_duty[channel] is committed only after a positive device ACK so a
+	 * failed or timed-out write does not corrupt the cached state.
+	 *
+	 * Residual theoretical race: if write A times out (write_pending
+	 * cleared), write B sets write_pending = true, and a late ACK from
+	 * write A—delayed beyond ARCTIC_ACK_TIMEOUT_MS—arrives during write
+	 * B's pending window, it would falsely satisfy write B's completion.
+	 * This cannot be prevented in driver code without protocol support
+	 * (for example, a correlation ID echoed in the device ACK report).
+	 * In testing, observed ACK latency stayed below the 1 s timeout
+	 * (maximum ~563 ms over 500 iterations).
+	 *
+	 * The wait is non-interruptible so that a signal cannot cause write()
+	 * to return early while the OUT report is already in flight; an
+	 * interruptible early return would create the same late-ACK window
+	 * without even the timeout guard.
+	 * Serialized by the hwmon core: only one arctic_fan_write() at a time.
+	 * Use irqsave to match the IRQ context in which raw_event may run.
+	 */
+	spin_lock_irqsave(&priv->in_report_lock, flags);
+	priv->buf[0] = ARCTIC_OUTPUT_REPORT_ID;
+	for (i = 0; i < ARCTIC_NUM_FANS; i++) {
+		u8 d = i == channel ? new_duty : priv->pwm_duty[i];
+
+		priv->buf[1 + i] = DIV_ROUND_CLOSEST((unsigned int)d * 100, 255);
+	}
+	priv->ack_status = -ETIMEDOUT;
+	priv->write_pending = true;
+	reinit_completion(&priv->in_report_received);
+	spin_unlock_irqrestore(&priv->in_report_lock, flags);
+
+	ret = hid_hw_output_report(priv->hdev, priv->buf, ARCTIC_REPORT_LEN);
+	if (ret < 0) {
+		spin_lock_irqsave(&priv->in_report_lock, flags);
+		priv->write_pending = false;
+		spin_unlock_irqrestore(&priv->in_report_lock, flags);
+		return ret;
+	}
+
+	t = wait_for_completion_timeout(&priv->in_report_received,
+					msecs_to_jiffies(ARCTIC_ACK_TIMEOUT_MS));
+	spin_lock_irqsave(&priv->in_report_lock, flags);
+	priv->write_pending = false;
+	/* Commit inside the lock so reset_resume() cannot race with this write */
+	if (t && priv->ack_status == 0)
+		priv->pwm_duty[channel] = new_duty;
+	spin_unlock_irqrestore(&priv->in_report_lock, flags);
+
+	if (!t)
+		return -ETIMEDOUT;
+	return priv->ack_status; /* 0=OK, -EIO=device error */
+}
+
+static const struct hwmon_ops arctic_fan_ops = {
+	.is_visible = arctic_fan_is_visible,
+	.read = arctic_fan_read,
+	.write = arctic_fan_write,
+};
+
+static const struct hwmon_channel_info *arctic_fan_info[] = {
+	HWMON_CHANNEL_INFO(fan,
+			   HWMON_F_INPUT, HWMON_F_INPUT, HWMON_F_INPUT,
+			   HWMON_F_INPUT, HWMON_F_INPUT, HWMON_F_INPUT,
+			   HWMON_F_INPUT, HWMON_F_INPUT, HWMON_F_INPUT,
+			   HWMON_F_INPUT),
+	HWMON_CHANNEL_INFO(pwm,
+			   HWMON_PWM_INPUT, HWMON_PWM_INPUT, HWMON_PWM_INPUT,
+			   HWMON_PWM_INPUT, HWMON_PWM_INPUT, HWMON_PWM_INPUT,
+			   HWMON_PWM_INPUT, HWMON_PWM_INPUT, HWMON_PWM_INPUT,
+			   HWMON_PWM_INPUT),
+	NULL
+};
+
+static const struct hwmon_chip_info arctic_fan_chip_info = {
+	.ops = &arctic_fan_ops,
+	.info = arctic_fan_info,
+};
+
+static int arctic_fan_reset_resume(struct hid_device *hdev)
+{
+	struct arctic_fan_data *priv = hid_get_drvdata(hdev);
+	unsigned long flags;
+
+	/*
+	 * The device resets its PWM channels to hardware defaults on power
+	 * loss during suspend. Clear the cached duty values so they reflect
+	 * the unknown hardware state, consistent with probe-time behaviour
+	 * (the device has no GET_REPORT support). Hold in_report_lock so
+	 * this does not race with a concurrent pwm read or write callback.
+	 */
+	spin_lock_irqsave(&priv->in_report_lock, flags);
+	memset(priv->pwm_duty, 0, sizeof(priv->pwm_duty));
+	spin_unlock_irqrestore(&priv->in_report_lock, flags);
+	return 0;
+}
+
+static int arctic_fan_probe(struct hid_device *hdev,
+			    const struct hid_device_id *id)
+{
+	struct arctic_fan_data *priv;
+	int ret;
+
+	if (!hid_is_usb(hdev))
+		return -ENODEV;
+
+	ret = hid_parse(hdev);
+	if (ret)
+		return ret;
+
+	priv = devm_kzalloc(&hdev->dev, sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	priv->hdev = hdev;
+	spin_lock_init(&priv->in_report_lock);
+	init_completion(&priv->in_report_received);
+	hid_set_drvdata(hdev, priv);
+
+	ret = hid_hw_start(hdev, HID_CONNECT_DRIVER);
+	if (ret)
+		return ret;
+
+	ret = hid_hw_open(hdev);
+	if (ret)
+		goto out_stop;
+
+	/*
+	 * Start IO before registering with hwmon. If IO were started after
+	 * hwmon registration, a sysfs write arriving in that narrow window
+	 * would send an OUT report but the ACK could not be delivered (the HID
+	 * core discards events until io_started), causing a spurious timeout.
+	 */
+	hid_device_io_start(hdev);
+
+	/*
+	 * Use the non-devm variant and store the pointer so remove() can
+	 * call hwmon_device_unregister() before tearing down the HID
+	 * transport. devm_hwmon_device_register_with_info() would defer
+	 * unregistration until after remove() returns, leaving a window
+	 * where a concurrent sysfs write could call hid_hw_output_report()
+	 * on an already-stopped device (use-after-free).
+	 */
+	priv->hwmon_dev = hwmon_device_register_with_info(&hdev->dev, "arctic_fan",
+							  priv, &arctic_fan_chip_info,
+							  NULL);
+	if (IS_ERR(priv->hwmon_dev)) {
+		ret = PTR_ERR(priv->hwmon_dev);
+		goto out_close;
+	}
+
+	return 0;
+
+out_close:
+	hid_device_io_stop(hdev);
+	hid_hw_close(hdev);
+out_stop:
+	hid_hw_stop(hdev);
+	return ret;
+}
+
+static void arctic_fan_remove(struct hid_device *hdev)
+{
+	struct arctic_fan_data *priv = hid_get_drvdata(hdev);
+
+	/*
+	 * Unregister hwmon before stopping the HID transport. This removes
+	 * the sysfs files and waits for any in-progress write() callback to
+	 * return, so no hwmon op can call hid_hw_output_report() after
+	 * hid_hw_stop() frees the underlying USB resources.
+	 * Matches the pattern used by nzxt-smart2 and aquacomputer_d5next.
+	 */
+	hwmon_device_unregister(priv->hwmon_dev);
+	hid_device_io_stop(hdev);
+	hid_hw_close(hdev);
+	hid_hw_stop(hdev);
+}
+
+static const struct hid_device_id arctic_fan_id_table[] = {
+	{ HID_USB_DEVICE(ARCTIC_VID, ARCTIC_PID) },
+	{ }
+};
+MODULE_DEVICE_TABLE(hid, arctic_fan_id_table);
+
+static struct hid_driver arctic_fan_driver = {
+	.name = "arctic_fan",
+	.id_table = arctic_fan_id_table,
+	.probe = arctic_fan_probe,
+	.remove = arctic_fan_remove,
+	.raw_event = arctic_fan_raw_event,
+	.reset_resume = arctic_fan_reset_resume,
+};
+
+module_hid_driver(arctic_fan_driver);
+
+MODULE_AUTHOR("Aureo Serrano de Souza <aureo.serrano@arctic.de>");
+MODULE_DESCRIPTION("HID hwmon driver for ARCTIC Fan Controller");
+MODULE_LICENSE("GPL");
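
For readers following the report layout described in the driver comments,
here is a minimal user-space sketch of the two conversions the patch
performs: scaling the hwmon PWM range (0-255) to the device's duty range
(0-100 %) with round-to-nearest division, and decoding the ten
little-endian RPM words at byte offset 11 of the 32-byte IN report. The
names pwm_to_percent(), build_out_report() and parse_rpm() are
hypothetical and are not part of the patch; the sketch only mirrors the
arithmetic and does not talk to any device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_FANS    10
#define REPORT_LEN  32
#define REPORT_ID   0x01
#define RPM_OFFSET  11	/* bytes 11-30: 10 x uint16 LE */

/*
 * Scale a hwmon PWM value (0-255) to a duty percentage (0-100),
 * rounding to nearest as DIV_ROUND_CLOSEST() does for unsigned values.
 */
static uint8_t pwm_to_percent(uint8_t pwm)
{
	return (uint8_t)(((unsigned int)pwm * 100 + 255 / 2) / 255);
}

/* Fill an OUT report: byte 0 = report ID, bytes 1-10 = duty in percent. */
static void build_out_report(uint8_t out[REPORT_LEN],
			     const uint8_t pwm[NUM_FANS])
{
	assert(out != NULL && pwm != NULL);

	memset(out, 0, REPORT_LEN);
	out[0] = REPORT_ID;
	for (size_t i = 0; i < NUM_FANS; i++)
		out[1 + i] = pwm_to_percent(pwm[i]);
}

/*
 * Decode the RPM words from an IN report.
 * Returns 0 on success, -1 if the buffer is too short to hold them.
 */
static int parse_rpm(const uint8_t *buf, size_t len, uint32_t rpm[NUM_FANS])
{
	if (len < RPM_OFFSET + NUM_FANS * 2)
		return -1;

	for (size_t i = 0; i < NUM_FANS; i++) {
		const uint8_t *p = &buf[RPM_OFFSET + i * 2];

		rpm[i] = (uint32_t)p[0] | ((uint32_t)p[1] << 8);
	}
	return 0;
}
```

Because the device range is coarser than the sysfs range, several PWM
values map to the same percentage (127 and 128 both become 50), which is
one reason the driver keeps the 0-255 value in its own cache rather than
reading it back from the status report.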

^ permalink raw reply related

* [PATCH] docs/zh_CN: add module-signing Chinese translation
From: Yan Zhu @ 2026-04-01 15:40 UTC (permalink / raw)
  To: alexs, si.yanteng, corbet
  Cc: dzm91, skhan, linux-doc, linux-kernel, zhuyan2015

Translate .../admin-guide/module-signing.rst into Chinese.

Update the translation through commit 0ad9a71933e7
("modsign: Enable ML-DSA module signing").

Signed-off-by: Yan Zhu <zhuyan2015@qq.com>
---
 .../zh_CN/admin-guide/module-signing.rst      | 242 ++++++++++++++++++
 1 file changed, 242 insertions(+)
 create mode 100644 Documentation/translations/zh_CN/admin-guide/module-signing.rst

diff --git a/Documentation/translations/zh_CN/admin-guide/module-signing.rst b/Documentation/translations/zh_CN/admin-guide/module-signing.rst
new file mode 100644
index 000000000000..b8c209dd229d
--- /dev/null
+++ b/Documentation/translations/zh_CN/admin-guide/module-signing.rst
@@ -0,0 +1,242 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/admin-guide/module-signing.rst
+:翻译:
+ 朱岩 Yan Zhu <zhuyan2015@qq.com>
+
+
+==========================
+内核模块签名机制
+==========================
+
+.. 目录
+..
+.. - 概述
+.. - 配置模块签名
+.. - 生成签名密钥
+.. - 内核中的公钥
+.. - 模块手动签名
+.. - 已签名模块和剥离
+.. - 加载已签名模块
+.. - 无效签名和未签名模块
+.. - 管理/保护私钥
+
+
+概述
+====
+
+内核模块签名机制在安装过程中对模块进行加密签名，然后在加载模块时检查签名。
+这通过禁止加载未签名的模块或使用无效密钥签名的模块来提高内核安全性。
+模块签名通过使恶意模块更难加载到内核中来增加安全性。
+模块签名检查在内核中完成，因此不需要受信任的用户空间组件。
+
+此机制使用 X.509 ITU-T 标准证书对涉及的公钥进行编码。
+签名本身不以任何工业标准类型编码。
+内置机制目前仅支持 RSA、NIST P-384 ECDSA 和 NIST FIPS-204 ML-DSA 公钥签名标准（尽管它是可插拔的并允许使用其他标准）。
+对于 RSA 和 ECDSA，可以使用的可能的哈希算法是大小为 256、384 和 512 的 SHA-2 和 SHA-3（算法由签名中的数据选择）；
+ML-DSA 会自行进行哈希运算，但允许与 SHA512 哈希算法结合用于签名属性。
+
+配置模块签名
+============
+
+通过进入内核配置的 :menuselection:`Enable Loadable Module Support` 菜单并打开以下选项来启用模块签名机制::
+
+	CONFIG_MODULE_SIG	"Module signature verification"
+
+这有多个可用选项：
+
+ (1) :menuselection:`Require modules to be validly signed`
+     (``CONFIG_MODULE_SIG_FORCE``)
+
+     这指定了内核应如何处理其密钥未知或未签名的模块。
+
+     如果关闭（即"宽松模式"），则允许加载密钥不可用的模块和未签名的模块，
+     但内核将被标记为受污染，并且相关模块将被标记为受污染，显示字符'E'。
+
+     如果打开（即"限制模式"），只有具有有效签名且可由内核拥有的公钥验证的模块才会被加载。
+     所有其他模块将生成错误。
+
+     无论此处的设置如何，如果模块的签名块无法解析，它将被直接拒绝。
+
+
+ (2) :menuselection:`Automatically sign all modules`
+     (``CONFIG_MODULE_SIG_ALL``)
+
+     如果打开此选项，则在构建的 modules_install 阶段期间将自动签名模块。
+     如果关闭，则必须使用以下命令手动签名模块::
+
+	scripts/sign-file
+
+
+ (3) :menuselection:`Which hash algorithm should modules be signed with?`
+
+     这提供了安装阶段将用于签名模块的哈希算法选择：
+
+        =============================== ==========================================
+	``CONFIG_MODULE_SIG_SHA256``	:menuselection:`Sign modules with SHA-256`
+	``CONFIG_MODULE_SIG_SHA384``	:menuselection:`Sign modules with SHA-384`
+	``CONFIG_MODULE_SIG_SHA512``	:menuselection:`Sign modules with SHA-512`
+	``CONFIG_MODULE_SIG_SHA3_256``	:menuselection:`Sign modules with SHA3-256`
+	``CONFIG_MODULE_SIG_SHA3_384``	:menuselection:`Sign modules with SHA3-384`
+	``CONFIG_MODULE_SIG_SHA3_512``	:menuselection:`Sign modules with SHA3-512`
+        =============================== ==========================================
+
+     此处选择的算法也将被构建到内核中（而不是作为模块），
+     以便使用该算法签名的模块可以在不导致循环依赖的情况下检查其签名。
+
+
+ (4) :menuselection:`File name or PKCS#11 URI of module signing key`
+     (``CONFIG_MODULE_SIG_KEY``)
+
+     将此选项设置为除默认值 ``certs/signing_key.pem`` 之外的其他值将禁用签名密钥的自动生成，
+     并允许使用您选择的密钥对内核模块进行签名。
+     提供的字符串应标识包含私钥及其对应的 PEM 格式 X.509 证书的文件，
+     或者在 OpenSSL ENGINE_pkcs11 功能正常的系统上，使用 RFC7512 定义的 PKCS#11 URI。
+     在后一种情况下，PKCS#11 URI 应引用证书和私钥。
+
+     如果包含私钥的 PEM 文件已加密，或者 PKCS#11 令牌需要 PIN，
+     可以通过 ``KBUILD_SIGN_PIN`` 变量在构建时提供。
+
+
+ (5) :menuselection:`Additional X.509 keys for default system keyring`
+     (``CONFIG_SYSTEM_TRUSTED_KEYS``)
+
+     此选项可设置为包含附加证书的 PEM 编码文件的文件名，
+     这些证书将默认包含在系统密钥环中。
+
+请注意，启用模块签名会为内核构建过程添加对执行签名工具的 OpenSSL 开发包的依赖。
+
+
+生成签名密钥
+============
+
+生成和检查签名需要加密密钥对。私钥用于生成签名，相应的公钥用于检查签名。
+私钥仅在构建期间需要，之后可以删除或安全存储。
+公钥被构建到内核中，以便在加载模块时可以使用它来检查签名。
+
+在正常情况下，当 ``CONFIG_MODULE_SIG_KEY`` 保持默认值时，
+如果文件中不存在密钥对，内核构建将使用 openssl 自动生成新的密钥对::
+
+	certs/signing_key.pem
+
+在构建 vmlinux 期间（公钥需要构建到 vmlinux 中）使用参数::
+
+	certs/x509.genkey
+
+文件（如果尚不存在也会生成）。
+
+可以在 RSA（``MODULE_SIG_KEY_TYPE_RSA``）、ECDSA（``MODULE_SIG_KEY_TYPE_ECDSA``）
+和 ML-DSA（``MODULE_SIG_KEY_TYPE_MLDSA_*``）之间选择生成 RSA 4k、NIST P-384 密钥对或 ML-DSA 44、65 或 87 密钥对。
+
+强烈建议您提供自己的 x509.genkey 文件。
+
+最值得注意的是，在 x509.genkey 文件中，req_distinguished_name 部分应从默认值更改::
+
+	[ req_distinguished_name ]
+	#O = Unspecified company
+	CN = Build time autogenerated kernel key
+	#emailAddress = unspecified.user@unspecified.company
+
+生成的 RSA 密钥大小也可以通过以下方式设置::
+
+	[ req ]
+	default_bits = 4096
+
+也可以使用位于 Linux 内核源代码树根目录中的 x509.genkey 密钥生成配置文件和 openssl 命令手动生成公钥/私钥文件。
+以下是生成公钥/私钥文件的示例::
+
+	openssl req -new -nodes -utf8 -sha256 -days 36500 -batch -x509 \
+	   -config x509.genkey -outform PEM -out kernel_key.pem \
+	   -keyout kernel_key.pem
+
+然后可以将生成的 kernel_key.pem 文件的完整路径名指定在 ``CONFIG_MODULE_SIG_KEY`` 选项中，
+并且将使用其中的证书和密钥而不是自动生成的密钥对。
+
+
+内核中的公钥
+============
+
+内核包含一个可由 root 查看的公钥环。它们在名为 ".builtin_trusted_keys" 的密钥环中，
+可以通过以下方式查看::
+
+	[root@deneb ~]# cat /proc/keys
+	...
+	223c7853 I------     1 perm 1f030000     0     0 keyring   .builtin_trusted_keys: 1
+	302d2d52 I------     1 perm 1f010000     0     0 asymmetri Fedora kernel signing key: d69a84e6bce3d216b979e9505b3e3ef9a7118079: X509.RSA a7118079 []
+
+除了专门为模块签名生成的公钥外，还可以在 ``CONFIG_SYSTEM_TRUSTED_KEYS`` 配置选项引用的 PEM 编码文件中提供其他受信任的证书。
+
+此外，架构代码可以从硬件存储中获取公钥并将其添加（例如从 UEFI 密钥数据库）。
+
+最后，可以通过以下方式添加其他公钥::
+
+	keyctl padd asymmetric "" [.builtin_trusted_keys-ID] <[key-file]
+
+例如::
+
+	keyctl padd asymmetric "" 0x223c7853 <my_public_key.x509
+
+但是，请注意，内核只允许将由已驻留在 ``.builtin_trusted_keys`` 中的密钥有效签名的密钥添加到 ``.builtin_trusted_keys``。
+
+模块手动签名
+============
+
+要手动对模块进行签名，请使用 Linux 内核源代码树中可用的 scripts/sign-file 工具。
+该脚本需要 4 个参数：
+
+	1.  哈希算法（例如，sha256）
+	2.  私钥文件名或 PKCS#11 URI
+	3.  公钥文件名
+	4.  要签名的内核模块
+
+以下是签名内核模块的示例::
+
+	scripts/sign-file sha512 kernel-signkey.priv \
+		kernel-signkey.x509 module.ko
+
+使用的哈希算法不必与配置的算法匹配，但如果不同，
+应确保哈希算法要么内置在内核中，要么可以在不需要自身的情况下加载。
+
+如果私钥需要密码或 PIN，可以在 $KBUILD_SIGN_PIN 环境变量中提供。
+
+
+已签名模块和剥离
+================
+
+已签名模块在末尾简单地附加了数字签名。模块文件末尾的字符串
+``~Module signature appended~.`` 确认签名存在，但不能确认签名有效！
+
+已签名模块是脆弱的，因为签名在定义的 ELF 容器之外。
+因此，一旦计算并附加签名，就不得剥离它们。
+请注意，整个模块都是签名的有效载荷，包括签名时存在的任何和所有调试信息。
+
+
+加载已签名模块
+==============
+
+模块通过 insmod、modprobe、``init_module()`` 或 ``finit_module()`` 加载，
+与未签名模块完全一样，因为在用户空间中不进行任何处理。
+所有签名检查都在内核内完成。
+
+
+无效签名和未签名模块
+====================
+
+如果启用了 ``CONFIG_MODULE_SIG_FORCE`` 或在内核启动命令提供了 module.sig_enforce=1，
+内核将仅加载具有有效签名且内核持有相应公钥的模块。
+否则，它还将加载未签名的模块。
+任何具有不匹配签名的模块将不被允许加载。
+
+任何具有不可解析签名的模块将被拒绝。
+
+
+管理/保护私钥
+==============
+
+由于私钥用于签名模块，病毒和恶意软件可以使用私钥签名模块并危害操作系统。
+私钥必须被销毁或移动到安全位置，而不是保存在内核源代码树的根目录中。
+
+如果使用相同的私钥为多个内核配置签名模块，
+必须确保模块版本信息足以防止将模块加载到不同的内核中。
+要么设置 ``CONFIG_MODVERSIONS=y``，要么通过更改 ``EXTRAVERSION`` 或 ``CONFIG_LOCALVERSION`` 确保每个配置具有不同的内核发布字符串。
-- 
2.43.0
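
As a small illustration of the "signed modules and stripping" section
translated above, here is a hedged C sketch that checks whether a module
image already held in memory ends with the signature marker. The marker
text matches MODULE_SIG_STRING in the kernel headers; the function name
has_sig_marker() and the in-memory calling convention are assumptions
made for this example. As the documentation itself stresses, finding the
marker only shows that a signature is present, not that it is valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Trailer that follows the appended signature data (newline included). */
static const char sig_marker[] = "~Module signature appended~\n";

/*
 * Return true if the last bytes of the image are the signature marker.
 * This says nothing about whether the signature itself verifies.
 */
static bool has_sig_marker(const unsigned char *image, size_t len)
{
	const size_t mlen = sizeof(sig_marker) - 1;	/* drop the NUL */

	assert(image != NULL || len == 0);

	if (len < mlen)
		return false;
	return memcmp(image + len - mlen, sig_marker, mlen) == 0;
}
```

Stripping a signed module removes or invalidates this trailer along with
the signature, so a check like this is a quick way to see that a module
needs to be signed again after stripping.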


^ permalink raw reply related

* Re: [PATCH 02/33] rust: bump Clippy's MSRV and clean `incompatible_msrv` allows
From: Danilo Krummrich @ 2026-04-01 15:39 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: Nathan Chancellor, Nicolas Schier, Andreas Hindborg,
	Catalin Marinas, Will Deacon, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Alexandre Courbot, David Airlie, Simona Vetter,
	Brendan Higgins, David Gow, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas, Alice Ryhl, Jonathan Corbet, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Trevor Gross, rust-for-linux,
	linux-kbuild, Lorenzo Stoakes, Vlastimil Babka, Liam R . Howlett,
	Uladzislau Rezki, linux-block, moderated for non-subscribers,
	Alexandre Ghiti, linux-riscv, nouveau, dri-devel, Rae Moar,
	linux-kselftest, kunit-dev, Nick Desaulniers, Bill Wendling,
	Justin Stitt, llvm, linux-kernel, Shuah Khan, linux-doc
In-Reply-To: <20260401114540.30108-3-ojeda@kernel.org>

On 4/1/26 1:45 PM, Miguel Ojeda wrote:
>  drivers/gpu/nova-core/gsp/cmdq.rs | 6 +-----

Acked-by: Danilo Krummrich <dakr@kernel.org>

^ permalink raw reply

* Re: [PATCH 05/33] rust: remove `RUSTC_HAS_COERCE_POINTEE` and simplify code
From: Danilo Krummrich @ 2026-04-01 15:38 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: Nathan Chancellor, Nicolas Schier, Andreas Hindborg,
	Catalin Marinas, Will Deacon, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Alexandre Courbot, David Airlie, Simona Vetter,
	Brendan Higgins, David Gow, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas, Alice Ryhl, Jonathan Corbet, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Trevor Gross, rust-for-linux,
	linux-kbuild, Lorenzo Stoakes, Vlastimil Babka, Liam R . Howlett,
	Uladzislau Rezki, linux-block, moderated for non-subscribers,
	Alexandre Ghiti, linux-riscv, nouveau, dri-devel, Rae Moar,
	linux-kselftest, kunit-dev, Nick Desaulniers, Bill Wendling,
	Justin Stitt, llvm, linux-kernel, Shuah Khan, linux-doc
In-Reply-To: <20260401114540.30108-6-ojeda@kernel.org>

On 4/1/26 1:45 PM, Miguel Ojeda wrote:
>  rust/kernel/alloc/kbox.rs | 29 ++---------------------------

Acked-by: Danilo Krummrich <dakr@kernel.org>

^ permalink raw reply

* Re: [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2
From: Michael Roth @ 2026-04-01 15:35 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jroedel, jthoughton, oupton, pankaj.gupta,
	qperret, rick.p.edgecombe, rientjes, shivankg, steven.price,
	tabba, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, Paolo Bonzini, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm
In-Reply-To: <20260326-gmem-inplace-conversion-v4-10-e202fe950ffd@google.com>

On Thu, Mar 26, 2026 at 03:24:19PM -0700, Ackerley Tng wrote:
> For shared to private conversions, if refcounts on any of the folios
> within the range are elevated, fail the conversion with -EAGAIN.
> 
> At the point of shared to private conversion, all folios in range are
> also unmapped. The filemap_invalidate_lock() is held, so no faulting
> can occur. Hence, from that point on, only transient refcounts can be
> taken on the folios associated with that guest_memfd.
> 
> Hence, it is safe to do the conversion from shared to private.
> 
> After conversion is complete, refcounts may become elevated, but that
> is fine since users of transient refcounts don't actually access
> memory.
> 
> For private to shared conversions, there are no refcount checks, since
> the guest is the only user of private pages, and guest_memfd will be the
> only holder of refcounts on private pages.
> 
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  Documentation/virt/kvm/api.rst |  48 +++++++-
>  include/linux/kvm_host.h       |  10 ++
>  include/uapi/linux/kvm.h       |   9 +-
>  virt/kvm/Kconfig               |   1 +
>  virt/kvm/guest_memfd.c         | 245 ++++++++++++++++++++++++++++++++++++++---
>  virt/kvm/kvm_main.c            |  17 ++-
>  6 files changed, 300 insertions(+), 30 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 0b61e2579e1d8..15148c80cfdb6 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -117,7 +117,7 @@ description:
>        x86 includes both i386 and x86_64.
>  
>    Type:
> -      system, vm, or vcpu.
> +      system, vm, vcpu or guest_memfd.
>  
>    Parameters:
>        what parameters are accepted by the ioctl.
> @@ -6557,11 +6557,22 @@ KVM_S390_KEYOP_SSKE
>  ---------------------------------
>  
>  :Capability: KVM_CAP_MEMORY_ATTRIBUTES2
> -:Architectures: x86
> -:Type: vm ioctl
> +:Architectures: all
> +:Type: vm, guest_memfd ioctl
>  :Parameters: struct kvm_memory_attributes2 (in/out)
>  :Returns: 0 on success, <0 on error
>  
> +Errors:
> +
> +  ========== ===============================================================
> +  EINVAL     The specified `offset` or `size` was invalid (e.g. not
> +             page aligned, causes an overflow, or size is zero).
> +  EFAULT     The parameter address was invalid.
> +  EAGAIN     Some page within requested range had unexpected refcounts. The
> +             offset of the page will be returned in `error_offset`.
> +  ENOMEM     Ran out of memory trying to track private/shared state.
> +  ========== ===============================================================
> +
>  KVM_SET_MEMORY_ATTRIBUTES2 is an extension to
>  KVM_SET_MEMORY_ATTRIBUTES that supports returning (writing) values to
>  userspace.  The original (pre-extension) fields are shared with
> @@ -6572,15 +6583,42 @@ Attribute values are shared with KVM_SET_MEMORY_ATTRIBUTES.
>  ::
>  
>    struct kvm_memory_attributes2 {
> -	__u64 address;
> +	/* in */
> +	union {
> +		__u64 address;
> +		__u64 offset;
> +	};
>  	__u64 size;
>  	__u64 attributes;
>  	__u64 flags;
> -	__u64 reserved[12];
> +	/* out */
> +	__u64 error_offset;
> +	__u64 reserved[11];
>    };
>  
>    #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
>  
> +Set attributes for a range of offsets within a guest_memfd to
> +KVM_MEMORY_ATTRIBUTE_PRIVATE to limit the specified guest_memfd-backed
> +memory range for guest use. Even if KVM_CAP_GUEST_MEMFD_MMAP is
> +supported, after a successful call to set
> +KVM_MEMORY_ATTRIBUTE_PRIVATE, the requested range will not be mappable
> +into host userspace and will only be mappable by the guest.
> +
> +To allow the range to be mappable into host userspace again, call
> +KVM_SET_MEMORY_ATTRIBUTES2 on the guest_memfd again with
> +KVM_MEMORY_ATTRIBUTE_PRIVATE unset.
> +
> +If this ioctl returns -EAGAIN, the offset of the page with unexpected
> +refcounts will be returned in `error_offset`. This can occur if there
> +are transient refcounts on the pages, taken by other parts of the
> +kernel.
> +
> +Userspace is expected to drop all known refcounts on the shared pages,
> +such as refcounts taken by get_user_pages(), and then retry the ioctl.
> +One possible source of long-term refcounts is guest_memfd memory that
> +is pinned in IOMMU page tables.
> +
>  See also: :ref: `KVM_SET_MEMORY_ATTRIBUTES`.
>  
>  .. _kvm_run:
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 19f026f8de390..1ea14c66fc82e 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2514,6 +2514,16 @@ static inline bool kvm_memslot_is_gmem_only(const struct kvm_memory_slot *slot)
>  }
>  
>  #ifdef CONFIG_KVM_MEMORY_ATTRIBUTES
> +static inline u64 kvm_supported_mem_attributes(struct kvm *kvm)
> +{
> +#ifdef kvm_arch_has_private_mem
> +	if (!kvm || kvm_arch_has_private_mem(kvm))
> +		return KVM_MEMORY_ATTRIBUTE_PRIVATE;
> +#endif
> +
> +	return 0;
> +}
> +
>  typedef unsigned long (kvm_get_memory_attributes_t)(struct kvm *kvm, gfn_t gfn);
>  DECLARE_STATIC_CALL(__kvm_get_memory_attributes, kvm_get_memory_attributes_t);
>  
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 16567d4a769e5..29baaa60de35a 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -990,6 +990,7 @@ struct kvm_enable_cap {
>  #define KVM_CAP_S390_USER_OPEREXEC 246
>  #define KVM_CAP_S390_KEYOP 247
>  #define KVM_CAP_MEMORY_ATTRIBUTES2 248
> +#define KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES 249
>  
>  struct kvm_irq_routing_irqchip {
>  	__u32 irqchip;
> @@ -1642,11 +1643,15 @@ struct kvm_memory_attributes {
>  #define KVM_SET_MEMORY_ATTRIBUTES2              _IOWR(KVMIO,  0xd2, struct kvm_memory_attributes2)
>  
>  struct kvm_memory_attributes2 {
> -	__u64 address;
> +	union {
> +		__u64 address;
> +		__u64 offset;
> +	};
>  	__u64 size;
>  	__u64 attributes;
>  	__u64 flags;
> -	__u64 reserved[12];
> +	__u64 error_offset;
> +	__u64 reserved[11];
>  };
>  
>  #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 3fea89c45cfb4..e371e079e2c50 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -109,6 +109,7 @@ config KVM_VM_MEMORY_ATTRIBUTES
>  
>  config KVM_GUEST_MEMFD
>         select XARRAY_MULTI
> +       select KVM_MEMORY_ATTRIBUTES
>         bool
>  
>  config HAVE_KVM_ARCH_GMEM_PREPARE
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index d414ebfcb4c19..0cff9a85a4c53 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -183,10 +183,12 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
>  
>  static enum kvm_gfn_range_filter kvm_gmem_get_invalidate_filter(struct inode *inode)
>  {
> -	if (GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED)
> -		return KVM_FILTER_SHARED;
> -
> -	return KVM_FILTER_PRIVATE;
> +	/*
> +	 * TODO: Limit invalidations based on the to-be-invalidated range, i.e.
> +	 *       invalidate shared/private if and only if there can possibly be
> +	 *       such mappings.
> +	 */
> +	return KVM_FILTER_SHARED | KVM_FILTER_PRIVATE;
>  }
>  
>  static void __kvm_gmem_invalidate_begin(struct gmem_file *f, pgoff_t start,
> @@ -552,11 +554,235 @@ unsigned long kvm_gmem_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
>  }
>  EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gmem_get_memory_attributes);
>  
> +static bool kvm_gmem_range_has_attributes(struct maple_tree *mt,
> +					  pgoff_t index, size_t nr_pages,
> +					  u64 attributes)
> +{
> +	pgoff_t end = index + nr_pages - 1;
> +	void *entry;
> +
> +	lockdep_assert(mt_lock_is_held(mt));
> +
> +	mt_for_each(mt, entry, index, end) {
> +		if (xa_to_value(entry) != attributes)
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
> +static bool kvm_gmem_is_safe_for_conversion(struct inode *inode, pgoff_t start,
> +					    size_t nr_pages, pgoff_t *err_index)
> +{
> +	struct address_space *mapping = inode->i_mapping;
> +	const int filemap_get_folios_refcount = 1;
> +	pgoff_t last = start + nr_pages - 1;
> +	struct folio_batch fbatch;
> +	bool safe = true;
> +	int i;
> +
> +	folio_batch_init(&fbatch);
> +	while (safe && filemap_get_folios(mapping, &start, last, &fbatch)) {
> +
> +		for (i = 0; i < folio_batch_count(&fbatch); ++i) {
> +			struct folio *folio = fbatch.folios[i];
> +
> +			if (folio_ref_count(folio) !=
> +			    folio_nr_pages(folio) + filemap_get_folios_refcount) {
> +				safe = false;
> +				*err_index = folio->index;
> +				break;
> +			}
> +		}
> +
> +		folio_batch_release(&fbatch);
> +		cond_resched();
> +	}
> +
> +	return safe;
> +}
> +
> +/*
> + * Preallocate memory for attributes to be stored on a maple tree, pointed to
> + * by mas.  Adjacent ranges with attributes identical to the new attributes
> + * will be merged.  Also sets mas's bounds up for storing attributes.
> + *
> + * This maintains the invariant that ranges with the same attributes will
> + * always be merged.
> + */
> +static int kvm_gmem_mas_preallocate(struct ma_state *mas, u64 attributes,
> +				    pgoff_t start, size_t nr_pages)
> +{
> +	pgoff_t end = start + nr_pages;
> +	pgoff_t last = end - 1;
> +	void *entry;
> +
> +	/* Try extending range. entry is NULL on overflow/wrap-around. */
> +	mas_set_range(mas, end, end);
> +	entry = mas_find(mas, end);
> +	if (entry && xa_to_value(entry) == attributes)
> +		last = mas->last;
> +
> +	if (start > 0) {
> +		mas_set_range(mas, start - 1, start - 1);
> +		entry = mas_find(mas, start - 1);
> +		if (entry && xa_to_value(entry) == attributes)
> +			start = mas->index;
> +	}
> +
> +	mas_set_range(mas, start, last);
> +	return mas_preallocate(mas, xa_mk_value(attributes), GFP_KERNEL);
> +}
> +
> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
> +static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end)
> +{
> +	struct folio_batch fbatch;
> +	pgoff_t next = start;
> +	int i;
> +
> +	folio_batch_init(&fbatch);
> +	while (filemap_get_folios(inode->i_mapping, &next, end - 1, &fbatch)) {
> +		for (i = 0; i < folio_batch_count(&fbatch); ++i) {
> +			struct folio *folio = fbatch.folios[i];
> +			unsigned long pfn = folio_pfn(folio);
> +
> +			kvm_arch_gmem_invalidate(pfn, pfn + folio_nr_pages(folio));
> +		}
> +
> +		folio_batch_release(&fbatch);
> +		cond_resched();
> +	}
> +}
> +#else
> +static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end) {}
> +#endif
> +
> +static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
> +				     size_t nr_pages, uint64_t attrs,
> +				     pgoff_t *err_index)
> +{
> +	bool to_private = attrs & KVM_MEMORY_ATTRIBUTE_PRIVATE;
> +	struct address_space *mapping = inode->i_mapping;
> +	struct gmem_inode *gi = GMEM_I(inode);
> +	pgoff_t end = start + nr_pages;
> +	struct maple_tree *mt;
> +	struct ma_state mas;
> +	int r;
> +
> +	mt = &gi->attributes;
> +
> +	filemap_invalidate_lock(mapping);
> +
> +	mas_init(&mas, mt, start);
> +
> +	if (kvm_gmem_range_has_attributes(mt, start, nr_pages, attrs)) {
> +		r = 0;
> +		goto out;
> +	}
> +
> +	r = kvm_gmem_mas_preallocate(&mas, attrs, start, nr_pages);
> +	if (r) {
> +		*err_index = start;
> +		goto out;
> +	}
> +
> +	if (to_private) {
> +		unmap_mapping_pages(mapping, start, nr_pages, false);
> +
> +		if (!kvm_gmem_is_safe_for_conversion(inode, start, nr_pages,
> +						     err_index)) {
> +			mas_destroy(&mas);
> +			r = -EAGAIN;
> +			goto out;
> +		}
> +	}
> +
> +	/*
> +	 * From this point on guest_memfd has performed necessary
> +	 * checks and can proceed to do guest-breaking changes.
> +	 */
> +
> +	kvm_gmem_invalidate_begin(inode, start, end);
> +
> +	if (!to_private)
> +		kvm_gmem_invalidate(inode, start, end);
> +
> +	mas_store_prealloc(&mas, xa_mk_value(attrs));
> +
> +	kvm_gmem_invalidate_end(inode, start, end);
> +out:
> +	filemap_invalidate_unlock(mapping);
> +	return r;
> +}
> +
> +static long kvm_gmem_set_attributes(struct file *file, void __user *argp)
> +{
> +	struct gmem_file *f = file->private_data;
> +	struct inode *inode = file_inode(file);
> +	struct kvm_memory_attributes2 attrs;
> +	pgoff_t err_index;
> +	size_t nr_pages;
> +	pgoff_t index;
> +	int i, r;
> +
> +	if (copy_from_user(&attrs, argp, sizeof(attrs)))
> +		return -EFAULT;
> +
> +	if (attrs.flags)
> +		return -EINVAL;
> +	if (attrs.error_offset)
> +		return -EINVAL;
> +	for (i = 0; i < ARRAY_SIZE(attrs.reserved); i++) {
> +		if (attrs.reserved[i])
> +			return -EINVAL;
> +	}
> +	if (attrs.attributes & ~kvm_supported_mem_attributes(f->kvm))
> +		return -EINVAL;
> +	if (attrs.size == 0 || attrs.offset + attrs.size < attrs.offset)
> +		return -EINVAL;
> +	if (!PAGE_ALIGNED(attrs.offset) || !PAGE_ALIGNED(attrs.size))
> +		return -EINVAL;
> +
> +	if (attrs.offset >= inode->i_size ||
> +	    attrs.offset + attrs.size > inode->i_size)
> +		return -EINVAL;
> +
> +	nr_pages = attrs.size >> PAGE_SHIFT;
> +	index = attrs.offset >> PAGE_SHIFT;
> +	r = __kvm_gmem_set_attributes(inode, index, nr_pages, attrs.attributes,
> +				      &err_index);
> +	if (r) {
> +		attrs.error_offset = ((uint64_t)err_index) << PAGE_SHIFT;
> +
> +		if (copy_to_user(argp, &attrs, sizeof(attrs)))
> +			return -EFAULT;
> +	}
> +
> +	return r;
> +}
> +
> +static long kvm_gmem_ioctl(struct file *file, unsigned int ioctl,
> +			   unsigned long arg)
> +{
> +	switch (ioctl) {
> +	case KVM_SET_MEMORY_ATTRIBUTES2:
> +		if (vm_memory_attributes)
> +			return -ENOTTY;
> +
> +		return kvm_gmem_set_attributes(file, (void __user *)arg);
> +	default:
> +		return -ENOTTY;
> +	}
> +}
> +
> +
>  static struct file_operations kvm_gmem_fops = {
>  	.mmap		= kvm_gmem_mmap,
>  	.open		= generic_file_open,
>  	.release	= kvm_gmem_release,
>  	.fallocate	= kvm_gmem_fallocate,
> +	.unlocked_ioctl	= kvm_gmem_ioctl,
>  };
>  
>  static int kvm_gmem_migrate_folio(struct address_space *mapping,
> @@ -942,20 +1168,13 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gmem_get_pfn);
>  static bool kvm_gmem_range_is_private(struct gmem_inode *gi, pgoff_t index,
>  				      size_t nr_pages, struct kvm *kvm, gfn_t gfn)
>  {
> -	pgoff_t end = index + nr_pages - 1;
> -	void *entry;
> -
>  	if (vm_memory_attributes)
>  		return kvm_range_has_vm_memory_attributes(kvm, gfn, gfn + nr_pages,
>  						       KVM_MEMORY_ATTRIBUTE_PRIVATE,
>  						       KVM_MEMORY_ATTRIBUTE_PRIVATE);
>  
> -	mt_for_each(&gi->attributes, entry, index, end) {
> -		if (xa_to_value(entry) != KVM_MEMORY_ATTRIBUTE_PRIVATE)
> -			return false;
> -	}
> -
> -	return true;
> +	return kvm_gmem_range_has_attributes(&gi->attributes, index, nr_pages,
> +					     KVM_MEMORY_ATTRIBUTE_PRIVATE);
>  }
>  
>  static long __kvm_gmem_populate(struct kvm *kvm, struct kvm_memory_slot *slot,
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 3c261904322f0..85c14197587d4 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2435,16 +2435,6 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
>  #endif /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
>  
>  #ifdef CONFIG_KVM_MEMORY_ATTRIBUTES
> -static u64 kvm_supported_mem_attributes(struct kvm *kvm)
> -{
> -#ifdef kvm_arch_has_private_mem
> -	if (!kvm || kvm_arch_has_private_mem(kvm))
> -		return KVM_MEMORY_ATTRIBUTE_PRIVATE;
> -#endif
> -
> -	return 0;
> -}
> -
>  #ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
>  static unsigned long kvm_get_vm_memory_attributes(struct kvm *kvm, gfn_t gfn)
>  {
> @@ -2635,6 +2625,8 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
>  		return -EINVAL;
>  	if (!PAGE_ALIGNED(attrs->address) || !PAGE_ALIGNED(attrs->size))
>  		return -EINVAL;
> +	if (attrs->error_offset)
> +		return -EINVAL;
>  	for (i = 0; i < ARRAY_SIZE(attrs->reserved); i++) {
>  		if (attrs->reserved[i])
>  			return -EINVAL;
> @@ -4983,6 +4975,11 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>  		return 1;
>  	case KVM_CAP_GUEST_MEMFD_FLAGS:
>  		return kvm_gmem_get_supported_flags(kvm);
> +	case KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES:
> +		if (vm_memory_attributes)
> +			return 0;
> +
> +		return kvm_supported_mem_attributes(kvm);

Based on the discussion from the PUCK call this morning, it sounds like it
would be a good idea to limit kvm_supported_mem_attributes() to only
reporting KVM_MEMORY_ATTRIBUTE_PRIVATE if the underlying CoCo
implementation has all the necessary enablement to support in-place
conversion via guest_memfd. In the case of SNP, there is a
documentation/parameter check in snp_launch_update() that needs to be
relaxed in order for userspace to be able to pass in a NULL 'src'
parameter. For in-place conversion, the memory would be initialized in
place as shared memory prior to the call; by the time kvm_gmem_populate()
runs it will have been set to private and therefore cannot be faulted in
via GUP (and if it could, we'd be unnecessarily copying the src back on
top of itself, since src/dst are the same).

So maybe there should be an arch hook to check a whitelist of VM types
that support KVM_MEMORY_ATTRIBUTE_PRIVATE when vm_memory_attributes=0,
and if we decide to enable it for SNP as part of this series you could
include the 1-2 patches needed there, or I could enable the SNP support
separately as a small series and I guess that would then become a prereq
for the SNP self-tests?

Not sure if additional enablement is needed for TDX or not before
KVM_MEMORY_ATTRIBUTE_PRIVATE would be advertised, but similar
considerations there.
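
A rough sketch of what such a hook could look like, written as a
standalone userspace model rather than kernel code. Every name below
(model_vm, model_arch_gmem_supports_inplace_conversion(), and so on) is
hypothetical and only illustrates the shape of the check; the attribute
bit value is an assumption made for the model, and which VM types belong
on the list is exactly the open question above:

```c
#include <stdbool.h>
#include <stdint.h>

/* Model-only stand-in for KVM_MEMORY_ATTRIBUTE_PRIVATE (value assumed). */
#define MODEL_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)

/* Hypothetical VM types; real KVM VM types are arch-defined. */
enum model_vm_type {
	MODEL_VM_DEFAULT,
	MODEL_VM_SNP,
	MODEL_VM_TDX,
};

struct model_vm {
	enum model_vm_type type;
};

/*
 * Hypothetical arch hook: true only for VM types whose CoCo implementation
 * has the enablement needed for in-place conversion via guest_memfd.
 */
bool model_arch_gmem_supports_inplace_conversion(const struct model_vm *vm)
{
	switch (vm->type) {
	case MODEL_VM_SNP:
		/* Assumes the 'src' check in snp_launch_update() was relaxed. */
		return true;
	default:
		/* TDX and others stay off the list until verified. */
		return false;
	}
}

/*
 * Model of the capability check: report PRIVATE for guest_memfd only when
 * per-VM attributes are disabled (vm_memory_attributes == false) and the
 * arch hook approves the VM type.
 */
uint64_t model_gmem_supported_mem_attributes(const struct model_vm *vm,
					     bool vm_memory_attributes)
{
	if (vm_memory_attributes)
		return 0;
	if (!model_arch_gmem_supports_inplace_conversion(vm))
		return 0;
	return MODEL_MEMORY_ATTRIBUTE_PRIVATE;
}
```

With something of this shape, KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES would
report 0 for a VM type that is not on the list, so userspace would not
attempt in-place conversion there.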

-Mike

>  #endif
>  	default:
>  		break;
> 
> -- 
> 2.53.0.1018.g2bb0e51243-goog
> 

* Re: [PATCH v6] hwmon: add driver for ARCTIC Fan Controller
From: Guenter Roeck @ 2026-04-01 15:32 UTC (permalink / raw)
  To: Aureo Serrano de Souza
  Cc: linux-hwmon, linux, corbet, skhan, linux-doc, linux-kernel
In-Reply-To: <20260401112654.60560-1-aureo.serrano@arctic.de>

Hi,

On Wed, Apr 01, 2026 at 07:25:54PM +0800, Aureo Serrano de Souza wrote:
> Add hwmon driver for the ARCTIC Fan Controller, a USB HID device
> (VID 0x3904, PID 0xF001) with 10 fan channels. Exposes fan speed in
> RPM (read-only) and PWM duty cycle (0-255, read/write) via sysfs.
> 
> The device pushes IN reports at ~1 Hz containing RPM readings. PWM is
> set via OUT reports; the device applies the new duty cycle and sends
> back a 2-byte ACK (Report ID 0x02). The driver waits up to 1 s for
> the ACK using a completion. Measured device latency: max ~563 ms over
> 500 iterations. PWM control is manual-only: the device never changes
> duty cycle autonomously.
> 
> raw_event() may run in hardirq context, so fan_rpm[] is protected by
> a spinlock with irq-save. pwm_duty[] is also protected by this spinlock
> because reset_resume() clears it outside the hwmon core lock. The OUT
> report buffer is built and write_pending is armed under the same lock so
> that no reset_resume() can race with the pwm_duty[] snapshot. priv->buf
> is exclusively accessed by write(), which the hwmon core serializes.
> 
> Signed-off-by: Aureo Serrano de Souza <aureo.serrano@arctic.de>
> ---

Looks like the patch is corrupted.

Applying: hwmon: add driver for ARCTIC Fan Controller
error: corrupt patch at line 587
error: could not build fake ancestor
Patch failed at 0001 hwmon: add driver for ARCTIC Fan Controller

I cannot figure out what is wrong, but both git and patch report that it is
corrupted. Please fix and resend.

Thanks,
Guenter

* Re: [PATCH 33/33] rust: kbuild: allow `clippy::precedence` for Rust < 1.86.0
From: Gary Guo @ 2026-04-01 15:28 UTC (permalink / raw)
  To: Miguel Ojeda, Nathan Chancellor, Nicolas Schier, Danilo Krummrich,
	Andreas Hindborg, Catalin Marinas, Will Deacon, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Alexandre Courbot, David Airlie,
	Simona Vetter, Brendan Higgins, David Gow, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas, Alice Ryhl, Jonathan Corbet
  Cc: Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Trevor Gross, rust-for-linux, linux-kbuild, Lorenzo Stoakes,
	Vlastimil Babka, Liam R . Howlett, Uladzislau Rezki, linux-block,
	moderated for non-subscribers, Alexandre Ghiti, linux-riscv,
	nouveau, dri-devel, Rae Moar, linux-kselftest, kunit-dev,
	Nick Desaulniers, Bill Wendling, Justin Stitt, llvm, linux-kernel,
	Shuah Khan, linux-doc
In-Reply-To: <20260401114540.30108-34-ojeda@kernel.org>

On Wed Apr 1, 2026 at 12:45 PM BST, Miguel Ojeda wrote:
> The Clippy `precedence` lint was extended in Rust 1.85.0 to include
> bitmasking and shift operations [1]. However, because it generated
> many hits, in Rust 1.86.0 it was split into a new `precedence_bits`
> lint which is not enabled by default [2].
> 
> In other words, only Rust 1.85 has a different behavior. For instance,
> it reports:
> 
>     warning: operator precedence can trip the unwary
>       --> drivers/gpu/nova-core/fb/hal/ga100.rs:16:5
>        |
>     16 | /     u64::from(regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR::read(bar).adr_39_08()) << FLUSH_SYSMEM_ADDR_SHIFT
>     17 | |         | u64::from(regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR_HI::read(bar).adr_63_40())
>     18 | |             << FLUSH_SYSMEM_ADDR_SHIFT_HI
>        | |_________________________________________^
>        |
>        = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#precedence
>        = note: `-W clippy::precedence` implied by `-W clippy::all`
>        = help: to override `-W clippy::all` add `#[allow(clippy::precedence)]`
>     help: consider parenthesizing your expression
>        |
>     16 ~     (u64::from(regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR::read(bar).adr_39_08()) << FLUSH_SYSMEM_ADDR_SHIFT) | (u64::from(regs::NV_PFB_NISO_FLUSH_SYSMEM_ADDR_HI::read(bar).adr_63_40())
>     17 +             << FLUSH_SYSMEM_ADDR_SHIFT_HI)
>        |
> 
>     warning: operator precedence can trip the unwary
>        --> drivers/gpu/nova-core/vbios.rs:511:17
>         |
>     511 | /                 u32::from(data[29]) << 24
>     512 | |                     | u32::from(data[28]) << 16
>     513 | |                     | u32::from(data[27]) << 8
>         | |______________________________________________^
>         |
>         = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#precedence
>     help: consider parenthesizing your expression
>         |
>     511 ~                 u32::from(data[29]) << 24
>     512 +                     | u32::from(data[28]) << 16 | (u32::from(data[27]) << 8)
>         |
> 
>     warning: operator precedence can trip the unwary
>        --> drivers/gpu/nova-core/vbios.rs:511:17
>         |
>     511 | /                 u32::from(data[29]) << 24
>     512 | |                     | u32::from(data[28]) << 16
>         | |_______________________________________________^ help: consider parenthesizing your expression: `(u32::from(data[29]) << 24) | (u32::from(data[28]) << 16)`
>         |
>         = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#precedence
> 
> While so far we try our best to keep all versions Clippy-clean, the
> minimum (which is now Rust 1.85.0 after the bump) and the latest stable
> are the most important ones; and this may be considered "false positives"
> with respect to the behavior in other versions.
> 
> Thus allow this lint for this version using the per-version flags
> mechanism introduced in the previous commit.
> 
> Link: https://github.com/rust-lang/rust-clippy/issues/14097 [1]
> Link: https://github.com/rust-lang/rust-clippy/pull/14115 [2]
> Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

Link: https://lore.kernel.org/rust-for-linux/DFVDKMMA7KPC.2DN0951H3H55Y@kernel.org/
Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  Makefile | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)


* Re: [PATCH 32/33] rust: kbuild: support global per-version flags
From: Gary Guo @ 2026-04-01 15:26 UTC (permalink / raw)
  To: Miguel Ojeda, Nathan Chancellor, Nicolas Schier, Danilo Krummrich,
	Andreas Hindborg, Catalin Marinas, Will Deacon, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Alexandre Courbot, David Airlie,
	Simona Vetter, Brendan Higgins, David Gow, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas, Alice Ryhl, Jonathan Corbet
  Cc: Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Trevor Gross, rust-for-linux, linux-kbuild, Lorenzo Stoakes,
	Vlastimil Babka, Liam R . Howlett, Uladzislau Rezki, linux-block,
	moderated for non-subscribers, Alexandre Ghiti, linux-riscv,
	nouveau, dri-devel, Rae Moar, linux-kselftest, kunit-dev,
	Nick Desaulniers, Bill Wendling, Justin Stitt, llvm, linux-kernel,
	Shuah Khan, linux-doc
In-Reply-To: <20260401114540.30108-33-ojeda@kernel.org>

On Wed Apr 1, 2026 at 12:45 PM BST, Miguel Ojeda wrote:
> Sometimes it is useful to gate global Rust flags per compiler version.
> For instance, we may want to disable a lint that has false positives in
> a single version [1].
>
> We already had helpers like `rustc-min-version` for that, which we use
> elsewhere, but we cannot currently use them for `rust_common_flags`,
> which contains the global flags for all Rust code (kernel and host),
> because `rustc-min-version` depends on `CONFIG_RUSTC_VERSION`, which
> does not exist when `rust_common_flags` is defined.
>
> Thus, to support that, introduce `rust_common_flags_per_version`,
> defined after the `include/config/auto.conf` inclusion (where
> `CONFIG_RUSTC_VERSION` becomes available), and append it to
> `rust_common_flags`, `KBUILD_HOSTRUSTFLAGS` and `KBUILD_RUSTFLAGS`.
>
> An alternative is moving all those three down, but that would mean
> separating them from the other `KBUILD_*` variables.

I think I would prefer moving these down.

The current approach appends the flags to all variables, which will cause the
following equivalence to stop holding after the flag update.

KBUILD_HOSTRUSTFLAGS := $(rust_common_flags) -O -Cstrip=debuginfo \
			-Zallow-features= $(HOSTRUSTFLAGS)

(Per-version flags no longer go before -O; they come after HOSTRUSTFLAGS.)

Best,
Gary

>
> Link: https://lore.kernel.org/rust-for-linux/CANiq72mWdFU11GcCZRchzhy0Gi1QZShvZtyRkHV2O+WA2uTdVQ@mail.gmail.com/ [1]
> Link: https://patch.msgid.link/20260307170929.153892-1-ojeda@kernel.org
> Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
> ---
>  Makefile | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/Makefile b/Makefile
> index 1a219bf1c771..20c8179d96ee 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -834,6 +834,14 @@ endif # CONFIG_TRACEPOINTS
>  
>  export WARN_ON_UNUSED_TRACEPOINTS
>  
> +# Per-version Rust flags. These are like `rust_common_flags`, but may
> +# depend on the Rust compiler version (e.g. using `rustc-min-version`).
> +rust_common_flags_per_version :=
> +
> +rust_common_flags += $(rust_common_flags_per_version)
> +KBUILD_HOSTRUSTFLAGS += $(rust_common_flags_per_version)
> +KBUILD_RUSTFLAGS += $(rust_common_flags_per_version)
> +
>  include $(srctree)/arch/$(SRCARCH)/Makefile
>  
>  ifdef need-config


* Re: [PATCH 31/33] rust: declare cfi_encoding for lru_status
From: Gary Guo @ 2026-04-01 15:20 UTC (permalink / raw)
  To: Miguel Ojeda, Nathan Chancellor, Nicolas Schier, Danilo Krummrich,
	Andreas Hindborg, Catalin Marinas, Will Deacon, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Alexandre Courbot, David Airlie,
	Simona Vetter, Brendan Higgins, David Gow, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas, Alice Ryhl, Jonathan Corbet
  Cc: Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Trevor Gross, rust-for-linux, linux-kbuild, Lorenzo Stoakes,
	Vlastimil Babka, Liam R . Howlett, Uladzislau Rezki, linux-block,
	moderated for non-subscribers, Alexandre Ghiti, linux-riscv,
	nouveau, dri-devel, Rae Moar, linux-kselftest, kunit-dev,
	Nick Desaulniers, Bill Wendling, Justin Stitt, llvm, linux-kernel,
	Shuah Khan, linux-doc
In-Reply-To: <20260401114540.30108-32-ojeda@kernel.org>

On Wed Apr 1, 2026 at 12:45 PM BST, Miguel Ojeda wrote:
> From: Alice Ryhl <aliceryhl@google.com>
> 
> By default bindgen will convert 'enum lru_status' into a typedef for an
> integer. For the most part, an integer of the same size as the enum
> results in the correct ABI, but in the specific case of CFI, that is not
> the case. The CFI encoding is supposed to be the same as a struct called
> 'lru_status' rather than the name of the underlying native integer type.
> 
> To fix this, tell bindgen to generate a newtype and set the CFI type
> explicitly. Note that we need to set the CFI attribute explicitly as
> bindgen is using repr(transparent), which is otherwise identical to the
> inner type for ABI purposes.
> 
> This allows us to remove the page range helper C function in Binder
> without risking a CFI failure when list_lru_walk calls the provided
> function pointer.
> 
> The --with-attribute-custom-enum argument requires bindgen v0.71 or
> greater.
> 
> [ In particular, the feature was added in 0.71.0 [1][2].
> 
>   In addition, `feature(cfi_encoding)` has been available since
>   Rust 1.71.0 [3].
> 
>   Link: https://github.com/rust-lang/rust-bindgen/issues/2520 [1]
>   Link: https://github.com/rust-lang/rust-bindgen/pull/2866 [2]
>   Link: https://github.com/rust-lang/rust/pull/105452 [3]
> 
>     - Miguel ]
> 
> My testing procedure was to add this to the android17-6.18 branch and
> verify that rust_shrink_free_page is successfully called without crash,
> and verify that it does in fact crash when the cfi_encoding is set to
> other values. Note that I couldn't test this on android16-6.12 as that
> branch uses a bindgen version that is too old.
> 
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> Link: https://patch.msgid.link/20260223-cfi-lru-status-v2-1-89c6448a63a4@google.com
> [ Rebased on top of the minimum Rust version bump series which provide
>   the required `bindgen` version. - Miguel ]
> Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  drivers/android/binder/Makefile            |  3 +--
>  drivers/android/binder/page_range.rs       |  6 +++---
>  drivers/android/binder/page_range_helper.c | 24 ----------------------
>  drivers/android/binder/page_range_helper.h | 15 --------------
>  rust/bindgen_parameters                    |  4 ++++
>  rust/bindings/bindings_helper.h            |  1 -
>  rust/bindings/lib.rs                       |  1 +
>  rust/uapi/lib.rs                           |  1 +
>  8 files changed, 10 insertions(+), 45 deletions(-)
>  delete mode 100644 drivers/android/binder/page_range_helper.c
>  delete mode 100644 drivers/android/binder/page_range_helper.h


* Re: [PATCH 30/33] docs: rust: general-information: use real example
From: Gary Guo @ 2026-04-01 15:16 UTC (permalink / raw)
  To: Miguel Ojeda, Nathan Chancellor, Nicolas Schier, Danilo Krummrich,
	Andreas Hindborg, Catalin Marinas, Will Deacon, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Alexandre Courbot, David Airlie,
	Simona Vetter, Brendan Higgins, David Gow, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas, Alice Ryhl, Jonathan Corbet
  Cc: Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Trevor Gross, rust-for-linux, linux-kbuild, Lorenzo Stoakes,
	Vlastimil Babka, Liam R . Howlett, Uladzislau Rezki, linux-block,
	moderated for non-subscribers, Alexandre Ghiti, linux-riscv,
	nouveau, dri-devel, Rae Moar, linux-kselftest, kunit-dev,
	Nick Desaulniers, Bill Wendling, Justin Stitt, llvm, linux-kernel,
	Shuah Khan, linux-doc
In-Reply-To: <20260401114540.30108-31-ojeda@kernel.org>

On Wed Apr 1, 2026 at 12:45 PM BST, Miguel Ojeda wrote:
> Currently the example in the documentation shows a version-based name
> for the Kconfig example:
> 
>     RUSTC_VERSION_MIN_107900
> 
> The reason behind it was to possibly avoid repetition in case several
> features used the same minimum.
> 
> However, we ended up preferring to give them a descriptive name for each
> feature added even if that could lead to some repetition. In practice,
> the repetition has not happened so far, and even if it does at some point,
> it is not a big deal.
> 
> Thus replace the example in the documentation with one of our current
> examples (after removing previous ones from the bump), to show how they
> actually look like, and in case someone `grep`s for it.
> 
> In addition, it has the advantage that it shows the `RUSTC_HAS_*`
> pattern we follow in `init/Kconfig`, similar to the C side.
> 
> Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  Documentation/rust/general-information.rst | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)


* Re: [PATCH 29/33] docs: rust: general-information: simplify Kconfig example
From: Gary Guo @ 2026-04-01 15:16 UTC (permalink / raw)
  To: Miguel Ojeda, Nathan Chancellor, Nicolas Schier, Danilo Krummrich,
	Andreas Hindborg, Catalin Marinas, Will Deacon, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Alexandre Courbot, David Airlie,
	Simona Vetter, Brendan Higgins, David Gow, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas, Alice Ryhl, Jonathan Corbet
  Cc: Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Trevor Gross, rust-for-linux, linux-kbuild, Lorenzo Stoakes,
	Vlastimil Babka, Liam R . Howlett, Uladzislau Rezki, linux-block,
	moderated for non-subscribers, Alexandre Ghiti, linux-riscv,
	nouveau, dri-devel, Rae Moar, linux-kselftest, kunit-dev,
	Nick Desaulniers, Bill Wendling, Justin Stitt, llvm, linux-kernel,
	Shuah Khan, linux-doc
In-Reply-To: <20260401114540.30108-30-ojeda@kernel.org>

On Wed Apr 1, 2026 at 12:45 PM BST, Miguel Ojeda wrote:
> There is no need to use `def_bool y if <expr>` -- one can simply write
> `def_bool <expr>`.
> 
> In fact, the simpler form is how we actually use them in practice in
> `init/Kconfig`.
> 
> Thus simplify the example.
> 
> Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  Documentation/rust/general-information.rst | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)


* Re: [PATCH RFC v4 07/44] KVM: guest_memfd: Only prepare folios for private pages
From: Michael Roth @ 2026-04-01 15:16 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jroedel, jthoughton, oupton, pankaj.gupta,
	qperret, rick.p.edgecombe, rientjes, shivankg, steven.price,
	tabba, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, Paolo Bonzini, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm
In-Reply-To: <CAEvNRgF+FjJ1EWSR_rzD1=N040ZitiRrM2O3N0Kj5yN5rT3h+Q@mail.gmail.com>

On Wed, Apr 01, 2026 at 07:05:16AM -0700, Ackerley Tng wrote:
> Ackerley Tng <ackerleytng@google.com> writes:
> 
> > All-shared guest_memfd used to be only supported for non-CoCo VMs where
> > preparation doesn't apply. INIT_SHARED is about to be supported for
> > non-CoCo VMs in a later patch in this series.
> >
> > In addition, KVM_SET_MEMORY_ATTRIBUTES2 is about to be supported in
> > guest_memfd in a later patch in this series.
> >
> > This means that the kvm fault handler may now call kvm_gmem_get_pfn() on a
> > shared folio for a CoCo VM where preparation applies.
> >
> > Add a check to make sure that preparation is only performed for private
> > folios.
> >
> > Preparation will be undone on freeing (see kvm_gmem_free_folio()) and on
> > conversion to shared.
> >
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> > ---
> >  virt/kvm/guest_memfd.c | 9 ++++++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index b6ffa8734175d..d414ebfcb4c19 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -900,6 +900,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> >  		     int *max_order)
> >  {
> >  	pgoff_t index = kvm_gmem_get_index(slot, gfn);
> > +	struct inode *inode;
> >  	struct folio *folio;
> >  	int r = 0;
> >
> > @@ -907,7 +908,8 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> >  	if (!file)
> >  		return -EFAULT;
> >
> > -	filemap_invalidate_lock_shared(file_inode(file)->i_mapping);
> > +	inode = file_inode(file);
> > +	filemap_invalidate_lock_shared(inode->i_mapping);
> >
> >  	folio = __kvm_gmem_get_pfn(file, slot, index, pfn, max_order);
> >  	if (IS_ERR(folio)) {
> > @@ -920,7 +922,8 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> >  		folio_mark_uptodate(folio);
> >  	}
> >
> > -	r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
> > +	if (kvm_gmem_is_private_mem(inode, index))
> > +		r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
> 
> Michael, I might have misunderstood you at the last guest_memfd call:
> sev_gmem_prepare() doesn't prepare a page for being a shared page,
> right? Does this work? That prepare is only called to "make private"?

Hmm, I guess your guest_memfd-inplace-conversion-v4 branch is out of sync with
these patches?

I have the below local patch based on top of that for SNP-specific enablement,
which is basically identical, so suffice it to say: yes, this should work
for SNP :) If any architecture pops up that needs to do some prep in
advance of mapping shared pages, then we could potentially plumb the
shared/private flag through to the arch-specific prep hook, as was also
suggested on the call, but it doesn't seem like that's needed by any
users for now.

-Mike

  Author: Michael Roth <michael.roth@amd.com>
  Date:   Mon Oct 27 07:58:32 2025 -0500
  
      KVM: guest_memfd: Don't prepare shared folios
  
      In the current guest_memfd logic, "preparation" is only used currently
      to describe the additional work of putting a guest_memfd page into an
      architecturally-defined "private" state, such as updating RMP table
      entries for SEV-SNP guests. As such, there's no input to the
      corresponding kvm_arch_gmem_prepare() hooks as to whether a page is
      being prepared/accessed as shared or as private, so "preparation" will
      end up being erroneously done on pages that were supposed to remain in a
      shared state. Rather than plumb through the additional information
      needed to distinguish between shared vs. private preparation, just
      continue to only do preparation on private pages, as was the case prior
      to support for GUEST_MEMFD_FLAG_MMAP being introduced.
  
      Signed-off-by: Michael Roth <michael.roth@amd.com>
  
  diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
  index 3acc6d983449..4869e59e4fc5 100644
  --- a/virt/kvm/guest_memfd.c
  +++ b/virt/kvm/guest_memfd.c
  @@ -1249,7 +1249,8 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
                  folio_mark_uptodate(folio);
          }
  
  -       r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
  +       if (!kvm_gmem_is_shared_mem(file_inode(file), index))
  +               r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
  
          folio_unlock(folio);

> 
> >
> >  	folio_unlock(folio);
> >
> > @@ -930,7 +933,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> >  		folio_put(folio);
> >
> >  out:
> > -	filemap_invalidate_unlock_shared(file_inode(file)->i_mapping);
> > +	filemap_invalidate_unlock_shared(inode->i_mapping);
> >  	return r;
> >  }
> >  EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gmem_get_pfn);
> >
> > --
> > 2.53.0.1018.g2bb0e51243-goog

* Re: [PATCH V10 00/10] famfs: port into fuse
From: John Groves @ 2026-04-01 15:15 UTC (permalink / raw)
  To: John Groves
  Cc: Miklos Szeredi, Dan Williams, Bernd Schubert, Alison Schofield,
	John Groves, Jonathan Corbet, Shuah Khan, Vishal Verma,
	Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
	David Hildenbrand, Christian Brauner, Darrick J . Wong,
	Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
	Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
	Chen Linxuan, James Morse, Fuad Tabba, Sean Christopherson,
	Shivank Garg, Ackerley Tng, Gregory Price, Aravind Ramesh,
	Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org
In-Reply-To: <0100019d43e5f632-f5862a3e-361c-4b54-a9a6-96c242a8f17a-000000@email.amazonses.com>

On 26/03/31 12:37PM, John Groves wrote:
> From: John Groves <john@groves.net>
> 
> NOTE: this series depends on the famfs dax series in Ira's for-7.1/dax-famfs
> branch [0]
> 
> Changes v9 -> v10
> - Rebased to Ira's for-7.1/dax-famfs branch [0], which contains the required
>   dax patches
> - Add parentheses to FUSE_IS_VIRTIO_DAX() macro, in case something bad is
>   passed in as fuse_inode (thanks Jonathan's AI)
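
For readers unfamiliar with the macro-parentheses concern, here is a minimal sketch. The names sketch_inode and SKETCH_IS_DAX are made up for illustration and are not the fuse definitions; the point is only that an unparenthesized macro argument misbehaves when the caller passes an expression rather than a plain identifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode with an optional dax pointer. */
struct sketch_inode {
	void *dax;
};

/*
 * Unparenthesized form, shown only as a comment:
 *     #define SKETCH_IS_DAX_BAD(fi) (fi->dax != NULL)
 * With the argument "base + 1" it would expand to "base + 1->dax",
 * which does not compile.
 */

/* Parenthesized form: the argument is evaluated as a unit first. */
#define SKETCH_IS_DAX(fi) ((fi)->dax != NULL)

/* Passes an expression, not a plain identifier, as the macro argument. */
static bool sketch_second_is_dax(struct sketch_inode *base)
{
	assert(base != NULL);
	return SKETCH_IS_DAX(base + 1);
}
```

Calling sketch_second_is_dax() on a two-element array tests the second element, which is what a caller writing "base + 1" would expect.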
> 
> Description:
> 
> This patch series introduces famfs into the fuse file system framework.
> Famfs depends on the bundled dax patch set.
> 
> The famfs user space code can be found at [1].
> 
> Fuse Overview:
> 
> Famfs started as a standalone file system, but this series is intended to
> permanently supersede that implementation. At a high level, famfs adds
> two new fuse server messages:
> 
> GET_FMAP   - Retrieves a famfs fmap (the file-to-dax map for a famfs
> 	     file)
> GET_DAXDEV - Retrieves the details of a particular daxdev that was
> 	     referenced by an fmap
> 
> Famfs Overview
> 
> Famfs exposes shared memory as a file system. Famfs consumes shared
> memory from dax devices, and provides memory-mappable files that map
> directly to the memory - no page cache involvement. Famfs differs from
> conventional file systems in fs-dax mode, in that it handles in-memory
> metadata in a sharable way (which begins with never caching dirty shared
> metadata).
> 
> Famfs started as a standalone file system [2,3], but the consensus at
> LSFMM was that it should be ported into fuse [4,5].
> 
> The key performance requirement is that famfs must resolve mapping faults
> without upcalls. This is achieved by fully caching the file-to-devdax
> metadata for all active files. This is done via two fuse client/server
> message/response pairs: GET_FMAP and GET_DAXDEV.
> 
> Famfs remains the first fs-dax file system that is backed by devdax
> rather than pmem in fs-dax mode (hence the need for the new dax mode).
> 
> Notes
> 
> - When a file is opened in a famfs mount, the OPEN is followed by a
>   GET_FMAP message and response. The "fmap" is the full file-to-dax
>   mapping, allowing the fuse/famfs kernel code to handle
>   read/write/fault without any upcalls.
> 
> - After each GET_FMAP, the fmap is checked for extents that reference
>   previously-unknown daxdevs. Each such occurrence is handled with a
>   GET_DAXDEV message and response.
> 
> - Daxdevs are stored in a table (which might become an xarray at some
>   point). When entries are added to the table, we acquire exclusive
>   access to the daxdev via the fs_dax_get() call (modeled after how
>   fs-dax handles this with pmem devices). Famfs provides
>   holder_operations to devdax, providing a notification path in the
>   event of memory errors or forced reconfiguration.
> 
> - If devdax notifies famfs of memory errors on a dax device, famfs
>   currently blocks all subsequent accesses to data on that device. The
>   recovery is to re-initialize the memory and file system. Famfs is
>   memory, not storage...
> 
> - Because famfs uses backing (devdax) devices, only privileged mounts are
>   supported (i.e. the fuse server requires CAP_SYS_RAWIO).
> 
> - The famfs kernel code never accesses the memory directly - it only
>   facilitates read, write and mmap on behalf of user processes, using
>   fmap metadata provided by its privileged fuse server. As such, the
>   RAS of the shared memory affects applications, but not the kernel.
> 
> - Famfs has backing device(s), but they are devdax (char) rather than
>   block. Right now there is no way to tell the vfs layer that famfs has a
>   char backing device (unless we say it's block, but it's not). Currently
>   we use the standard anonymous fuse fs_type - but I'm not sure that's
>   ultimately optimal (thoughts?)
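
The first three notes above (GET_FMAP on open, then one GET_DAXDEV per previously-unknown daxdev) can be sketched in user-space C as follows. This is not the kernel code from the series: the sketch_* names, the fixed-size table, and the extent layout are all assumptions made for illustration, and sketch_get_daxdev() merely counts requests where the real code would exchange a message with the fuse server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_DAXDEVS 8	/* hypothetical table size */

/* Hypothetical extent: which daxdev it lives on, plus offset and length. */
struct sketch_extent {
	unsigned int dev_index;
	unsigned long long offset;
	unsigned long long len;
};

/* Hypothetical daxdev table: tracks which devices are already known. */
struct sketch_daxdev_table {
	bool known[SKETCH_MAX_DAXDEVS];
	unsigned int get_daxdev_count;	/* GET_DAXDEV requests issued so far */
};

/* Stand-in for the GET_DAXDEV message/response pair. */
static int sketch_get_daxdev(struct sketch_daxdev_table *t, unsigned int idx)
{
	t->get_daxdev_count++;
	t->known[idx] = true;
	return 0;
}

/*
 * Walk the extents of a freshly received fmap and fetch every daxdev
 * not seen before. Returns the number of devices fetched, or -1 if an
 * extent references an out-of-range device index.
 */
static int sketch_handle_fmap(struct sketch_daxdev_table *t,
			      const struct sketch_extent *ext, size_t n)
{
	int fetched = 0;

	assert(t != NULL);
	for (size_t i = 0; i < n; i++) {
		unsigned int idx = ext[i].dev_index;

		if (idx >= SKETCH_MAX_DAXDEVS)
			return -1;
		if (t->known[idx])
			continue;	/* already cached, no upcall needed */
		if (sketch_get_daxdev(t, idx) != 0)
			return -1;
		fetched++;
	}
	return fetched;
}
```

In this sketch a second open of a file on the same devices issues no further GET_DAXDEV requests, which mirrors the stated goal of resolving later accesses without upcalls.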
> 
> Changes v8 -> v9
> - Kconfig: fs/fuse/Kconfig:CONFIG_FUSE_FAMFS_DAX now depends on the
>   new CONFIG_DEV_DAX_FSDEV (from drivers/dax/Kconfig) rather than
>   just CONFIG_DEV_DAX and CONFIG_FS_DAX. (CONFIG_FUSE_FAMFS_DAX
>   depends on those...)
> 
> Changes v7 -> v8
> - Moved to inline __free declaration in fuse_get_fmap() and
>   famfs_fuse_meta_alloc(), famfs_teardown()
> - Adopted FIELD_PREP() macro rather than manual bitfield manipulation
> - Minor doc edits
> - I dropped adding magic numbers to include/uapi/linux/magic.h. That
>   can be done later if appropriate
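
To illustrate the FIELD_PREP() item above, here is a minimal user-space sketch comparing manual bitfield packing with a FIELD_PREP-style macro. SKETCH_FIELD_PREP and SKETCH_MODE_MASK are hypothetical names, not taken from the series; the macro relies on the GCC/Clang builtin __builtin_ctzll() and omits the compile-time sanity checks that the kernel's FIELD_PREP() performs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's FIELD_PREP(): shift the value
 * to the position of the lowest set bit in the mask, then apply the mask.
 */
#define SKETCH_FIELD_PREP(mask, val) \
	((((uint64_t)(val)) << __builtin_ctzll(mask)) & (uint64_t)(mask))

/* Hypothetical 4-bit field occupying bits 7:4. */
#define SKETCH_MODE_MASK 0x00f0ULL

/* Manual version: the shift count must be kept in sync with the mask. */
static uint64_t sketch_pack_manual(uint64_t mode)
{
	return (mode << 4) & 0x00f0ULL;
}

/* Mask-driven version: the shift is derived from the mask itself. */
static uint64_t sketch_pack_field_prep(uint64_t mode)
{
	assert(mode <= 0xf);	/* value must fit in the 4-bit field */
	return SKETCH_FIELD_PREP(SKETCH_MODE_MASK, mode);
}
```

Both functions produce the same result for in-range values; the mask-driven form avoids a shift count that can silently drift out of sync with the mask.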
> 
> Changes v6 -> v7
> - Fixed a regression in famfs_interleave_fileofs_to_daxofs() that
>   was reported by Intel's kernel test robot
> - Added a check in __fsdev_dax_direct_access() for negative return
>   from pgoff_to_phys(), which would indicate an out-of-range offset
> - Fixed a bug in __famfs_meta_free(), where not all interleaved
>   extents were freed
> - Added chunksize alignment checks in famfs_fuse_meta_alloc() and
>   famfs_interleave_fileofs_to_daxofs() as interleaved chunks must
>   be PTE or PMD aligned
> - Simplified famfs_file_init_dax() a bit
> - Re-ran CM's kernel code review prompts on the entire series and
>   fixed several minor issues
> 
> Changes v4 -> v5 -> v6
> - None. Re-sending due to technical difficulties
> 
> Changes v3 [9] -> v4
> - The patch "dax: prevent driver unbind while filesystem holds device"
>   has been dropped. Dan Williams indicated that the favored behavior is
>   for a file system to stop working if an underlying driver is unbound,
>   rather than preventing the unbind.
> - The patch "famfs_fuse: Famfs mount opt: -o shadow=<shadowpath>" has
>   been dropped. Found a way for the famfs user space to do without the
>   -o opt (via getxattr).
> - Squashed the fs/fuse/Kconfig patch into the first subsequent patch
>   that needed the change
>   ("famfs_fuse: Basic fuse kernel ABI enablement for famfs")
> - Many review comments addressed.
> - Addressed minor kerneldoc infractions reported by test robot.
> 
> Changes v2 [7] -> v3
> - Dax: Completely new fsdev driver (drivers/dax/fsdev.c) replaces the
>   dev_dax_iomap modifications to bus.c/device.c. Devdax devices can now
>   be switched among 'devdax', 'famfs' and 'system-ram' modes via daxctl
>   or sysfs.
> - Dax: fsdev uses MEMORY_DEVICE_FS_DAX type and leaves folios at order-0
>   (no vmemmap_shift), allowing fs-dax to manage folio lifecycles
>   dynamically like pmem does.
> - Dax: The "poisoned page" problem is properly fixed via
>   fsdev_clear_folio_state(), which clears stale mapping/compound state
>   when fsdev binds. The temporary WARN_ON_ONCE workaround in fs/dax.c
>   has been removed.
> - Dax: Added dax_set_ops() so fsdev can set dax_operations at bind time
>   (and clear them on unbind), since the dax_device is created before we
>   know which driver will bind.
> - Dax: Added custom bind/unbind sysfs handlers; unbind returns -EBUSY if a
>   filesystem holds the device, preventing unbind while famfs is mounted.
> - Fuse: Famfs mounts now require that the fuse server/daemon has
>   CAP_SYS_RAWIO because they expose raw memory devices.
> - Fuse: Added DAX address_space_operations with noop_dirty_folio since
>   famfs is memory-backed with no writeback required.
> - Rebased to latest kernels, fully compatible with Alistair Popple
>   et al.'s recent dax refactoring.
> - Ran this series through Chris Mason's code review AI prompts to check
>   for issues - several subtle problems found and fixed.
> - Dropped RFC status - this version is intended to be mergeable.
> 
> Changes v1 [8] -> v2:
> 
> - The GET_FMAP message/response has been moved from LOOKUP to OPEN, as
>   was the pretty much unanimous consensus.
> - Made the response payload to GET_FMAP variable sized (patch 12)
> - Dodgy kerneldoc comments cleaned up or removed.
> - Fixed memory leak of fc->shadow in patch 11 (thanks Joanne)
> - Dropped many pr_debug and pr_notice calls
> 
> 
> References
> 
> [0] - https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git/
> [1] - https://famfs.org (famfs user space)
> [2] - https://lore.kernel.org/linux-cxl/cover.1708709155.git.john@groves.net/
> [3] - https://lore.kernel.org/linux-cxl/cover.1714409084.git.john@groves.net/
> [4] - https://lwn.net/Articles/983105/ (lsfmm 2024)
> [5] - https://lwn.net/Articles/1020170/ (lsfmm 2025)
> [6] - https://lore.kernel.org/linux-cxl/cover.8068ad144a7eea4a813670301f4d2a86a8e68ec4.1740713401.git-series.apopple@nvidia.com/
> [7] - https://lore.kernel.org/linux-fsdevel/20250703185032.46568-1-john@groves.net/ (famfs fuse v2)
> [8] - https://lore.kernel.org/linux-fsdevel/20250421013346.32530-1-john@groves.net/ (famfs fuse v1)
> [9] - https://lore.kernel.org/linux-fsdevel/20260107153244.64703-1-john@groves.net/T/#mb2c868801be16eca82dab239a1d201628534aea7 (famfs fuse v3)
> 
> 
> John Groves (10):
>   famfs_fuse: Update macro s/FUSE_IS_DAX/FUSE_IS_VIRTIO_DAX/
>   famfs_fuse: Basic fuse kernel ABI enablement for famfs
>   famfs_fuse: Plumb the GET_FMAP message/response
>   famfs_fuse: Create files with famfs fmaps
>   famfs_fuse: GET_DAXDEV message and daxdev_table
>   famfs_fuse: Plumb dax iomap and fuse read/write/mmap
>   famfs_fuse: Add holder_operations for dax notify_failure()
>   famfs_fuse: Add DAX address_space_operations with noop_dirty_folio
>   famfs_fuse: Add famfs fmap metadata documentation
>   famfs_fuse: Add documentation
> 
>  Documentation/filesystems/famfs.rst |  142 ++++
>  Documentation/filesystems/index.rst |    1 +
>  MAINTAINERS                         |   10 +
>  fs/fuse/Kconfig                     |   13 +
>  fs/fuse/Makefile                    |    1 +
>  fs/fuse/dir.c                       |    2 +-
>  fs/fuse/famfs.c                     | 1180 +++++++++++++++++++++++++++
>  fs/fuse/famfs_kfmap.h               |  167 ++++
>  fs/fuse/file.c                      |   45 +-
>  fs/fuse/fuse_i.h                    |  116 ++-
>  fs/fuse/inode.c                     |   35 +-
>  fs/fuse/iomode.c                    |    2 +-
>  fs/namei.c                          |    1 +
>  include/uapi/linux/fuse.h           |   88 ++
>  14 files changed, 1790 insertions(+), 13 deletions(-)
>  create mode 100644 Documentation/filesystems/famfs.rst
>  create mode 100644 fs/fuse/famfs.c
>  create mode 100644 fs/fuse/famfs_kfmap.h
> 
> 
> base-commit: 2ae624d5a555d47a735fb3f4d850402859a4db77
> -- 
> 2.53.0
> 
> 

Miklos,

I would appreciate a read on what you're thinking WRT merging famfs. The
dax patches are ready; this series should be applied on top of Ira's 
for-7.1/dax-famfs branch, which is at [1].

I saw that you had the famfs series in your for-next branch briefly a
couple of weeks ago, but it didn't build because it depends on the dax
series. It will build and run cleanly if you put it on Ira's branch above.

Famfs has been in use for a long time, though availability of sharable CXL
memory is still limited; that is changing with early availability (now) of
sharable JBOMs up to 100TB.

The presence of famfs won't affect anybody who doesn't use it though...

What are your thoughts?

Thanks,
John

[1] - https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git/



* Re: [PATCH 28/33] docs: rust: quick-start: remove GDB/Binutils mention
From: Gary Guo @ 2026-04-01 15:15 UTC (permalink / raw)
  To: Miguel Ojeda, Nathan Chancellor, Nicolas Schier, Danilo Krummrich,
	Andreas Hindborg, Catalin Marinas, Will Deacon, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Alexandre Courbot, David Airlie,
	Simona Vetter, Brendan Higgins, David Gow, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas, Alice Ryhl, Jonathan Corbet
  Cc: Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Trevor Gross, rust-for-linux, linux-kbuild, Lorenzo Stoakes,
	Vlastimil Babka, Liam R . Howlett, Uladzislau Rezki, linux-block,
	moderated for non-subscribers, Alexandre Ghiti, linux-riscv,
	nouveau, dri-devel, Rae Moar, linux-kselftest, kunit-dev,
	Nick Desaulniers, Bill Wendling, Justin Stitt, llvm, linux-kernel,
	Shuah Khan, linux-doc
In-Reply-To: <20260401114540.30108-29-ojeda@kernel.org>

On Wed Apr 1, 2026 at 12:45 PM BST, Miguel Ojeda wrote:
> The versions provided nowadays by even a distribution like Debian Stable
> (and Debian Old Stable) are newer than those mentioned [1].
>
> Thus remove the workaround.
>
> Note that the minimum binutils version in the kernel is still 2.30, so
> one could argue part of the note is still relevant, but it is unlikely
> a kernel developer using such an old binutils is enabling Rust on a
> modern kernel, especially when using distribution toolchains, e.g. the
> Rust minimum version is not satisfied by Debian Old Stable.

I suppose people could have been using an old LTS distro + rustup and have run
into this issue, although that is probably quite unlikely.

Reviewed-by: Gary Guo <gary@garyguo.net>

>
> So we are at the point where keeping the docs short and relevant for
> essentially everyone is probably the better trade-off.
>
> Link: https://packages.debian.org/search?suite=all&searchon=names&keywords=binutils [1]
> Link: https://lore.kernel.org/all/CANiq72mCpc9=2TN_zC4NeDMpFQtPXAFvyiP+gRApg2vzspPWmw@mail.gmail.com/
> Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
> ---
>  Documentation/rust/quick-start.rst | 9 ---------
>  1 file changed, 9 deletions(-)
>
> diff --git a/Documentation/rust/quick-start.rst b/Documentation/rust/quick-start.rst
> index 5bbe059a8fa3..a6ec3fa94d33 100644
> --- a/Documentation/rust/quick-start.rst
> +++ b/Documentation/rust/quick-start.rst
> @@ -352,12 +352,3 @@ Hacking
>  To dive deeper, take a look at the source code of the samples
>  at ``samples/rust/``, the Rust support code under ``rust/`` and
>  the ``Rust hacking`` menu under ``Kernel hacking``.
> -
> -If GDB/Binutils is used and Rust symbols are not getting demangled, the reason
> -is the toolchain does not support Rust's new v0 mangling scheme yet.
> -There are a few ways out:
> -
> -- Install a newer release (GDB >= 10.2, Binutils >= 2.36).
> -
> -- Some versions of GDB (e.g. vanilla GDB 10.1) are able to use
> -  the pre-demangled names embedded in the debug info (``CONFIG_DEBUG_INFO``).



* Re: [PATCH 27/33] docs: rust: quick-start: remove Nix "unstable channel" note
From: Gary Guo @ 2026-04-01 15:10 UTC (permalink / raw)
  To: Miguel Ojeda, Nathan Chancellor, Nicolas Schier, Danilo Krummrich,
	Andreas Hindborg, Catalin Marinas, Will Deacon, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Alexandre Courbot, David Airlie,
	Simona Vetter, Brendan Higgins, David Gow, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas, Alice Ryhl, Jonathan Corbet
  Cc: Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Trevor Gross, rust-for-linux, linux-kbuild, Lorenzo Stoakes,
	Vlastimil Babka, Liam R . Howlett, Uladzislau Rezki, linux-block,
	moderated for non-subscribers, Alexandre Ghiti, linux-riscv,
	nouveau, dri-devel, Rae Moar, linux-kselftest, kunit-dev,
	Nick Desaulniers, Bill Wendling, Justin Stitt, llvm, linux-kernel,
	Shuah Khan, linux-doc
In-Reply-To: <20260401114540.30108-28-ojeda@kernel.org>

On Wed Apr 1, 2026 at 12:45 PM BST, Miguel Ojeda wrote:
> Nix does not need the "unstable channel" note, since its packages are
> recent enough even in the stable channel [1][2].
> 
> Thus remove it to simplify the documentation.
> 
> Link: https://search.nixos.org/packages?channel=25.11&query=rust [1]
> Link: https://search.nixos.org/packages?channel=25.11&query=bindgen [2]
> Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  Documentation/rust/quick-start.rst | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)



* Re: [PATCH 21/33] gpu: nova-core: bindings: remove unneeded `cfg_attr`
From: Gary Guo @ 2026-04-01 15:08 UTC (permalink / raw)
  To: Miguel Ojeda, Nathan Chancellor, Nicolas Schier, Danilo Krummrich,
	Andreas Hindborg, Catalin Marinas, Will Deacon, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Alexandre Courbot, David Airlie,
	Simona Vetter, Brendan Higgins, David Gow, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas, Alice Ryhl, Jonathan Corbet
  Cc: Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Trevor Gross, rust-for-linux, linux-kbuild, Lorenzo Stoakes,
	Vlastimil Babka, Liam R . Howlett, Uladzislau Rezki, linux-block,
	moderated for non-subscribers, Alexandre Ghiti, linux-riscv,
	nouveau, dri-devel, Rae Moar, linux-kselftest, kunit-dev,
	Nick Desaulniers, Bill Wendling, Justin Stitt, llvm, linux-kernel,
	Shuah Khan, linux-doc
In-Reply-To: <20260401114540.30108-22-ojeda@kernel.org>

On Wed Apr 1, 2026 at 12:45 PM BST, Miguel Ojeda wrote:
> These were likely copied from the `bindings` and `uapi` crates, but are
> unneeded since there are no `cfg(test)`s in the bindings.
> 
> In addition, the issue that triggered the addition in those crates
> originally is also fixed in `bindgen` (please see the previous commit).
> 
> Thus remove them.
> 
> Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  drivers/gpu/nova-core/gsp/fw/r570_144.rs | 3 ---

I believe drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs is generated
manually and checked in to the source tree? It might be useful to record the
command line and bindgen version used to generate the file somewhere, both for
reproducibility and for spotting possible compatibility issues like this one.

Best,
Gary

>  1 file changed, 3 deletions(-)



* Re: [PATCH 20/33] rust: kbuild: remove unneeded old `allow`s for generated layout tests
From: Gary Guo @ 2026-04-01 15:05 UTC (permalink / raw)
  To: Miguel Ojeda, Nathan Chancellor, Nicolas Schier, Danilo Krummrich,
	Andreas Hindborg, Catalin Marinas, Will Deacon, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Alexandre Courbot, David Airlie,
	Simona Vetter, Brendan Higgins, David Gow, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas, Alice Ryhl, Jonathan Corbet
  Cc: Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin,
	Trevor Gross, rust-for-linux, linux-kbuild, Lorenzo Stoakes,
	Vlastimil Babka, Liam R . Howlett, Uladzislau Rezki, linux-block,
	moderated for non-subscribers, Alexandre Ghiti, linux-riscv,
	nouveau, dri-devel, Rae Moar, linux-kselftest, kunit-dev,
	Nick Desaulniers, Bill Wendling, Justin Stitt, llvm, linux-kernel,
	Shuah Khan, linux-doc
In-Reply-To: <20260401114540.30108-21-ojeda@kernel.org>

On Wed Apr 1, 2026 at 12:45 PM BST, Miguel Ojeda wrote:
> The issue that required `allow`s for `cfg(test)` code generated by
> `bindgen` for layout testing was fixed back in `bindgen` 0.60.0 [1],
> so it could have been removed even before the version bump, but it does
> not hurt.
> 
> Thus remove it now.
> 
> Link: https://github.com/rust-lang/rust-bindgen/pull/2203 [1]
> Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  rust/bindings/lib.rs | 4 ----
>  rust/uapi/lib.rs     | 4 ----
>  2 files changed, 8 deletions(-)


