Linux Documentation
* Re: (subset) [PATCH v7 00/10] Support for Samsung S2MU005 PMIC and its sub-devices
From: Lee Jones @ 2026-05-21 16:05 UTC
  To: Lee Jones, Pavel Machek, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, MyungJoo Ham, Chanwoo Choi, Sebastian Reichel,
	Krzysztof Kozlowski, André Draszik, Alexandre Belloni,
	Jonathan Corbet, Shuah Khan, Nam Tran,
	Łukasz Lebiedziński, Yassine Oudjana,
	Kaustabh Chakraborty
  Cc: linux-leds, devicetree, linux-kernel, linux-pm, linux-samsung-soc,
	linux-rtc, linux-doc, Conor Dooley, Krzysztof Kozlowski
In-Reply-To: <20260516-s2mu005-pmic-v7-0-73f9702fb461@disroot.org>

On Sat, 16 May 2026 03:08:32 +0530, Kaustabh Chakraborty wrote:
> S2MU005 is an MFD chip manufactured by Samsung Electronics. This is
> found in various devices manufactured by Samsung and others, including
> all Exynos 7870 devices. It is known to have the following features:
> 
> 1. Two LED channels with adjustable brightness for use as a torch, or a
>    flash strobe.
> 2. An RGB LED with 8-bit channels. Usually programmed as a notification
>    indicator.
> 3. An MUIC, which works with USB micro-B (and USB-C?). For the micro-B
>    variant though, it measures the ID-GND resistance using an internal
>    ADC.
> 4. A charger device, which reports if charger is online, voltage,
>    resistance, etc.
> 
> [...]

Applied, thanks!

[01/10] dt-bindings: leds: document Samsung S2M series PMIC flash LED device
        commit: a794673949f1aa1dd948ce3ea436af48ea83d7b2
[06/10] leds: flash: add support for Samsung S2M series PMIC flash LED device
        commit: f0878c58430c378c47aaece1b29484e4ae8d7faf
[07/10] leds: rgb: add support for Samsung S2M series PMIC RGB LED device
        commit: 366ed7a6d22e682e6dfd4d64d8f543bc70c6b58e
[08/10] Documentation: leds: document pattern behavior of Samsung S2M series PMIC RGB LEDs
        commit: 1795fd2dbe84ef4d393b69a0b2a3b371f810bde5

--
Lee Jones [李琼斯]


* Re: (subset) [PATCH v7 00/10] Support for Samsung S2MU005 PMIC and its sub-devices
From: Lee Jones @ 2026-05-21 16:01 UTC
  To: Lee Jones, Pavel Machek, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, MyungJoo Ham, Chanwoo Choi, Sebastian Reichel,
	Krzysztof Kozlowski, André Draszik, Alexandre Belloni,
	Jonathan Corbet, Shuah Khan, Nam Tran,
	Łukasz Lebiedziński, Yassine Oudjana,
	Kaustabh Chakraborty
  Cc: linux-leds, devicetree, linux-kernel, linux-pm, linux-samsung-soc,
	linux-rtc, linux-doc, Conor Dooley, Krzysztof Kozlowski
In-Reply-To: <20260516-s2mu005-pmic-v7-0-73f9702fb461@disroot.org>

On Sat, 16 May 2026 03:08:32 +0530, Kaustabh Chakraborty wrote:
> S2MU005 is an MFD chip manufactured by Samsung Electronics. This is
> found in various devices manufactured by Samsung and others, including
> all Exynos 7870 devices. It is known to have the following features:
> 
> 1. Two LED channels with adjustable brightness for use as a torch, or a
>    flash strobe.
> 2. An RGB LED with 8-bit channels. Usually programmed as a notification
>    indicator.
> 3. An MUIC, which works with USB micro-B (and USB-C?). For the micro-B
>    variant though, it measures the ID-GND resistance using an internal
>    ADC.
> 4. A charger device, which reports if charger is online, voltage,
>    resistance, etc.
> 
> [...]

Applied, thanks!

[03/10] dt-bindings: mfd: add documentation for S2MU005 PMIC
        commit: 12479cc3750c6b741b6d87392e393d959cf2f013
[04/10] mfd: sec: add support for S2MU005 PMIC
        commit: aeff14ae7271cc3070312f894de9a4e075855d31
[05/10] mfd: sec: set DMA coherent mask
        commit: ba1f536070abd595a141c683f617eed3c6e42297

--
Lee Jones [李琼斯]


* Re: [PATCH] nios2: remove the architecture
From: Simon Schuster @ 2026-05-21 15:37 UTC
  To: Arnd Bergmann, Dinh Nguyen, Wolfram Sang, Miguel Ojeda
  Cc: Ethan Nelson-Moore, Peter Zijlstra, linux-doc, devicetree,
	workflows, Linux-Arch, dmaengine, linux-i2c, linux-iio, Netdev,
	linux-pci, linux-pwm, linux-hardening, linux-kbuild,
	linux-csky@vger.kernel.org, Jonathan Corbet, Shuah Khan,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Daniel Lezcano,
	Thomas Gleixner, Alex Shi, Yanteng Si, Dongliang Mu, Hu Haowen,
	Kees Cook, Oleg Nesterov, Will Deacon, Aneesh Kumar K.V (Arm),
	Andrew Morton, Nicholas Piggin, Vinod Koul, Frank Li,
	Dave Penkler, Andi Shyti, Jonathan Cameron, David Lechner,
	Nuno Sá, Andy Shevchenko, Andrew Lunn, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Andreas Oetken
In-Reply-To: <76af64fa-7820-4d92-8aa9-826c3bd812a1@app.fastmail.com>

Hi Arnd, Dinh, Wolfram, and Miguel,

thank you for your explanations and encouragement; I've now sent my
application for co-maintainership for arch/nios2 to you, Dinh.

On Wed, May 20, 2026 at 09:06:33AM +0200, Arnd Bergmann wrote:
> I think that is a reasonable target. We have a bunch of embedded
> architectures that have a similarly small user base and I expect
> that we will want to remove most of them at some point, as we did
> for seven architectures in linux-4.17.
> 
> As long as there is a maintainer for nios2 and it's not actively
> getting in the way of a specific treewide change, I don't see any
> reason to remove this any earlier than the other ones.
> 
> Obviously at some point nios2 will have to get removed because
> of the limit to gcc-14 or older, but that should not be a problem
> for the next few LTS releases.

This all sounds quite reasonable, including the toolchain
considerations. Thank you for the offer to keep it around a bit.
If any issues arise with tree-wide changes, I'd be happy to look into
what can be done on the arch/nios2 side, now that such issues should
reliably reach me via mail.

> > Sure, I'd be glad to do so, but so far I refrained from it as I was a bit
> > unsure about the netiquette (can I simply do so by self-proclamation? At
> > least the git history seems to suggest so...).
> 
> Dinh already replied that he welcomes the help, and I also suggested
> the same thing a year ago. As the only known user that has contributed
> patches in a long time, you are obviously qualified.
> 
> Sending a patch for the MAINTAINERS file to Dinh is the first step.
> Once he has sent that upstream, you can (optionally) apply for a
> kernel.org account that would let you host a git tree on kernel.org
> or have a tree that you both have access to.

I've sent the patch, I'm sure we can work everything else out from
there.

Best regards,
Simon

* Re: [PATCH v3] killswitch: add per-function short-circuit mitigation primitive
From: Sasha Levin @ 2026-05-21 15:31 UTC
  To: Daniel Borkmann
  Cc: Song Liu, linux-kernel, linux-doc, linux-kselftest, bpf,
	live-patching, Greg Kroah-Hartman, Andrew Morton, Jonathan Corbet,
	Mathieu Desnoyers, Joshua Peisach, Florian Weimer, Breno Leitao,
	Anthony Iliopoulos, Michal Hocko, Jiri Olsa, John Fastabend,
	Christian Brauner, KP Singh
In-Reply-To: <3dd6d852-18fb-4c64-a1ae-0d79ef7c061f@iogearbox.net>

On Thu, May 21, 2026 at 11:11:16AM +0200, Daniel Borkmann wrote:
>On 5/19/26 9:57 PM, Sasha Levin wrote:
>>Sure, this would also work. How do you see this happening? Can we let a certain
>>user/pid/etc disable the allowlist if they choose to?
>
>I don't think we should, given then we're back to square one where root
>or some other user would be able to just override/bypass an LSM.

killswitch already disables itself when lockdown is active. We can easily
disable it too when one of the LSMs that cares about this is active.

>[...]
>>How do you see this working with the allowlist?
>
>We should look at the underlying areas where most of the CVE-like fixes
>took place (these days should be more easily doable given Claude and friends)
>and based on that either extend ALLOW_ERROR_INJECTION() or (better) create
>new hooks which BPF LSM can consume where you can then have a policy to reject
>requests and tighten the attack surface. For example, the AF_ALG stuff you

So we could grow the LSM tentacles deeper into the kernel, and we can see where
current CVEs are happening, which I suspect is the darker corners of the kernel
(old unmaintained, rarely used code), but this definitely won't stay the case,
right? Newer and better LLMs will discover issues elsewhere, and once the low
hanging fruits are picked off of the current target subsystems, researchers
will move elsewhere. We will be dooming ourselves to an endless cat and mouse
game where we go add LSM hooks after some big security issue goes public.

One question I had here: how would we tackle security issues with BPF itself?

>can already easily cover today ...
>
>#include "vmlinux.h"
>#include <bpf/bpf_helpers.h>
>#include <bpf/bpf_tracing.h>
>
>#define AF_ALG	38
>#define EPERM	1
>
>char _license[] SEC("license") = "Dual BSD/GPL";
>
>SEC("lsm/socket_create")
>int BPF_PROG(block_af_alg, int family, int type, int protocol, int kern)
>{
>	if (family == AF_ALG)
>		return -EPERM;
>	return 0;
>}
>
>... the problem is that distros enable and pull in all sorts of crap which
>then non-root could pull in via request_module() as an example; similarly
>for netlink we want to have a BPF LSM policy to parse into netlink requests
>and then reject based on certain attribute matching (both on our todo list)
>which would have helped in case of exotic tc cls/act/qdisc modules to prevent
>them from being pulled in from userns. I bet there are a ton more examples once we
>look further into the data.

I definitely agree that BPF is a much nicer hammer than the simple killswitch
implementation. I've actually been (privately) playing with an out of tree
killswitch that also supports BPF. I've pushed the (hacky) code I have to
https://github.com/sashalevin/killswitch , and you can see an example of a BPF
mitigation similar to the one you have above:

https://github.com/sashalevin/killswitch/blob/master/mitigations/cve-2025-21703.sh

My concern is mostly with the whitelist approach.

-- 
Thanks,
Sasha

* RE: [PATCH net-next 2/5] net: ethernet: oa_tc6: Allow custom mii_bus
From: Regus, Ciprian @ 2026-05-21 15:26 UTC
  To: Andrew Lunn, Selvamani Rajagopal
  Cc: Parthiban Veerasooran, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan, Heiner Kallweit, Russell King, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	devicetree@vger.kernel.org
In-Reply-To: <77df32ed-3e22-4e9b-941b-3046de25b88f@lunn.ch>



> > > This all seems pretty invasive and ugly. Please could you think what
> > > happens if instead of passing in an mdiobus, you pass a phydev. Is the
> > > change to the core simpler and cleaner?
> > >
> > > Andrew
> >
> 
> > Kind of agree. Initially we were thinking about changing the
> > existing code (Microchip's vendor code) to alloc mii_bus so that
> > code would be same across multiple vendors. Either way, it would be
> > invasive changes. So, we decided to go with minimal changes to other
> > vendor's code.
> 
> That would be wrong. The standard defines this, so it should be in the
> core. Anything which the standard defines should be in the core, so
> that drivers for hardware which actually follow the standard are
> minimal. Also, we try to keep workarounds for broken hardware out of
> the core, hide it in the driver. That is not always possible, but the
> aim should be to make the core clean. We don't want to penalise
> vendors which got the implementation correct because of vendors who
> got it wrong.
> 
> > Trying to understand your suggestion. Are you suggesting to move
> > entire mii_bus allocation/APIs implementation to vendor side and
> > keep only phy dev usage in oa_tc6.c?
> 
> No. I'm thinking maybe extend oa_tc6_init, similar to what you
> did. Add a quirks flag, and define TC6_QUIRK_BROKEN_PHY. And allow a
> phydev to be passed as well.
> 
> If the quirk is set, don't call oa_tc6_mdiobus_register() or
> phy_find_first(), nor oa_tc6_mdiobus_unregister().
>

The issue I can see with this approach is that we would need to have already registered
the mii_bus and read a valid PHY id from the device before passing the phy_device to
oa_tc6_init(). Scanning the mdio bus requires OA TC6 SPI transfers (reading registers 0xFF02
and 0xFF03), while oa_tc6 has not yet been initialized. For the ADIN1140 driver this is not an issue,
because we can return cached values for the PHY id, as you suggested. However, that limits
the usefulness of the BROKEN_PHY flag, because every new driver that cannot use the default
init sequence in oa_tc6 (and wants to set the BROKEN_PHY flag) has to fit this specific case.

I think the approach which involves the least amount of changes in the core would be for oa_tc6
to skip the oa_tc6_phy_init() and oa_tc6_phy_exit() if the BROKEN_PHY quirk flag is set and
leave it to the drivers using oa_tc6 to handle the mii_bus alloc/register/unregister/free and
phy_connect()/disconnect().

These would be the only changes in the core's phy handling path (besides adding the flag itself):

@@ -585,10 +586,13 @@ static int oa_tc6_phy_init(struct oa_tc6 *tc6)
 {
        int ret;
 
+       if (tc6->quirk_flags & OA_TC6_BROKEN_PHY)
+               return 0;
+
        ret = oa_tc6_check_phy_reg_direct_access_capability(tc6);
        if (ret) {
                netdev_err(tc6->netdev,
                          "Direct PHY register access is not supported by the MAC-PHY\n");
                return ret;
        }
...
 }
 
 static void oa_tc6_phy_exit(struct oa_tc6 *tc6)
 {
+       if (tc6->quirk_flags & OA_TC6_BROKEN_PHY)
+               return;
+
        phy_disconnect(tc6->phydev);
        oa_tc6_mdiobus_unregister(tc6);
 }
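
For illustration, the quirk-gated init/exit flow proposed above can be modelled in plain userspace C. This is a sketch under stated assumptions, not oa_tc6 code: struct oa_tc6_model, model_phy_init() and model_phy_exit() are hypothetical names, and the two int fields merely stand in for oa_tc6_mdiobus_register()/phy_connect_direct() and their teardown counterparts.

```c
/* Hypothetical stand-in for the flag name used in the diff above. */
#define OA_TC6_BROKEN_PHY (1u << 0)

/* Hypothetical model of the relevant oa_tc6 state; not the real struct. */
struct oa_tc6_model {
	unsigned int quirk_flags;
	int mdiobus_registered; /* models oa_tc6_mdiobus_register() */
	int phy_connected;      /* models phy_connect_direct() */
};

static int model_phy_init(struct oa_tc6_model *tc6)
{
	/* Quirky MAC-PHY: the vendor driver owns mii_bus and the PHY link. */
	if (tc6->quirk_flags & OA_TC6_BROKEN_PHY)
		return 0;

	tc6->mdiobus_registered = 1;
	tc6->phy_connected = 1;
	return 0;
}

static void model_phy_exit(struct oa_tc6_model *tc6)
{
	/* Mirror init: skip teardown of anything the core never set up. */
	if (tc6->quirk_flags & OA_TC6_BROKEN_PHY)
		return;

	tc6->phy_connected = 0;      /* models phy_disconnect() */
	tc6->mdiobus_registered = 0; /* models oa_tc6_mdiobus_unregister() */
}
```

With OA_TC6_BROKEN_PHY set, both calls become no-ops, so the vendor driver stays responsible for allocating, registering and freeing its own mii_bus and for phy_connect()/phy_disconnect().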

> You probably want to start with a patch which breaks oa_tc6_phy_init()
> into two, since you still need the phy_connect_direct() and
> phy_attached_info(). Then add the quirk, and lastly your driver making
> use of the quirk.
> 
> The quirks flag could also be used for devices which have MMD 30
> mapped into a vendor reserved MMS.
> 
> 	Andrew


* Re: [PATCH v5 09/13] ima: Add support for staging measurements with prompt
From: Mimi Zohar @ 2026-05-21 15:18 UTC
  To: Roberto Sassu, corbet, skhan, dmitry.kasatkin, eric.snowberg,
	paul, jmorris, serge
  Cc: linux-doc, linux-kernel, linux-integrity, linux-security-module,
	gregorylumen, chenste, nramas, Roberto Sassu
In-Reply-To: <20260429160319.4162918-10-roberto.sassu@huaweicloud.com>

On Wed, 2026-04-29 at 18:03 +0200, Roberto Sassu wrote:
> From: Roberto Sassu <roberto.sassu@huawei.com>
> 
> Introduce the ability of staging the IMA measurement list and deleting them
> with a prompt.
> 
> Staging means moving the current content of the measurement list to a

-> moving the current measurement list records ...

> separate location, and allowing users to read and delete it. This causes
> the measurement list to be atomically truncated before new measurements can
> be added. 

The wording is a bit off - "before new measurements can be added".  One of the
main objectives of staging the measurement list is to allow new measurement
records to continue to be added to the measurement list, while the staged
measurements are exported.

> Staging can be done only once at a time. In the event of kexec(),
> staging is reverted and staged entries will be carried over to the new
> kernel.
> 
> Introduce ascii_runtime_measurements_<algo>_staged and
> binary_runtime_measurements_<algo>_staged interfaces to access and delete
> the measurements. Also, add write permission to the original measurement
> interfaces.

Wondering if adding "write" permission to the original measurement interface
will change based on your 9/13 comment.

The patch, like others in this patch set, is well written.  There are a couple
of inline comments.  I'll defer reviewing the rest of this patch to v6.

> 
> Use 'echo A > <IMA original interface>' and
> 'echo D > <IMA _staged interface>' to respectively stage and delete the
> entire measurements list. Locking of these interfaces is also mediated with
> a call to _ima_measurements_open() and with ima_measurements_release().
> 
> Implement the staging functionality by introducing the new global
> measurements list ima_measurements_staged, and ima_queue_stage() and
> ima_queue_staged_delete_all() to respectively move measurements from the
> current measurements list to the staged one, and to move staged
> measurements to the ima_measurements_trim list for deletion. Introduce
> ima_queue_delete() to delete the measurements.
> 
> Finally, introduce the BINARY_STAGED and BINARY_FULL binary measurements
> list types, to maintain the counters and the binary size of staged
> measurements and the full measurements list (including entries that were
> staged). BINARY still represents the current binary measurements list.
> 
> Use the binary size for the BINARY + BINARY_STAGED types in
> ima_add_kexec_buffer(), since both measurements list types are copied to
> the secondary kernel during kexec. Use BINARY_FULL in
> ima_measure_kexec_event(), to generate a critical data record.
> 
> It should be noted that the BINARY_FULL counter is not passed through
> kexec. Thus, the number of entries included in the kexec critical data
> records refers to the entries since the previous kexec records.
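
A toy model of the staging semantics described above may help when reviewing the list and counter logic. It is a single-threaded sketch, not IMA code: every name is hypothetical, deletion is done in one step (the patch instead moves entries to ima_measurements_trim and frees them via ima_queue_delete()), and the counts only loosely mirror BINARY, BINARY_STAGED and BINARY_FULL.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy model of IMA staging; not kernel code. */
struct entry {
	int id;
	struct entry *next;
};

struct mlist {
	struct entry *head, *tail;
	size_t count;
};

struct ima_model {
	struct mlist current;  /* roughly BINARY: live list, keeps growing */
	struct mlist staged;   /* roughly BINARY_STAGED: awaiting deletion */
	size_t full_count;     /* roughly BINARY_FULL: every entry ever added */
	int staging_active;    /* only one staging at a time */
};

static const struct mlist empty_list = { NULL, NULL, 0 };

/* New measurements always go to the current list, staged or not. */
static int model_add(struct ima_model *m, int id)
{
	struct entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->id = id;
	e->next = NULL;
	if (m->current.tail)
		m->current.tail->next = e;
	else
		m->current.head = e;
	m->current.tail = e;
	m->current.count++;
	m->full_count++;
	return 0;
}

/* "echo A > original interface": move the whole current list aside. */
static int model_stage(struct ima_model *m)
{
	if (m->staging_active)
		return -1; /* already staged: refuse a second staging */
	m->staged = m->current;
	m->current = empty_list;
	m->staging_active = 1;
	return 0;
}

/* "echo D > _staged interface": free all staged entries. */
static int model_delete_staged(struct ima_model *m)
{
	struct entry *e, *next;

	if (!m->staging_active)
		return -1;
	for (e = m->staged.head; e; e = next) {
		next = e->next;
		free(e);
	}
	m->staged = empty_list;
	m->staging_active = 0;
	return 0;
}

/*
 * kexec: revert staging by prepending staged entries to the current list.
 * Returns the number of entries carried over (BINARY + BINARY_STAGED);
 * full_count is not part of what is carried.
 */
static size_t model_kexec_revert(struct ima_model *m)
{
	if (m->staging_active && m->staged.head) {
		m->staged.tail->next = m->current.head;
		if (!m->current.tail)
			m->current.tail = m->staged.tail;
		m->current.head = m->staged.head;
		m->current.count += m->staged.count;
	}
	m->staged = empty_list;
	m->staging_active = 0;
	return m->current.count;
}
```

In this model new records keep landing on the current list while the staged ones are exported, which is the behaviour the review comment on "before new measurements can be added" asks the description to convey.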
> 
> Note: This code derives from the Alt-IMA Huawei project, whose license is
>       GPL-2.0 OR MIT.
> 
> Link: https://github.com/linux-integrity/linux/issues/1
> Suggested-by: Gregory Lumen <gregorylumen@linux.microsoft.com> (staging revert)
> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
> ---
>  security/integrity/ima/Kconfig     |  13 +++
>  security/integrity/ima/ima.h       |   8 +-
>  security/integrity/ima/ima_fs.c    | 181 ++++++++++++++++++++++++++---
>  security/integrity/ima/ima_kexec.c |  24 +++-
>  security/integrity/ima/ima_queue.c |  97 +++++++++++++++-
>  5 files changed, 302 insertions(+), 21 deletions(-)
> 
> diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
> index 862fbee2b174..48c906793efb 100644
> --- a/security/integrity/ima/Kconfig
> +++ b/security/integrity/ima/Kconfig
> @@ -332,4 +332,17 @@ config IMA_KEXEC_EXTRA_MEMORY_KB
>  	  If set to the default value of 0, an extra half page of memory for those
>  	  additional measurements will be allocated.
>  
> +config IMA_STAGING
> +	bool "Support for staging the measurements list"
> +	default y

Exporting and deleting the IMA measurement list carries an inherent security
risk: if the measurements are not durably stored before deletion, they are
permanently lost. Deletion should be treated as experimental until a trusted
service exists to guarantee safe storage.

Please change the default to 'n'.


> +	help
> +	  Add support for staging the measurements list.
> +
> +	  It allows user space to stage the measurements list for deletion and
> +	  to delete the staged measurements after confirmation.
> +
> +	  On kexec, staging is reverted and staged measurements are prepended

-> staging is aborted and any staged measurement records are copied ...

> +	  to the current measurements list when measurements are copied to the
> +	  secondary kernel.
> +
>  endif

Mimi

* Re: [PATCH v10 19/30] KVM: arm64: Provide assembly for SME register access
From: Mark Brown @ 2026-05-21 15:17 UTC
  To: Mark Rutland
  Cc: Oliver Upton, Marc Zyngier, Joey Gouly, Catalin Marinas,
	Suzuki K Poulose, Will Deacon, Paolo Bonzini, Jonathan Corbet,
	Shuah Khan, Dave Martin, Fuad Tabba, Ben Horgan, linux-arm-kernel,
	kvmarm, linux-kernel, kvm, linux-doc, linux-kselftest,
	Peter Maydell, Eric Auger
In-Reply-To: <ag8b7oq4SFpdmlP_@J2N7QTR9R3>

On Thu, May 21, 2026 at 03:51:26PM +0100, Mark Rutland wrote:

> While this specific instance is simple enough, I don't think we should
> continue to duplicate the low level save/restore routines between the
> main kernel and KVM hyp code.

> I've sent a series that avoids the need for this, and cleans up some
> other bits):

>   https://lore.kernel.org/linux-arm-kernel/20260521132556.584676-1-mark.rutland@arm.com/

> Assuming Marc and Oliver are on board, I'd prefer that we do that
> cleanup first, and build the KVM SME support atop.

Yeah, I've got a laundry list of things that I want to improve with both
the main kernel and KVM but the latency on getting anything reviewed
with both sides and sometimes obscure implementation decisions means
I've been waiting until this is landed first.


* Re: [PATCH v7 2/6] mm/memory-failure: surface unhandlable kernel pages as -ENOTRECOVERABLE
From: Breno Leitao @ 2026-05-21 15:09 UTC
  To: Lance Yang
  Cc: linmiaohe, akpm, david, ljs, vbabka, rppt, surenb, mhocko, shuah,
	nao.horiguchi, rostedt, mhiramat, mathieu.desnoyers, corbet,
	skhan, liam, linux-mm, linux-kernel, linux-doc, linux-kselftest,
	linux-trace-kernel, kernel-team
In-Reply-To: <f76b79d3-080a-4931-873e-99d4b3e1020f@linux.dev>

On Sat, May 16, 2026 at 12:06:14PM +0800, Lance Yang wrote:
> 
> 
> On 2026/5/15 21:13, Breno Leitao wrote:
> [...]
> > > 
> > > Wonder if it would be simpler to just do a positive check near the top
> > > of get_any_page() instead. Something like:
> > > 
> > > static bool hwpoison_unrecoverable_kernel_page(struct page *page,
> > > 						unsigned long flags)
> > 
> > Ack. We probably want to call it something like HWPoisonKernelOwned() to
> > follow the same naming semantics as these helpers, such as HWPoisonHandlable()
> > 
> > By the way, I will re-include the selftests in this patch series.
> > If they are not useful, we don't merge them.
> > 
> 
> Sounds good :)
> 
> Can you also test the relevant page types if possible, especially
> the ones the new helper is supposed to classify?

Ack. I will expand the test to cover different page types as well!

Thanks,
--breno

* Re: [PATCH mm-unstable v17 04/14] mm/khugepaged: generalize __collapse_huge_page_* for mTHP support
From: Lorenzo Stoakes @ 2026-05-21 14:59 UTC
  To: Nico Pache
  Cc: David Hildenbrand (Arm), Wei Yang, Lance Yang, linux-doc,
	linux-kernel, linux-mm, linux-trace-kernel, aarcange, akpm,
	anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
	hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
	liam, mathieu.desnoyers, matthew.brost, mhiramat, mhocko, peterx,
	pfalcato, rakie.kim, raquini, rdunlap, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe
In-Reply-To: <CAA1CXcCNT51jeXh6Kwg1QN9e+AJB-1hg21kmeY6fTTKr2GACug@mail.gmail.com>

On Tue, May 19, 2026 at 01:05:13PM -0600, Nico Pache wrote:
> On Mon, May 18, 2026 at 1:33 PM Lorenzo Stoakes <ljs@kernel.org> wrote:
> >
> > On Mon, May 18, 2026 at 03:16:11PM +0200, David Hildenbrand (Arm) wrote:
> > > > For me, I would vote for fallback to 0.
> > >
> > > At this point I'd prefer not to return errors from collapse_max_ptes_none().
> > > It's just rather awkward to return an error deep down in collapse code for a
> > > configuration problem.
> > >
> > > For mthp collapse, we only support max_ptes_none==0 and
> > > max_ptes_none=="HPAGE_PMD_NR - 1" (default).
> > >
> > > If another value is specified while collapsing mTHP, print a warning and treat
> > > it as 0 (safe value, no creep, no memory waste).
> > >
> > > In a sense, this is similar to how we handle max_ptes_shared + max_ptes_swap:
> > > for mTHP: we always treat them as being 0 for mTHP collapse (and don't issue a
> > > warning, because we would issue a warning with the default settings).
> > >
> > > @Lorenzo, fine with you?
> >
> > Yes 100%, this sounds sensible both in terms of the error and the default. Let's
> > keep our lives simple(-ish) please :)
>
> OK, thank you, I'm glad we finally came to a consensus on this! Phew!
>

It happens sometimes ;)

Cheers, Lorenzo
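
As a sketch of the policy agreed on above (a hypothetical helper, not the code in the series), assuming 4 KiB base pages so that HPAGE_PMD_NR is 512, and assuming that the default value is scaled to the mTHP size rather than used as-is:

```c
#include <stdio.h>

/* Assumption: 4 KiB base pages and 2 MiB PMDs, as on x86-64. */
#define HPAGE_PMD_ORDER 9
#define HPAGE_PMD_NR (1u << HPAGE_PMD_ORDER)

/*
 * Hypothetical helper: effective max_ptes_none for a collapse of the given
 * order. PMD-sized collapse honours the sysfs value unchanged; mTHP collapse
 * supports only 0 and the default HPAGE_PMD_NR - 1.
 */
static unsigned int model_max_ptes_none(unsigned int order,
					unsigned int max_ptes_none)
{
	if (order == HPAGE_PMD_ORDER)
		return max_ptes_none;
	if (max_ptes_none == 0)
		return 0;
	if (max_ptes_none == HPAGE_PMD_NR - 1)
		return (1u << order) - 1; /* assumption: scaled to mTHP size */

	/* Unsupported for mTHP: warn, then use the safe value 0 (no creep). */
	fprintf(stderr, "max_ptes_none=%u unsupported for mTHP, using 0\n",
		max_ptes_none);
	return 0;
}

/* max_ptes_shared and max_ptes_swap: silently treated as 0 for mTHP. */
static unsigned int model_max_ptes_other(unsigned int order, unsigned int value)
{
	return order == HPAGE_PMD_ORDER ? value : 0;
}
```

No warning is issued for max_ptes_shared/max_ptes_swap because, as noted above, it would fire with the default settings.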

* Re: [PATCH v14 00/28] Add new general DRM property "color format"
From: Daniel Stone @ 2026-05-21 14:53 UTC
  To: Nicolas Frattaroli
  Cc: Harry Wentland, Leo Li, Rodrigo Siqueira, Alex Deucher,
	Christian König, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Laurent Pinchart,
	Jonas Karlman, Jernej Skrabec, Sandy Huang, Heiko Stübner,
	Andy Yan, Jani Nikula, Rodrigo Vivi, Joonas Lahtinen,
	Tvrtko Ursulin, Dmitry Baryshkov, Sascha Hauer, Rob Herring,
	Jonathan Corbet, Shuah Khan, kernel, amd-gfx, dri-devel,
	linux-kernel, linux-arm-kernel, linux-rockchip, intel-gfx,
	intel-xe, linux-doc, wayland-devel, Werner Sembach,
	Andri Yngvason, Cristian Ciocaltea, Marius Vlad, Dmitry Baryshkov,
	Andy Yan
In-Reply-To: <20260423-color-format-v14-0-449a419ccbd4@collabora.com>

Hi there,

On Thu, 23 Apr 2026 at 20:04, Nicolas Frattaroli
<nicolas.frattaroli@collabora.com> wrote:
> We have an implementation in Weston at
> https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1825 that
> adds support for this property. This patch series has been tested
> against that MR on i915 (HDMI, DP), amdgpu (HDMI, DP) and on rockchip
> (HDMI).

This MR is R-b me.

> General notes on the approach taken by me: instead of silently switching
> to a different format than was explicitly requested, or even worse,
> outputting something to the sink the sink doesn't support, bubble up an
> error to userspace instead. "color format" is a "I want this" type
> property, not a "force this" type property, i.e. the kernel will respect
> the limits imposed by the hardware.

Yes! If userspace wants a fallback chain, it should encode it itself
through a series of test commits, rather than adding the sequential
logic to the kernel. Doing that might work for one axis, but pretty
quickly disintegrates when there are multiple parameters to perhaps
fall back on.
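
The userspace-side fallback chain described above can be sketched as a loop over a preference list, stopping at the first format that passes a test-only commit. In a real compositor each attempt would build an atomic request and call libdrm's drmModeAtomicCommit() with DRM_MODE_ATOMIC_TEST_ONLY; here that step is replaced by a hypothetical callback, and the format names are placeholders rather than the property's actual enum values.

```c
#include <errno.h>
#include <stddef.h>

/* Placeholder formats; not the property's real enum values. */
enum model_color_format {
	FMT_RGB444,
	FMT_YCBCR444,
	FMT_YCBCR422,
	FMT_YCBCR420,
};

/*
 * Hypothetical stand-in for a test-only atomic commit: returns 0 if the
 * kernel would accept the format, a negative errno value otherwise.
 */
typedef int (*test_commit_fn)(enum model_color_format fmt, void *ctx);

/* Try formats in preference order; return index of first accepted, or -1. */
static int pick_color_format(const enum model_color_format *prefs, size_t n,
			     test_commit_fn test_commit, void *ctx)
{
	for (size_t i = 0; i < n; i++) {
		if (test_commit(prefs[i], ctx) == 0)
			return (int)i;
	}
	return -1; /* nothing accepted: userspace decides what to do next */
}

/* Example mock: a sink/driver that accepts only YCbCr 4:2:0; counts attempts. */
static int mock_accepts_420_only(enum model_color_format fmt, void *ctx)
{
	int *attempts = ctx;

	(*attempts)++;
	return fmt == FMT_YCBCR420 ? 0 : -EINVAL;
}
```

Keeping the ordering in userspace means each additional axis (bit depth, refresh rate, and so on) only changes the list the compositor walks, not kernel policy.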

I had minor comments on 03 and 20, but they're Rb me with the obvious
fixes. 11, 12, and 19 are Acked-by me, as I don't quite know the
hardware specifics well enough to say. The rest are Reviewed-by me.

I suggest you merge the common code and VOP2/DW-QP implementations via
drm-misc, leaving Intel and AMD to merge through their own trees
whenever they're ready. We'll merge the Weston implementation when it
lands in DRM.

Thanks to you and all prior cooks for all the work, and to Maxime and
Dmitry for the help and review as well.

Cheers,
Daniel

* Re: [PATCH v10 19/30] KVM: arm64: Provide assembly for SME register access
From: Mark Rutland @ 2026-05-21 14:51 UTC
  To: Mark Brown, Oliver Upton, Marc Zyngier
  Cc: Joey Gouly, Catalin Marinas, Suzuki K Poulose, Will Deacon,
	Paolo Bonzini, Jonathan Corbet, Shuah Khan, Dave Martin,
	Fuad Tabba, Ben Horgan, linux-arm-kernel, kvmarm, linux-kernel,
	kvm, linux-doc, linux-kselftest, Peter Maydell, Eric Auger
In-Reply-To: <20260306-kvm-arm64-sme-v10-19-43f7683a0fb7@kernel.org>

On Fri, Mar 06, 2026 at 05:01:11PM +0000, Mark Brown wrote:
> Provide versions of the SME state save and restore functions for the
> hypervisor to allow it to restore ZA and ZT for guests.
> 
> Signed-off-by: Mark Brown <broonie@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_hyp.h |  2 ++
>  arch/arm64/kvm/hyp/fpsimd.S      | 23 +++++++++++++++++++++++
>  2 files changed, 25 insertions(+)

While this specific instance is simple enough, I don't think we should
continue to duplicate the low level save/restore routines between the
main kernel and KVM hyp code.

I've sent a series that avoids the need for this, and cleans up some
other bits):

  https://lore.kernel.org/linux-arm-kernel/20260521132556.584676-1-mark.rutland@arm.com/

Assuming Marc and Oliver are on board, I'd prefer that we do that
cleanup first, and build the KVM SME support atop.

Mark.

> 
> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
> index 0317790dd3b7..9b1354d1122c 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -116,6 +116,8 @@ void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
>  void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
>  void __sve_save_state(void *sve_pffr, u32 *fpsr, int save_ffr);
>  void __sve_restore_state(void *sve_pffr, u32 *fpsr, int restore_ffr);
> +void __sme_save_state(void const *state, bool save_zt);
> +void __sme_restore_state(void const *state, bool restore_zt);
>  
>  u64 __guest_enter(struct kvm_vcpu *vcpu);
>  
> diff --git a/arch/arm64/kvm/hyp/fpsimd.S b/arch/arm64/kvm/hyp/fpsimd.S
> index 6e16cbfc5df2..18b7a666016c 100644
> --- a/arch/arm64/kvm/hyp/fpsimd.S
> +++ b/arch/arm64/kvm/hyp/fpsimd.S
> @@ -29,3 +29,26 @@ SYM_FUNC_START(__sve_save_state)
>  	sve_save 0, x1, x2, 3
>  	ret
>  SYM_FUNC_END(__sve_save_state)
> +
> +SYM_FUNC_START(__sme_save_state)
> +	// Caller needs to ensure SMCR updates are visible
> +	_sme_rdsvl	2, 1		// x2 = VL/8
> +	sme_save_za 0, x2, 12		// Leaves x0 pointing to the end of ZA
> +
> +	cbz	x1, 1f
> +	_str_zt 0
> +1:
> +	ret
> +SYM_FUNC_END(__sme_save_state)
> +
> +SYM_FUNC_START(__sme_restore_state)
> +	// Caller needs to ensure SMCR updates are visible
> +	_sme_rdsvl	2, 1		// x2 = VL/8
> +	sme_load_za	0, x2, 12	// Leaves x0 pointing to end of ZA
> +
> +	cbz	x1, 1f
> +	_ldr_zt 0
> +
> +1:
> +	ret
> +SYM_FUNC_END(__sme_restore_state)
> 
> -- 
> 2.47.3
> 

* Re: [PATCH v7 1/2] usb: xhci-pci: add AMD Promontory 21 PCI glue
From: Greg Kroah-Hartman @ 2026-05-21 14:47 UTC
  To: Guenter Roeck
  Cc: Jihong Min, Mathias Nyman, Jonathan Corbet, Shuah Khan,
	Mario Limonciello, Basavaraj Natikar, Michal Pecio,
	Mario Limonciello, Yaroslav Isakov, linux-usb, linux-hwmon,
	linux-doc, linux-pci, linux-kernel
In-Reply-To: <06236462-6c4f-413d-8324-537fb8f743d9@roeck-us.net>

On Wed, May 20, 2026 at 07:18:49AM -0700, Guenter Roeck wrote:
> On Tue, May 19, 2026 at 09:07:31AM +0900, Jihong Min wrote:
> > AMD Promontory 21 (PROM21) xHCI PCI functions use the common xhci-pci
> > core for USB operation, but also expose controller-specific sensor data.
> > Add a small PROM21 PCI glue driver for AMD 1022:43fc and 1022:43fd
> > controllers.
> > 
> > The glue delegates USB host operation to the common xhci-pci core and
> > publishes a "hwmon" auxiliary device with parent-provided MMIO data.
> > Auxiliary device creation failure is logged but does not fail the xHCI
> > probe.
> > 
> > Make the PROM21 glue a hidden Kconfig tristate driven by the user-visible
> > SENSORS_PROM21_XHCI option. If sensor support is disabled, generic
> > xhci-pci binds PROM21 controllers normally. If sensor support is enabled,
> > the glue follows USB_XHCI_PCI.
> > 
> > This keeps the auxiliary device available for a modular sensor driver while
> > avoiding a built-in xhci-pci core handing PROM21 controllers to a glue
> > driver that is only available as a module during initramfs.
> > 
> > Assisted-by: Codex:gpt-5.5
> > Signed-off-by: Jihong Min <hurryman2212@gmail.com>
> > Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
> > Tested-by: Yaroslav Isakov <yaroslav.isakov@gmail.com>
> 
> Acked-by: Guenter Roeck <linux@roeck-us.net>
> 
> The two patches should be applied together. For now I will assume that
> they will both be applied through a usb tree since this patch touches
> common usb code.

Sounds good, I'll go take it now, thanks.

greg k-h

* Re: [PATCH v14 20/28] drm/rockchip: dw_hdmi_qp: Implement "color format" DRM property
From: Daniel Stone @ 2026-05-21 14:40 UTC
  To: Nicolas Frattaroli
  Cc: Harry Wentland, Leo Li, Rodrigo Siqueira, Alex Deucher,
	Christian König, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Laurent Pinchart,
	Jonas Karlman, Jernej Skrabec, Sandy Huang, Heiko Stübner,
	Andy Yan, Jani Nikula, Rodrigo Vivi, Joonas Lahtinen,
	Tvrtko Ursulin, Dmitry Baryshkov, Sascha Hauer, Rob Herring,
	Jonathan Corbet, Shuah Khan, kernel, amd-gfx, dri-devel,
	linux-kernel, linux-arm-kernel, linux-rockchip, intel-gfx,
	intel-xe, linux-doc, wayland-devel, Cristian Ciocaltea
In-Reply-To: <20260423-color-format-v14-20-449a419ccbd4@collabora.com>

Hi,

On Thu, 23 Apr 2026 at 20:06, Nicolas Frattaroli
<nicolas.frattaroli@collabora.com> wrote:
> +       bridge = drm_bridge_chain_get_first_bridge(encoder);
> +       if (!bridge)
> +               return 0;
> +
> +       bstate = drm_atomic_get_bridge_state(conn_state->state, bridge);
> +       if (!bstate)
> +               return 0;

IS_ERR() + PTR_ERR()

Cheers,
Daniel

* Re: [PATCH v6 16/43] KVM: guest_memfd: Use actual size for invalidation in kvm_gmem_release()
From: Ackerley Tng @ 2026-05-21 14:40 UTC
  To: Sean Christopherson, Fuad Tabba
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <ag8BmtzxTlcuA_zy@google.com>

Sean Christopherson <seanjc@google.com> writes:

>
> [...snip...]
>
> --- virt/kvm/guest_memfd.c
> +++ virt/kvm/guest_memfd.c
> @@ -640,9 +640,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
>  }
>
>  int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
> -                 unsigned int fd, loff_t offset)
> +                 unsigned int fd, u64 offset)
>  {
> -       loff_t size = slot->npages << PAGE_SHIFT;
> +       u64 size = slot->npages << PAGE_SHIFT;
>         unsigned long start, end;
>         struct gmem_file *f;
>         struct inode *inode;
>

My mental model was:

+ offsets => loff_t
+ indices => pgoff_t
+ sizes => size_t

But it looks like loff_t is more suitable for places where (possibly
negative) return values matter.

Good to go with u64!

> [...snip...]
>

* Re: [PATCH v3] killswitch: add per-function short-circuit mitigation primitive
From: Sasha Levin @ 2026-05-21 14:38 UTC
  To: Song Liu
  Cc: Daniel Borkmann, linux-kernel, linux-doc, linux-kselftest, bpf,
	live-patching, Greg Kroah-Hartman, Andrew Morton, Jonathan Corbet,
	Mathieu Desnoyers, Joshua Peisach, Florian Weimer, Breno Leitao,
	Anthony Iliopoulos, Michal Hocko, Jiri Olsa, John Fastabend,
	Christian Brauner
In-Reply-To: <CAPhsuW6C3hyciA4=z+V0BkQ9EEubuNCKLwoxtXorSbnhkUxdJQ@mail.gmail.com>

On Tue, May 19, 2026 at 03:00:15PM -0700, Song Liu wrote:
>On Tue, May 19, 2026 at 12:57 PM Sasha Levin <sashal@kernel.org> wrote:
>[...]
>> >Fully agree with Song here that there is no clear boundary, and that the
>> >killswitch could lead to arbitrary, hard to debug breakage if applied to
>> >the wrong function.. introducing worse bugs than the one being mitigated
>> >or even /short-circuit LSM enforcement/ (engage security_file_open 0,
>> >engage cap_capable 0, engage apparmor_* etc).
>>
>> This is similar to livepatch, right? Do we need guardrails there too?
>
>livepatch has the same guardrails as other kernel modules:
>CONFIG_MODULE_SIG, CONFIG_MODULE_SIG_FORCE, etc.

Which the user can choose to enable or disable. Livepatches will work just fine
with CONFIG_MODULE_SIG=n, right?

With the whitelist approach, the user has no choice but to accept it.

Would it make sense to allow disabling the whitelist via a kernel config or
some runtime flag?

-- 
Thanks,
Sasha

* Re: [PATCH v6 11/43] KVM: guest_memfd: Ensure pages are not in use before conversion
From: Ackerley Tng @ 2026-05-21 14:36 UTC
  To: Fuad Tabba
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
	Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
	Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka,
	kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <CA+EHjTyaBpTYsJRRyP09YggoHbi6s-ZgDoWoFgDRxO5k_BkoBw@mail.gmail.com>

Fuad Tabba <tabba@google.com> writes:

>
> [...snip...]
>
>> +static bool kvm_gmem_is_safe_for_conversion(struct inode *inode, pgoff_t start,
>> +                                           size_t nr_pages, pgoff_t *err_index)
>> +{
>> +       struct address_space *mapping = inode->i_mapping;
>> +       const int filemap_get_folios_refcount = 1;
>> +       pgoff_t last = start + nr_pages - 1;
>> +       struct folio_batch fbatch;
>> +       bool safe = true;
>> +       int i;
>> +
>> +       folio_batch_init(&fbatch);
>> +       while (safe && filemap_get_folios(mapping, &start, last, &fbatch)) {
>> +
>> +               for (i = 0; i < folio_batch_count(&fbatch); ++i) {
>> +                       struct folio *folio = fbatch.folios[i];
>> +
>> +                       if (folio_ref_count(folio) !=
>> +                           folio_nr_pages(folio) + filemap_get_folios_refcount) {
>> +                               safe = false;
>> +                               *err_index = folio->index;
>> +                               break;
>
> https://sashiko.dev/#/patchset/20260507-gmem-inplace-conversion-v6-0-91ab5a8b19a4%40google.com?part=11
>

Sashiko's first issue on lru is addressed in a separate patch later. :)

> Sashiko raised a few issues here, but I think this one might be
> genuine. Can you look into it please?
>
> If that's right, when huge page support lands, if start falls in the
> middle of a large folio, returning folio->index as the err_index will
> return an offset strictly less than the requested start. A naive
> userspace retry loop resuming from error_offset would step backwards
> and corrupt attributes on memory it didn't intend to convert.
> err_index should be clamped to max(start, folio->index).
>

For these, I was planning to defer all the huge-page-related issues until
huge pages land, since there are probably quite a few places to update.

On second thought, this isn't a huge change, I'll fix this in the next
revision.

> Cheers,
> /fuad
>
>> +                       }
>> +               }
>> +
>>
>> [...snip...]
>>

* Re: [PATCH mm-unstable v17 11/14] mm/khugepaged: Introduce mTHP collapse support
From: Wei Yang @ 2026-05-21 14:32 UTC
  To: Vernon Yang
  Cc: Wei Yang, Nico Pache, linux-doc, linux-kernel, linux-mm,
	linux-trace-kernel, aarcange, akpm, anshuman.khandual, apopple,
	baohua, baolin.wang, byungchul, catalin.marinas, cl, corbet,
	dave.hansen, david, dev.jain, gourry, hannes, hughd, jack,
	jackmanb, jannh, jglisse, joshua.hahnjy, kas, lance.yang, liam,
	ljs, mathieu.desnoyers, matthew.brost, mhiramat, mhocko, peterx,
	pfalcato, rakie.kim, raquini, rdunlap, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe
In-Reply-To: <91015820-f39a-4b06-89de-b49e5ca465fd@gmail.com>

On Thu, May 21, 2026 at 01:11:18PM +0800, Vernon Yang wrote:
>On Thu, May 21, 2026 at 02:46:54AM +0000, Wei Yang wrote:
>> On Thu, May 21, 2026 at 10:36:15AM +0800, Vernon Yang wrote:
>> >On Mon, May 11, 2026 at 12:58:11PM -0600, Nico Pache wrote:
>> >> Enable khugepaged to collapse to mTHP orders. This patch implements the
>> >> main scanning logic using a bitmap to track occupied pages and a stack
>> >> structure that allows us to find optimal collapse sizes.
>> >>
>> >> Prior to this patch, PMD collapse had 3 main phases: a lightweight
>> >> scanning phase (mmap_read_lock) that determines a potential PMD
>> >> collapse, an alloc phase (mmap unlocked), and finally a heavier collapse
>> >> phase (mmap_write_lock).
>> >>
>> >> To enable mTHP collapse, we make the following changes:
>> >>
>> >> During PMD scan phase, track occupied pages in a bitmap. When mTHP
>> >> orders are enabled, we remove the restriction of max_ptes_none during the
>> >> scan phase to avoid missing potential mTHP collapse candidates. Once we
>> >> have scanned the full PMD range and updated the bitmap to track occupied
>> >> pages, we use the bitmap to find the optimal mTHP size.
>> >>
>> >> Implement collapse_scan_bitmap() to perform binary recursion on the bitmap
>> >> and determine the best eligible order for the collapse. A stack structure
>> >> is used instead of traditional recursion to manage the search, which
>> >> avoids deep recursion on the size-limited kernel stack. The algorithm
>> >> recursively splits the bitmap into smaller chunks to find the highest
>> >> order mTHPs that satisfy the collapse criteria. We start by attempting
>> >> the PMD order, then move on to consecutively lower orders (mTHP
>> >> collapse). The stack maintains a pair of variables (offset, order),
>> >> indicating the number of PTEs from the start of the PMD, and the order of
>> >> the potential collapse candidate.
>> >>
>> >> The algorithm for consuming the bitmap works as such:
>> >>     1) push (0, HPAGE_PMD_ORDER) onto the stack
>> >>     2) pop the stack
>> >>     3) check if the number of set bits in that (offset,order) pair
>> >>        satisfies the max_ptes_none threshold for that order
>> >>     4) if yes, attempt collapse
>> >>     5) if no (or collapse fails), push two new stack items representing
>> >>        the left and right halves of the current bitmap range, at the
>> >>        next lower order
>> >>     6) repeat at step (2) until stack is empty.
>> >>
>> >> Below is a diagram representing the algorithm and stack items:
>> >>
>> >>                             offset   mid_offset
>> >>                             |        |
>> >>                             |        |
>> >>                             v        v
>> >>           ____________________________________
>> >>          |          PTE Page Table            |
>> >>          --------------------------------------
>> >> 			    <-------><------->
>> >>                              order-1  order-1
>> >>
>> >> mTHP collapses reject regions containing swapped out or shared pages.
>> >> This is because adding new entries can lead to new none pages, and these
>> >> may lead to constant promotion into a higher order mTHP. A similar
>> >> issue can occur with "max_ptes_none > HPAGE_PMD_NR/2" due to a collapse
>> >> introducing at least 2x the number of pages, and on a future scan will
>> >> satisfy the promotion condition once again. This issue is prevented via
>> >> the collapse_max_ptes_none() function which imposes the max_ptes_none
>> >> restrictions above.
>> >>
>> >> We currently only support mTHP collapse for max_ptes_none values of 0
>> >> and HPAGE_PMD_NR - 1, resulting in the following behavior:
>> >>
>> >>     - max_ptes_none=0: Never introduce new empty pages during collapse
>> >>     - max_ptes_none=HPAGE_PMD_NR-1: Always try collapse to the highest
>> >>       available mTHP order
>> >>
>> >> Any other max_ptes_none value will emit a warning and skip mTHP collapse
>> >> attempts. There should be no behavior change for PMD collapse.
>> >>
>> >> Once we determine which mTHP size fits best in that PMD range, a collapse
>> >> is attempted. A minimum collapse order of 2 is used as this is the lowest
>> >> order supported by anon memory as defined by THP_ORDERS_ALL_ANON.
>> >>
>> >> Currently madv_collapse is not supported and will only attempt PMD
>> >> collapse.
>> >>
>> >> We can also remove the check for is_khugepaged inside the PMD scan as
>> >> the collapse_max_ptes_none() function handles this logic now.
>> >>
>> >> Signed-off-by: Nico Pache <npache@redhat.com>
>> >> ---
>> >>  mm/khugepaged.c | 182 +++++++++++++++++++++++++++++++++++++++++++++---
>> >>  1 file changed, 174 insertions(+), 8 deletions(-)
>> >>
>> >> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> >> index 3492b135d667..39bf7ea8a6e8 100644
>> >> --- a/mm/khugepaged.c
>> >> +++ b/mm/khugepaged.c
>> >> @@ -100,6 +100,30 @@ static DEFINE_READ_MOSTLY_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
>> >>
>> >>  static struct kmem_cache *mm_slot_cache __ro_after_init;
>> >>
>> >> +#define KHUGEPAGED_MIN_MTHP_ORDER	2
>> >> +/*
>> >> + * mthp_collapse() does an iterative DFS over a binary tree, from
>> >> + * HPAGE_PMD_ORDER down to KHUGEPAGED_MIN_MTHP_ORDER. The max stack
>> >> + * size needed for a DFS on a binary tree is height + 1, where
>> >> + * height = HPAGE_PMD_ORDER - KHUGEPAGED_MIN_MTHP_ORDER.
>> >> + *
>> >> + * ilog2 is used in place of HPAGE_PMD_ORDER because some architectures
>> >> + * (e.g. ppc64le) do not define HPAGE_PMD_ORDER until after build time.
>> >> + */
>> >> +#define MTHP_STACK_SIZE	(ilog2(MAX_PTRS_PER_PTE) - KHUGEPAGED_MIN_MTHP_ORDER + 1)
>> >> +
>> >> +/*
>> >> + * Defines a range of PTE entries in a PTE page table which are being
>> >> + * considered for mTHP collapse.
>> >> + *
>> >> + * @offset: the offset of the first PTE entry in a PMD range.
>> >> + * @order: the order of the PTE entries being considered for collapse.
>> >> + */
>> >> +struct mthp_range {
>> >> +	u16 offset;
>> >> +	u8 order;
>> >> +};
>> >> +
>> >>  struct collapse_control {
>> >>  	bool is_khugepaged;
>> >>
>> >> @@ -111,6 +135,12 @@ struct collapse_control {
>> >>
>> >>  	/* nodemask for allocation fallback */
>> >>  	nodemask_t alloc_nmask;
>> >> +
>> >> +	/* Each bit represents a single occupied (!none/zero) page. */
>> >> +	DECLARE_BITMAP(mthp_bitmap, MAX_PTRS_PER_PTE);
>> >> +	/* A mask of the current range being considered for mTHP collapse. */
>> >> +	DECLARE_BITMAP(mthp_bitmap_mask, MAX_PTRS_PER_PTE);
>> >> +	struct mthp_range mthp_bitmap_stack[MTHP_STACK_SIZE];
>> >>  };
>> >>
>> >>  /**
>> >> @@ -1404,20 +1434,140 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long s
>> >>  	return result;
>> >>  }
>> >>
>> >> +static void collapse_mthp_stack_push(struct collapse_control *cc, int *stack_size,
>> >> +				     u16 offset, u8 order)
>> >> +{
>> >> +	const int size = *stack_size;
>> >> +	struct mthp_range *stack = &cc->mthp_bitmap_stack[size];
>> >> +
>> >> +	VM_WARN_ON_ONCE(size >= MTHP_STACK_SIZE);
>> >> +	stack->order = order;
>> >> +	stack->offset = offset;
>> >> +	(*stack_size)++;
>> >> +}
>> >> +
>> >> +static struct mthp_range collapse_mthp_stack_pop(struct collapse_control *cc,
>> >> +						 int *stack_size)
>> >> +{
>> >> +	const int size = *stack_size;
>> >> +
>> >> +	VM_WARN_ON_ONCE(size <= 0);
>> >> +	(*stack_size)--;
>> >> +	return cc->mthp_bitmap_stack[size - 1];
>> >> +}
>> >> +
>> >> +static unsigned int collapse_mthp_count_present(struct collapse_control *cc,
>> >> +						u16 offset, unsigned int nr_ptes)
>> >> +{
>> >> +	bitmap_zero(cc->mthp_bitmap_mask, MAX_PTRS_PER_PTE);
>> >> +	bitmap_set(cc->mthp_bitmap_mask, offset, nr_ptes);
>> >> +	return bitmap_weight_and(cc->mthp_bitmap, cc->mthp_bitmap_mask, MAX_PTRS_PER_PTE);
>> >> +}
>> >> +
>> >> +/*
>> >> + * mthp_collapse() consumes the bitmap that is generated during
>> >> + * collapse_scan_pmd() to determine what regions and mTHP orders fit best.
>> >> + *
>> >> + * Each bit in cc->mthp_bitmap represents a single occupied (!none/zero) page.
>> >> + * A stack structure cc->mthp_bitmap_stack is used to check different regions
>> >> + * of the bitmap for collapse eligibility. The stack maintains a pair of
>> >> + * variables (offset, order), indicating the number of PTEs from the start of
>> >> + * the PMD, and the order of the potential collapse candidate respectively. We
>> >> + * start at the PMD order and check if it is eligible for collapse; if not, we
>> >> + * add two entries to the stack at a lower order to represent the left and right
>> >> + * halves of the PTE page table we are examining.
>> >> + *
>> >> + *                         offset       mid_offset
>> >> + *                         |         |
>> >> + *                         |         |
>> >> + *                         v         v
>> >> + *      --------------------------------------
>> >> + *      |          cc->mthp_bitmap            |
>> >> + *      --------------------------------------
>> >> + *                         <-------><------->
>> >> + *                          order-1  order-1
>> >> + *
>> >> + * For each of these, we determine how many PTE entries are occupied in the
>> >> + * range of PTE entries we propose to collapse, then we compare this to a
>> >> + * threshold number of PTE entries which would need to be occupied for a
>> >> + * collapse to be permitted at that order (accounting for max_ptes_none).
>> >> + *
>> >> + * If a collapse is permitted, we attempt to collapse the PTE range into a
>> >> + * mTHP.
>> >> + */
>> >> +static int mthp_collapse(struct mm_struct *mm, unsigned long address,
>> >> +		int referenced, int unmapped, struct collapse_control *cc,
>> >> +		unsigned long enabled_orders)
>> >> +{
>> >> +	unsigned int nr_occupied_ptes, nr_ptes;
>> >> +	int max_ptes_none, collapsed = 0, stack_size = 0;
>> >> +	unsigned long collapse_address;
>> >> +	struct mthp_range range;
>> >> +	u16 offset;
>> >> +	u8 order;
>> >> +
>> >> +	collapse_mthp_stack_push(cc, &stack_size, 0, HPAGE_PMD_ORDER);
>> >> +
>> >> +	while (stack_size) {
>> >> +		range = collapse_mthp_stack_pop(cc, &stack_size);
>> >> +		order = range.order;
>> >> +		offset = range.offset;
>> >> +		nr_ptes = 1UL << order;
>> >> +
>> >> +		if (!test_bit(order, &enabled_orders))
>> >> +			goto next_order;
>> >> +
>> >> +		max_ptes_none = collapse_max_ptes_none(cc, NULL, order);
>> >> +
>> >> +		if (max_ptes_none < 0)
>> >> +			return collapsed;
>> >> +
>> >> +		nr_occupied_ptes = collapse_mthp_count_present(cc, offset,
>> >> +							       nr_ptes);
>> >> +
>> >> +		if (nr_occupied_ptes >= nr_ptes - max_ptes_none) {
>> >> +			int ret;
>> >> +
>> >> +			collapse_address = address + offset * PAGE_SIZE;
>> >> +			ret = collapse_huge_page(mm, collapse_address, referenced,
>> >> +						 unmapped, cc, order);
>> >> +			if (ret == SCAN_SUCCEED) {
>> >> +				collapsed += nr_ptes;
>> >> +				continue;
>> >> +			}
>> >> +		}
>> >> +
>> >> +next_order:
>> >> +		if (order > KHUGEPAGED_MIN_MTHP_ORDER) {
>> >
>> >Hi Nico, thank you very much for your contributions to this series.
>> >
>> >I found a minor issue: for MADV_COLLAPSE, if collapse_huge_page() fails
>> >for some reason (e.g. folio allocation fails), it goes to next_order and
>> >continues splitting to the next smaller order. However, enabled_orders
>> >only contains HPAGE_PMD_ORDER, so it keeps running the split operations
>> >without doing any effective work until KHUGEPAGED_MIN_MTHP_ORDER is
>> >reached before exiting. The same happens for khugepaged when, e.g., only
>> >2MB is set to always.
>>
>> Yes, but it does no actual work, since enabled_orders is checked after each pop.
>>
>> >
>> >This does not affect the overall functionality of mthp collapse, just
>> >redundant.
>> >
>> >The redundant operations can be easily skipped with the following
>> >modification. If I miss some thing, please let me know. Thanks!
>> >
>> >diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> >index 1a25af3d6d0f..fa407cce525c 100644
>> >--- a/mm/khugepaged.c
>> >+++ b/mm/khugepaged.c
>> >@@ -1574,7 +1574,7 @@ static int mthp_collapse(struct mm_struct *mm, unsigned long address,
>> > 		}
>> >
>> > next_order:
>> >-		if (order > KHUGEPAGED_MIN_MTHP_ORDER) {
>> >+		if ((BIT(order) - 1) & enabled_orders) {
>> > 			const u8 next_order = order - 1;
>> > 			const u16 mid_offset = offset + (nr_ptes / 2);
>> >
>>
>> This would stop the iteration if there are other lower enabled order, right?
>             ^^^^                                  ^^^^^^^^^^^^^^^^^^^
>
>NO :)

Got it. You are right.

The logic here is: if none of the lower-order bits are set, skip the rest.

>
>For more details, please refer to the following information.
>
>|              Scenario               | Old Behavior (order > 2) | New Behavior ((BIT(order)-1) & enabled_orders) |
>|-------------------------------------|--------------------------|------------------------------------------------|
>| MADV_COLLAPSE                       | Splits 9,8,7,...,3       | No split                                       |
>| khugepaged, only 2MB enabled        | Splits 9,8,7,...,3       | No split                                       |
>| khugepaged, only 2MB + 64KB enabled | Splits 9,8,7,...,3       | Splits 9,8,7,...,5                             |
>| khugepaged, only 32KB enabled       | Splits 9,8,7,...,3       | Splits 9,8,7,...,4                             |
>| khugepaged, only 16KB enabled       | Splits 9,8,7,...,3       | Splits 9,8,7,...,3                             |
>| khugepaged, all mTHP enabled        | Splits 9,8,7,...,3       | Splits 9,8,7,...,3                             |
>
>--
>Cheers,
>Vernon

-- 
Wei Yang
Help you, Help me

* Re: [PATCH v6 05/43] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
From: Ackerley Tng @ 2026-05-21 14:29 UTC
  To: Sean Christopherson, Fuad Tabba
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <ag8JIlHjohAOC3-g@google.com>

Sean Christopherson <seanjc@google.com> writes:

> On Thu, May 21, 2026, Fuad Tabba wrote:
>> On Wed, 20 May 2026 at 22:44, Ackerley Tng <ackerleytng@google.com> wrote:
>> >
>> > Fuad Tabba <tabba@google.com> writes:
>> >
>> > >
>> > > [...snip...]
>> > >
>> > >> +unsigned long kvm_gmem_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
>> > >> +{
>> > >> +       struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
>> > >> +       struct inode *inode;
>> > >> +
>> > >> +       /*
>> > >> +        * If this gfn has no associated memslot, there's no chance of the gfn
>> > >> +        * being backed by private memory, since guest_memfd must be used for
>> > >> +        * private memory, and guest_memfd must be associated with some memslot.
>> > >> +        */
>> > >> +       if (!slot)
>> > >> +               return 0;
>> > >> +
>> > >> +       CLASS(gmem_get_file, file)(slot);
>> > >> +       if (!file)
>> > >> +               return 0;
>> > >> +
>> > >> +       inode = file_inode(file);
>> > >> +
>> > >> +       /*
>> > >> +        * Rely on the maple tree's internal RCU lock to ensure a
>> > >> +        * stable result. This result can become stale as soon as the
>> > >> +        * lock is dropped, so the caller _must_ still protect
>> > >> +        * consumption of private vs. shared by checking
>> > >> +        * mmu_invalidate_retry_gfn() under mmu_lock to serialize
>> > >> +        * against ongoing attribute updates.
>> > >> +        */
>> > >> +       return kvm_gmem_get_attributes(inode, kvm_gmem_get_index(slot, gfn));
>> > >> +}
>> > >
>> > > Doesn't this imply that all consumers of kvm_mem_is_private() should
>> > > validate the result using mmu_lock and the invalidation sequence?
>> >
>> > Let me know how I can improve the comment.
>>
>> Given Sean's context, the comment is good I think. I would quibble
>> with the "_must_ still protect" phrasing being a bit too strict.
>>
>> Maybe just soften it slightly to acknowledge the exception? Something like:
>>
>>   * lock is dropped, so callers that require a strict result _must_ protect
>>   * consumption of private vs. shared by checking mmu_invalidate_retry_gfn()
>>   * under mmu_lock to serialize against ongoing attribute updates. Callers
>>   * doing lockless reads must be able to tolerate a stale result.
>>
>> That aligns the comment with how KVM is actually using it today. That
>> said, this is nitpicking. Feel free to use or ignore.
>
> Hmm, I wonder if we can figure out a way to consolidate some documentation,
> because this is _exactly_ the same pattern that x86's host_pfn_mapping_level()
> deals with (see its big comment below).
>

This would be great, are you thinking an actual comment or something in
Documentation/?

Perhaps we could iterate on this a little with me providing the newbie
perspective. Do you want me to take a stab at writing something up?

> There's also the stale comment in kvm_invalidate_memslot(), which, stating the
> obvious, speaks to the memslot+SRCU side of things.
>
> Maybe it makes sense to find a central location for one giant comment about
> how MMU notifier events and memslot+SRCU protections work?  And then refer
> to that in paths where some asset needs to be tied into MMU notifiers and/or
> memslots+SRCU?
>
> [*] https://lore.kernel.org/all/agcbWe8s9lmPuJwG@google.com
>
> [...snip...]
>

* Re: [PATCH v6 19/43] KVM: Let userspace disable per-VM mem attributes, enable per-gmem attributes
From: Sean Christopherson @ 2026-05-21 14:21 UTC
  To: Fuad Tabba
  Cc: ackerleytng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
	david, ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <CA+EHjTzu=28Sr3=9A9LmJEGz0tBEDbU9taznVV5kdL7s8Nw=Jg@mail.gmail.com>

On Thu, May 21, 2026, Fuad Tabba wrote:
> Hi Ackerley,
> 
> On Thu, 7 May 2026 at 21:22, Ackerley Tng via B4 Relay
> <devnull+ackerleytng.google.com@kernel.org> wrote:
> >
> > From: Sean Christopherson <seanjc@google.com>
> >
> > Make vm_memory_attributes a module parameter so that userspace can disable
> > the use of memory attributes on the VM level.
> >
> > To avoid inconsistencies in the way memory attributes are tracked in KVM
> > and guest_memfd, the vm_memory_attributes module_param is made
> > read-only (0444).
> >
> > Make CONFIG_KVM_VM_MEMORY_ATTRIBUTES selectable, only for (CoCo) VM types
> > that might use vm_memory_attributes.
> >
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> 
> Config files always confuse me, but Sashiko might be onto something:
> 
> https://sashiko.dev/#/patchset/20260507-gmem-inplace-conversion-v6-0-91ab5a8b19a4%40google.com?part=19

: Since this prompt does not have a default value, will it default to N
: and silently drop KVM_VM_MEMORY_ATTRIBUTES during configuration updates
: like make olddefconfig?
: 
: Existing userspace VMMs that rely on the KVM_SET_MEMORY_ATTRIBUTES ioctl
: for TDX or SEV VMs might fail to boot if the feature is unexpectedly
: compiled out. Could a default y be used to preserve backwards
: compatibility for existing configurations?

> I think this partially goes back to commit 6, the one I flagged
> yesterday. But also adding "default y" to KVM_VM_MEMORY_ATTRIBUTES?
> The default value should at least fix this issue, but I'm not sure if
> it would cause other problems...

Hrm.  As much as I want per-gmem attributes to be the default going forward,
silently breaking existing setups isn't great.  On the other hand, I'm *very*
skeptical there are any SNP or TDX deployments using a distro kernel, so I'm
still leaning towards forcing the issue and turning per-VM attributes off by
default.

* Re: [PATCH v14 03/28] drm: Add new general DRM property "color format"
From: Daniel Stone @ 2026-05-21 14:12 UTC
  To: Nicolas Frattaroli
  Cc: Harry Wentland, Leo Li, Rodrigo Siqueira, Alex Deucher,
	Christian König, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Laurent Pinchart,
	Jonas Karlman, Jernej Skrabec, Sandy Huang, Heiko Stübner,
	Andy Yan, Jani Nikula, Rodrigo Vivi, Joonas Lahtinen,
	Tvrtko Ursulin, Dmitry Baryshkov, Sascha Hauer, Rob Herring,
	Jonathan Corbet, Shuah Khan, kernel, amd-gfx, dri-devel,
	linux-kernel, linux-arm-kernel, linux-rockchip, intel-gfx,
	intel-xe, linux-doc, wayland-devel, Werner Sembach,
	Andri Yngvason, Marius Vlad
In-Reply-To: <20260423-color-format-v14-3-449a419ccbd4@collabora.com>

Hi,

On Thu, 23 Apr 2026 at 20:04, Nicolas Frattaroli
<nicolas.frattaroli@collabora.com> wrote:
> +       } else if (property == connector->color_format_property) {
> +               if (val > INT_MAX || !drm_connector_color_format_valid(val)) {
> +                       drm_dbg_atomic(connector->dev,
> +                                      "[CONNECTOR:%d:%s] unknown color format %llu\n",
> +                                      connector->base.id, connector->name, val);
> +                       return -EINVAL;
> +               }

Shouldn't this already be ensured by drm_property_change_valid_get()?

Cheers,
Daniel

* Re: [PATCH net-next v3 01/14] virtchnl: create 'include/linux/intel' and move necessary header files
From: Jakub Kicinski @ 2026-05-21 13:56 UTC
  To: Larysa Zaremba
  Cc: Tony Nguyen, davem, pabeni, edumazet, andrew+netdev, netdev,
	przemyslaw.kitszel, aleksander.lobakin, sridhar.samudrala,
	anjali.singhai, michal.swiatkowski, maciej.fijalkowski,
	emil.s.tantilov, madhu.chittim, joshua.a.hay, jacob.e.keller,
	jayaprakash.shanmugam, jiri, horms, corbet, richardcochran,
	linux-doc, tatyana.e.nikolova, krzysztof.czurylo, jgg, leon,
	linux-rdma, Samuel Salin, Aleksandr Loktionov
In-Reply-To: <ag7QUgfpM5UAAE2z@soc-5CG4396X81.clients.intel.com>

On Thu, 21 May 2026 11:28:50 +0200 Larysa Zaremba wrote:
> On Wed, May 20, 2026 at 05:52:01PM -0700, Jakub Kicinski wrote:
> > On Fri, 15 May 2026 15:44:25 -0700 Tony Nguyen wrote:  
> > > include/linux/intel is vacant  
> > 
> > I don't see any other vendor directory under include/linux  
> 
> There are at least
> 
> include/linux/mlx4, include/linux/mlx5 and include/linux/bnxt.
> 
> Those are per-driver and not per-vendor, but Intel Ethernet has too many drivers
> to have separate folders for them.
>
> I just do not think this necessarily creates a precedent.

You just said the other ones are for specific drivers.

> Folder structure is for you to decide as a maintainer, but it would be nice to 
> have known about such doubts earlier.

I'd love to know if you have any suggestions for improving the process.
Otherwise please keep your venting off list.

^ permalink raw reply

* Re: [PATCH v5 03/13] iio: core: add hierarchical channel relationships
From: Rodrigo Alencar @ 2026-05-21 13:48 UTC (permalink / raw)
  To: rodrigo.alencar, linux-iio, devicetree, linux-kernel, linux-doc,
	linux-hardening
  Cc: Lars-Peter Clausen, Michael Hennerich, Jonathan Cameron,
	David Lechner, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Philipp Zabel, Jonathan Corbet, Shuah Khan,
	Kees Cook, Gustavo A. R. Silva
In-Reply-To: <20260517-ad9910-iio-driver-v5-3-31599c88314a@analog.com>

On 26/05/18 03:26PM, Rodrigo Alencar wrote:
> On 26/05/17 06:56PM, sashiko-bot@kernel.org wrote:
> > Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> > - [Medium] Exposing the hierarchical relationship via channel labels creates an ambiguous UAPI for userspace.
> > --
> > 
> > commit 64c6f961f1d7645c58f732e4bd4c11200b999e93
> > Author: Rodrigo Alencar <rodrigo.alencar@analog.com>
> > 
> > iio: core: add hierarchical channel relationships
> > 
> > This commit introduces a parent-child relationship between IIO channels.
> > It adds a parent pointer to the iio_chan_spec struct and exposes this
> > relationship to userspace through a new "parent" sysfs attribute that
> > returns the parent channel's label.
> > 
> > > diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
> > > index 5c8404efd0a56..348ac7a597385 100644
> > > --- a/drivers/iio/industrialio-core.c
> > > +++ b/drivers/iio/industrialio-core.c
> > 
> > [ ... ]
> > 
> > > @@ -1263,6 +1271,31 @@ static int iio_device_add_channel_label(struct iio_dev *indio_dev,
> > >  	return 1;
> > >  }
> > >  
> > > +static int iio_device_add_channel_parent(struct iio_dev *indio_dev,
> > > +					 struct iio_chan_spec const *chan)
> > > +{
> > > +	struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
> > > +	int ret;
> > > +
> > > +	if (!chan->parent || (!indio_dev->info->read_label &&
> > > +			      !chan->parent->extend_name))
> > > +		return 0;
> > 
> > Is relying on channel labels robust enough for programmatic parent resolution?
> > 
> > IIO channel labels are optional, human-readable descriptors and are not
> > guaranteed by the core to be unique. If multiple channels within a device
> > share the same label, how can userspace definitively resolve which specific
> > channel is the parent?
> > 
> > Also, if a driver establishes a hierarchy where chan->parent is set, but the
> > parent channel lacks a label, this attribute is silently omitted, hiding the
> > relationship entirely.
> > 
> > Would it be more reliable to expose a definitive, unique identifier (such as
> > the parent channel's sysfs prefix or index) rather than relying on an optional
> > descriptive string?
> 
> It is a bit impressive that Sashiko can comment at that architectural level.
>
> Those are all good points. I suppose I went for this because otherwise it would
> require refactoring core code to extract the part of __iio_device_attr_init()
> that builds the channel name without the attribute part.
> 
> Maybe we should have a function that outputs a channel name into a buffer based on
> __iio_device_attr_init() behavior, and that could be reused.
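The ambiguity raised in the review can be shown with a small sketch. It uses a hypothetical flat channel table where each entry has a sysfs-style prefix and an optional, possibly repeated label. It also assumes that prefixes are unique within a device, since two channels with the same prefix would produce clashing sysfs attribute names. struct demo_label_chan, demo_find_by_label() and demo_find_by_prefix() are made up for illustration. A label lookup has to report "not found" and "ambiguous" as separate failures. A prefix lookup can only fail by finding nothing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical channel entry: unique prefix plus optional label. */
struct demo_label_chan {
	const char *prefix;	/* e.g. "out_altvoltage0", assumed unique */
	const char *label;	/* optional, may repeat or be NULL */
};

enum { DEMO_NOT_FOUND = -1, DEMO_AMBIGUOUS = -2 };

/* Resolve a parent by label: fails when zero or several channels match. */
static int demo_find_by_label(const struct demo_label_chan *chans, size_t n,
			      const char *label)
{
	int found = DEMO_NOT_FOUND;

	for (size_t i = 0; i < n; i++) {
		if (!chans[i].label || strcmp(chans[i].label, label) != 0)
			continue;
		if (found != DEMO_NOT_FOUND)
			return DEMO_AMBIGUOUS;	/* second match: give up */
		found = (int)i;
	}
	return found;
}

/* Resolve a parent by prefix: at most one channel can match. */
static int demo_find_by_prefix(const struct demo_label_chan *chans, size_t n,
			       const char *prefix)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(chans[i].prefix, prefix) == 0)
			return (int)i;
	}
	return DEMO_NOT_FOUND;
}
```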

This is the function with behavior extracted from __iio_device_attr_init():

	static int __iio_chan_prefix_emit(const struct iio_chan_spec *chan,
					  enum iio_shared_by shared_by,
					  char *buf, size_t len)
	{
		const char *dir = iio_direction[chan->output];
		const char *type = iio_chan_type_name_spec[chan->type];
		int n = 0;

		switch (shared_by) {
		case IIO_SHARED_BY_ALL:
			break;
		case IIO_SHARED_BY_DIR:
			n = scnprintf(buf, len, "%s", dir);
			break;
		case IIO_SHARED_BY_TYPE:
			n = scnprintf(buf, len, "%s_%s", dir, type);
			if (chan->differential)
				n += scnprintf(buf + n, len - n, "-%s", type);
			break;
		case IIO_SEPARATE:
			if (chan->indexed) {
				n = scnprintf(buf, len, "%s_%s%d", dir, type,
					      chan->channel);
				if (chan->differential)
					n += scnprintf(buf + n, len - n, "-%s%d", type,
						       chan->channel2);
			} else {
				if (chan->differential) {
					WARN(1, "Differential channels must be indexed\n");
					return -EINVAL;
				}
				n = scnprintf(buf, len, "%s_%s", dir, type);
			}

			if (chan->modified) {
				if (chan->differential) {
					WARN(1, "Differential channels can not have modifier\n");
					return -EINVAL;
				}
				n += scnprintf(buf + n, len - n, "_%s",
					       iio_modifier_names[chan->channel2]);
			}

			if (chan->extend_name)
				n += scnprintf(buf + n, len - n, "_%s", chan->extend_name);
			break;
		}

		return n;
	}

I think it is clear and reusable.
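To make the naming rules concrete, here is a userspace sketch of only the IIO_SEPARATE branch of the function above. It is a model, not the kernel code. struct demo_chan_spec is a hypothetical, trimmed-down stand-in for struct iio_chan_spec that holds the type and modifier as strings instead of table indices. demo_append() mimics scnprintf() by counting only the characters that actually fit. The ordering (index, differential part, modifier, extend_name) and the two rejected combinations follow the function as posted.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct iio_chan_spec. */
struct demo_chan_spec {
	int output;		/* 0 -> "in", 1 -> "out" */
	const char *type;	/* e.g. "voltage", "accel" */
	int indexed, channel, channel2, differential;
	const char *modifier;	/* NULL when the channel is not modified */
	const char *extend_name;	/* NULL when unused */
};

/* scnprintf()-like append at offset n; returns the new length. */
static size_t demo_append(char *buf, size_t len, size_t n, const char *fmt, ...)
{
	va_list ap;
	int ret;

	if (len == 0 || n >= len - 1)
		return n;	/* no room left */
	va_start(ap, fmt);
	ret = vsnprintf(buf + n, len - n, fmt, ap);
	va_end(ap);
	if (ret < 0)
		return n;
	if ((size_t)ret >= len - n)
		return len - 1;	/* output was truncated */
	return n + (size_t)ret;
}

/* Models only the IIO_SEPARATE branch; returns the length or -1. */
static int demo_chan_prefix(const struct demo_chan_spec *c, char *buf, size_t len)
{
	const char *dir = c->output ? "out" : "in";
	size_t n = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';
	/* The two combinations the posted function rejects with -EINVAL. */
	if (c->differential && (!c->indexed || c->modifier))
		return -1;

	if (c->indexed) {
		n = demo_append(buf, len, n, "%s_%s%d", dir, c->type, c->channel);
		if (c->differential)
			n = demo_append(buf, len, n, "-%s%d", c->type, c->channel2);
	} else {
		n = demo_append(buf, len, n, "%s_%s", dir, c->type);
	}
	if (c->modifier)
		n = demo_append(buf, len, n, "_%s", c->modifier);
	if (c->extend_name)
		n = demo_append(buf, len, n, "_%s", c->extend_name);
	return (int)n;
}
```

For example, an indexed differential voltage input with channel 0 and channel2 1 yields "in_voltage0-voltage1", and a non-indexed accel channel with modifier "x" yields "in_accel_x". Both are the prefixes that precede attribute names such as "_raw" in sysfs.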

> > > +
> > > +	ret = __iio_add_chan_devattr("parent",
> > > +				     chan,
> > > +				     &iio_read_channel_parent,
> > > +				     NULL,
> > > +				     0,
> > > +				     IIO_SEPARATE,
> > > +				     &indio_dev->dev,
> > > +				     NULL,
> > > +				     &iio_dev_opaque->channel_attr_list);
> > > +	if (ret < 0)
> > > +		return ret;
> > > +
> > > +	return 1;
> > > +}
> > 
> > -- 
> > Sashiko AI review · https://sashiko.dev/#/patchset/20260517-ad9910-iio-driver-v5-0-31599c88314a@analog.com?part=3

-- 
Kind regards,

Rodrigo Alencar

^ permalink raw reply

* Re: [PATCH v6 05/43] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
From: Fuad Tabba @ 2026-05-21 13:48 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ackerley Tng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
	david, ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <ag8JIlHjohAOC3-g@google.com>

On Thu, 21 May 2026 at 14:31, Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, May 21, 2026, Fuad Tabba wrote:
> > On Wed, 20 May 2026 at 22:44, Ackerley Tng <ackerleytng@google.com> wrote:
> > >
> > > Fuad Tabba <tabba@google.com> writes:
> > >
> > > >
> > > > [...snip...]
> > > >
> > > >> +unsigned long kvm_gmem_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
> > > >> +{
> > > >> +       struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
> > > >> +       struct inode *inode;
> > > >> +
> > > >> +       /*
> > > >> +        * If this gfn has no associated memslot, there's no chance of the gfn
> > > >> +        * being backed by private memory, since guest_memfd must be used for
> > > >> +        * private memory, and guest_memfd must be associated with some memslot.
> > > >> +        */
> > > >> +       if (!slot)
> > > >> +               return 0;
> > > >> +
> > > >> +       CLASS(gmem_get_file, file)(slot);
> > > >> +       if (!file)
> > > >> +               return 0;
> > > >> +
> > > >> +       inode = file_inode(file);
> > > >> +
> > > >> +       /*
> > > >> +        * Rely on the maple tree's internal RCU lock to ensure a
> > > >> +        * stable result. This result can become stale as soon as the
> > > >> +        * lock is dropped, so the caller _must_ still protect
> > > >> +        * consumption of private vs. shared by checking
> > > >> +        * mmu_invalidate_retry_gfn() under mmu_lock to serialize
> > > >> +        * against ongoing attribute updates.
> > > >> +        */
> > > >> +       return kvm_gmem_get_attributes(inode, kvm_gmem_get_index(slot, gfn));
> > > >> +}
> > > >
> > > > Doesn't this imply that all consumers of kvm_mem_is_private() should
> > > > validate the result using mmu_lock and the invalidation sequence?
> > >
> > > Let me know how I can improve the comment.
> >
> > Given Sean's context, the comment is good I think. I would quibble
> > with the "_must_ still protect" phrasing being a bit too strict.
> >
> > Maybe just soften it slightly to acknowledge the exception? Something like:
> >
> >   * lock is dropped, so callers that require a strict result _must_ protect
> >   * consumption of private vs. shared by checking mmu_invalidate_retry_gfn()
> >   * under mmu_lock to serialize against ongoing attribute updates. Callers
> >   * doing lockless reads must be able to tolerate a stale result.
> >
> > That aligns the comment with how KVM is actually using it today. That
> > said, this is nitpicking. Feel free to use or ignore.
>
> Hmm, I wonder if we can figure out a way to consolidate some documentation,
> because this is _exactly_ the same pattern that x86's host_pfn_mapping_level()
> deals with (see its big comment below).
>
> There's also the stale comment in kvm_invalidate_memslot(), which, stating the
> obvious, speaks to the memslot+SRCU side of things.
>
> Maybe it makes sense to find a central location for one giant comment about
> how MMU notifier events and memslot+SRCU protections work?  And then refer
> to that in paths where some asset needs to be tied into MMU notifiers and/or
> memslots+SRCU?
>
> [*] https://lore.kernel.org/all/agcbWe8s9lmPuJwG@google.com

This would fix a few related issues at once. sgtm
/fuad

>
> /*
>  * Look up the mapping level for @gfn in the current mm.
>  *
>  * WARNING!  Use of host_pfn_mapping_level() requires the caller and the end
>  * consumer to be tied into KVM's handlers for MMU notifier events!
>  *
>  * There are several ways to safely use this helper:
>  *
>  * - Check mmu_invalidate_retry_gfn() after grabbing the mapping level, before
>  *   consuming it.  In this case, mmu_lock doesn't need to be held during the
>  *   lookup, but it does need to be held while checking the MMU notifier.
>  *
>  * - Hold mmu_lock AND ensure there is no in-progress MMU notifier invalidation
>  *   event for the hva.  This can be done by explicitly checking the MMU notifier
>  *   or by ensuring that KVM already has a valid mapping that covers the hva.
>  *
>  * - Do not use the result to install new mappings, e.g. use the host mapping
>  *   level only to decide whether or not to zap an entry.  In this case, it's
>  *   not required to hold mmu_lock (though it's highly likely the caller will
>  *   want to hold mmu_lock anyways, e.g. to modify SPTEs).
>  *
>  * Note!  The lookup can still race with modifications to host page tables, but
>  * the above "rules" ensure KVM will not _consume_ the result of the walk if a
>  * race with the primary MMU occurs.
>  */
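The "snapshot, look up, then re-validate under the lock" rule that both comments describe can be modeled in userspace. This is a sketch under stated assumptions, not KVM code. A pthread mutex stands in for mmu_lock and a plain counter stands in for the invalidation sequence count. demo_retry() plays the role of mmu_invalidate_retry_gfn() without its gfn-range filtering. All demo_* names and DEMO_ATTR_PRIVATE are hypothetical. The optional "race" hook injects an attribute update between the lookup and the check, which is the window the comments warn about.

```c
#include <pthread.h>
#include <stdbool.h>

#define DEMO_ATTR_PRIVATE 1UL	/* hypothetical attribute bit */

/*
 * Hypothetical model: 'lock' stands in for mmu_lock, 'seq' for the
 * invalidation sequence count, 'in_progress' for running invalidations.
 */
struct demo_mmu {
	pthread_mutex_t lock;
	unsigned long seq;
	int in_progress;
	unsigned long attrs;
};

/* Writer: attributes change only inside a begin/end invalidation window. */
static void demo_update_attrs(struct demo_mmu *m, unsigned long new_attrs)
{
	pthread_mutex_lock(&m->lock);
	m->in_progress++;		/* "begin" */
	m->attrs = new_attrs;
	pthread_mutex_unlock(&m->lock);

	pthread_mutex_lock(&m->lock);
	m->seq++;			/* "end": publish a new sequence count */
	m->in_progress--;
	pthread_mutex_unlock(&m->lock);
}

/* Caller must hold m->lock, in the spirit of mmu_invalidate_retry_gfn(). */
static bool demo_retry(const struct demo_mmu *m, unsigned long snapshot)
{
	return m->in_progress || m->seq != snapshot;
}

/* Example race hook: flip the attribute to private mid-lookup. */
static void demo_race_set_private(struct demo_mmu *m)
{
	demo_update_attrs(m, DEMO_ATTR_PRIVATE);
}

/*
 * Reader: snapshot the sequence count, look up the attributes, then
 * re-validate under the lock before consuming. A stale result is retried,
 * never consumed. 'race', if set, runs once between lookup and check.
 */
static unsigned long demo_consume(struct demo_mmu *m,
				  void (*race)(struct demo_mmu *),
				  int *attempts)
{
	unsigned long snapshot, attrs;

	for (*attempts = 1; ; (*attempts)++) {
		/*
		 * Models the lockless, RCU-protected lookup; the mutex only
		 * keeps this single-threaded demo free of C data races.
		 */
		pthread_mutex_lock(&m->lock);
		snapshot = m->seq;
		attrs = m->attrs;
		pthread_mutex_unlock(&m->lock);

		if (race && *attempts == 1)
			race(m);

		pthread_mutex_lock(&m->lock);
		if (!demo_retry(m, snapshot)) {
			/* Safe to consume 'attrs' here, e.g. install a mapping. */
			pthread_mutex_unlock(&m->lock);
			return attrs;
		}
		pthread_mutex_unlock(&m->lock);
	}
}
```

When the injected update bumps the sequence count, the first lookup is discarded. The second attempt then returns the new attribute value, so a stale private/shared result is never consumed.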

^ permalink raw reply

* Re: [PATCH net-next 3/3] net/mlx5: Apply devlink default eswitch mode during init
From: Thomas Weißschuh @ 2026-05-21 13:41 UTC (permalink / raw)
  To: Mark Bloch
  Cc: Tariq Toukan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, David S. Miller, Thomas Gleixner, Arnd Bergmann,
	Jonathan Corbet, Shuah Khan, Jiri Pirko, Simon Horman,
	Saeed Mahameed, Leon Romanovsky, Borislav Petkov (AMD),
	Andrew Morton, Randy Dunlap, Petr Mladek, Peter Zijlstra (Intel),
	Tejun Heo, Vlastimil Babka, Feng Tang, Christian Brauner,
	Dave Hansen, Dapeng Mi, Kees Cook, Marco Elver, Li RongQing,
	Eric Biggers, Paul E. McKenney, linux-doc, linux-kernel, netdev,
	linux-rdma, Gal Pressman, Dragos Tatulea, Jiri Pirko, Shay Drori,
	Moshe Shemesh
In-Reply-To: <3bbcf456-322c-46f9-b238-88fb8ad227b2@nvidia.com>

On Thu, May 21, 2026 at 04:16:28PM +0300, Mark Bloch wrote:
(...)

> NIPA flagged this patch with a build_allmodconfig_warn failure:
> https://netdev-ctrl.bots.linux.dev/logs/build/1098506/14585935/build_allmodconfig_warn/
> 
> I do not see how this mlx5 patch is related to the reported issue,
> but I looked into it anyway.
> 
> After the kernel has been built once, the issue can be reproduced by rerunning sparse
> only on version.o, which filters out the unrelated noise. I had an older sparse installed,
> so I used a local copy:
> 
> rm -f arch/x86/boot/version.o
> make V=1 C=1 CHECK=/labhome/mbloch/bin/sparse arch/x86/boot/version.o
> 
> This gives the same error reported by NIPA:
> 
> ...
> ...
> make -f ./scripts/Makefile.vmlinux
> make -f ./scripts/Makefile.build obj=arch/x86/boot arch/x86/boot/bzImage
> make -f ./scripts/Makefile.build obj=arch/x86/boot/compressed arch/x86/boot/compressed/vmlinux
> # CC      arch/x86/boot/version.o
>   gcc -Wp,-MMD,arch/x86/boot/.version.o.d -nostdinc -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/compiler-version.h -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -D__KERNEL__ -std=gnu11 -fms-extensions -m16 -g -Os -DDISABLE_BRANCH_PROFILING -D__DISABLE_EXPORTS -Wall -Wstrict-prototypes -march=i386 -mregparm=3 -fno-strict-aliasing -fomit-frame-pointer -fno-pic -mno-mmx -mno-sse -fcf-protection=none -ffreestanding -fno-stack-protector -Wno-address-of-packed-member -mpreferred-stack-boundary=2 -D_SETUP -fno-asynchronous-unwind-tables -Wimplicit-fallthrough=5     -DKBUILD_MODFILE='"arch/x86/boot/version"' -DKBUILD_BASENAME='"version"' -DKBUILD_MODNAME='"version"' -D__KBUILD_MODNAME=version -c -o arch/x86/boot/version.o arch/x86/boot/version.c
> # CHECK   arch/x86/boot/version.c
>   /labhome/mbloch/bin/sparse -D__linux__ -Dlinux -D__STDC__ -Dunix -D__unix__ -Wbitwise -Wno-return-void -Wno-unknown-attribute  -D__x86_64__ --arch=x86 -mlittle-endian -m64 -Wp,-MMD,arch/x86/boot/.version.o.d -nostdinc -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/compiler-version.h -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -D__KERNEL__ -std=gnu11 -fms-extensions -m16 -g -Os -DDISABLE_BRANCH_PROFILING -D__DISABLE_EXPORTS -Wall -Wstrict-prototypes -march=i386 -mregparm=3 -fno-strict-aliasing -fomit-frame-pointer -fno-pic -mno-mmx -mno-sse -fcf-protection=none -ffreestanding -fno-stack-protector -Wno-address-of-packed-member -mpreferred-stack-boundary=2 -D_SETUP -fno-asynchronous-unwind-tables -Wimplicit-fallthrough=5     -DKBUILD_MODFILE='"arch/x86/boot/version"' -DKBUILD_BASENAME='"version"' -DKBUILD_MODNAME='"version"' -D__KBUILD_MODNAME=version arch/x86/boot/version.c
> arch/x86/boot/version.c: note: in included file (through arch/x86/include/uapi/asm/bitsperlong.h, include/uapi/asm-generic/int-ll64.h, include/asm-generic/int-ll64.h, include/uapi/asm-generic/types.h, ...):
> ./include/asm-generic/bitsperlong.h:23:2: error: Inconsistent word size. Check asm/bitsperlong.h
> ./include/asm-generic/bitsperlong.h:27:33: error: static assertion failed: "Inconsistent word size. Check asm/bitsperlong.h"
> # cmd_gen_symversions_c arch/x86/boot/version.o
>   if nm arch/x86/boot/version.o 2>/dev/null | grep -q ' __export_symbol_'; then gcc -E -D__GENKSYMS__ -Wp,-MMD,arch/x86/boot/.version.o.d -nostdinc -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/compiler-version.h -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -D__KERNEL__ -std=gnu11 -fms-extensions -m16 -g -Os -DDISABLE_BRANCH_PROFILING -D__DISABLE_EXPORTS -Wall -Wstrict-prototypes -march=i386 -mregparm=3 -fno-strict-aliasing -fomit-frame-pointer -fno-pic -mno-mmx -mno-sse -fcf-protection=none -ffreestanding -fno-stack-protector -Wno-address-of-packed-member -mpreferred-stack-boundary=2 -D_SETUP -fno-asynchronous-unwind-tables -Wimplicit-fallthrough=5     -DKBUILD_MODFILE='"arch/x86/boot/version"' -DKBUILD_BASENAME='"version"' -DKBUILD_MODNAME='"version"' -D__KBUILD_MODNAME=version arch/x86/boot/version.c | ./scripts/genksyms/genksyms    >> arch/x86/boot/.version.o.cmd; fi
> # LD      arch/x86/boot/setup.elf
>   ld -m elf_x86_64 -z noexecstack  -m elf_i386 -z noexecstack -T arch/x86/boot/setup.ld arch/x86/boot/a20.o arch/x86/boot/bioscall.o arch/x86/boot/cmdline.o arch/x86/boot/copy.o arch/x86/boot/cpu.o arch/x86/boot/cpuflags.o arch/x86/boot/cpucheck.o arch/x86/boot/early_serial_console.o arch/x86/boot/edd.o arch/x86/boot/header.o arch/x86/boot/main.o arch/x86/boot/memory.o arch/x86/boot/pm.o arch/x86/boot/pmjump.o arch/x86/boot/printf.o arch/x86/boot/regs.o arch/x86/boot/string.o arch/x86/boot/tty.o arch/x86/boot/video.o arch/x86/boot/video-mode.o arch/x86/boot/version.o arch/x86/boot/video-vga.o arch/x86/boot/video-vesa.o arch/x86/boot/video-bios.o -o arch/x86/boot/setup.elf
> # OBJCOPY arch/x86/boot/setup.bin
>   objcopy  -O binary arch/x86/boot/setup.elf arch/x86/boot/setup.bin
> # BUILD   arch/x86/boot/bzImage
>   (dd if=arch/x86/boot/setup.bin bs=4k conv=sync status=none; cat arch/x86/boot/vmlinux.bin) >arch/x86/boot/bzImage
> mkdir -p ./arch/x86_64/boot
> ln -fsn ../../x86/boot/bzImage ./arch/x86_64/boot/bzImage
> 
> To me this looks like sparse is getting a conflicting set of flags.
> The command line contains both "-D__x86_64__ -m64" and "-m16 -march=i386 -D_SETUP".
> 
> I confirmed that the following patch "fixes" the issue, but I do not know whether
> this is the right fix. This area is outside my comfort zone, so it would be
> helpful if someone more familiar with the x86 build/sparse flow could take a
> look:
> 
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index 3f9fb3698d66..80923864f6f9 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -71,6 +71,10 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE
> 
>  SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))
> 
> +realmode-checkflags-$(CONFIG_X86_64) := -m32 -U__x86_64__ -D__i386__
> +REALMODE_CHECKFLAGS := $(filter-out -m64 -D__x86_64__,$(CHECKFLAGS)) $(realmode-checkflags-y)
> +$(SETUP_OBJS): CHECKFLAGS := $(REALMODE_CHECKFLAGS)
> +
>  sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [a-zA-Z] \(startup_32\|efi.._stub_entry\|efi\(32\)\?_pe_entry\|input_data\|kernel_info\|_end\|_ehead\|_text\|_e\?data\|_e\?sbat\|z_.*\)$$/\#define ZO_\2 0x\1/p'
> 
>  quiet_cmd_zoffset = ZOFFSET $@
> diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> index a0fb39abc5c8..341b0ff20c3d 100644
> --- a/arch/x86/realmode/rm/Makefile
> +++ b/arch/x86/realmode/rm/Makefile
> @@ -29,6 +29,10 @@ targets      += $(realmode-y)
> 
>  REALMODE_OBJS = $(addprefix $(obj)/,$(realmode-y))
> 
> +realmode-checkflags-$(CONFIG_X86_64) := -m32 -U__x86_64__ -D__i386__
> +REALMODE_CHECKFLAGS := $(filter-out -m64 -D__x86_64__,$(CHECKFLAGS)) $(realmode-checkflags-y)
> +$(REALMODE_OBJS): CHECKFLAGS := $(REALMODE_CHECKFLAGS)
> +

The idea looks good, we do something similar for the 32-bit vDSO:

arch/x86/entry/vdso/vdso32/Makefile

CHECKFLAGS := $(subst -m64,-m32,$(CHECKFLAGS))
CHECKFLAGS := $(subst -D__x86_64__,-D__i386__,$(CHECKFLAGS))

It seems the same kind of substitution would work here.
We can add a helper function to arch/x86/Makefile and
use that also for the compat vDSO.

I am wondering why this didn't show up before.
Are you going to send a patch or should I?


Thomas

^ permalink raw reply

* Re: [PATCH v6 05/43] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
From: Sean Christopherson @ 2026-05-21 13:31 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Ackerley Tng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
	david, ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <CA+EHjTzLCD-dU-euZKgzwyEr2ecPqFDNutcaHm2fCDGA+MHVXA@mail.gmail.com>

On Thu, May 21, 2026, Fuad Tabba wrote:
> On Wed, 20 May 2026 at 22:44, Ackerley Tng <ackerleytng@google.com> wrote:
> >
> > Fuad Tabba <tabba@google.com> writes:
> >
> > >
> > > [...snip...]
> > >
> > >> +unsigned long kvm_gmem_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
> > >> +{
> > >> +       struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
> > >> +       struct inode *inode;
> > >> +
> > >> +       /*
> > >> +        * If this gfn has no associated memslot, there's no chance of the gfn
> > >> +        * being backed by private memory, since guest_memfd must be used for
> > >> +        * private memory, and guest_memfd must be associated with some memslot.
> > >> +        */
> > >> +       if (!slot)
> > >> +               return 0;
> > >> +
> > >> +       CLASS(gmem_get_file, file)(slot);
> > >> +       if (!file)
> > >> +               return 0;
> > >> +
> > >> +       inode = file_inode(file);
> > >> +
> > >> +       /*
> > >> +        * Rely on the maple tree's internal RCU lock to ensure a
> > >> +        * stable result. This result can become stale as soon as the
> > >> +        * lock is dropped, so the caller _must_ still protect
> > >> +        * consumption of private vs. shared by checking
> > >> +        * mmu_invalidate_retry_gfn() under mmu_lock to serialize
> > >> +        * against ongoing attribute updates.
> > >> +        */
> > >> +       return kvm_gmem_get_attributes(inode, kvm_gmem_get_index(slot, gfn));
> > >> +}
> > >
> > > Doesn't this imply that all consumers of kvm_mem_is_private() should
> > > validate the result using mmu_lock and the invalidation sequence?
> >
> > Let me know how I can improve the comment.
> 
> Given Sean's context, the comment is good I think. I would quibble
> with the "_must_ still protect" phrasing being a bit too strict.
> 
> Maybe just soften it slightly to acknowledge the exception? Something like:
> 
>   * lock is dropped, so callers that require a strict result _must_ protect
>   * consumption of private vs. shared by checking mmu_invalidate_retry_gfn()
>   * under mmu_lock to serialize against ongoing attribute updates. Callers
>   * doing lockless reads must be able to tolerate a stale result.
> 
> That aligns the comment with how KVM is actually using it today. That
> said, this is nitpicking. Feel free to use or ignore.

Hmm, I wonder if we can figure out a way to consolidate some documentation,
because this is _exactly_ the same pattern that x86's host_pfn_mapping_level()
deals with (see its big comment below).

There's also the stale comment in kvm_invalidate_memslot(), which, stating the
obvious, speaks to the memslot+SRCU side of things.

Maybe it makes sense to find a central location for one giant comment about
how MMU notifier events and memslot+SRCU protections work?  And then refer
to that in paths where some asset needs to be tied into MMU notifiers and/or
memslots+SRCU?

[*] https://lore.kernel.org/all/agcbWe8s9lmPuJwG@google.com


/*
 * Look up the mapping level for @gfn in the current mm.
 *
 * WARNING!  Use of host_pfn_mapping_level() requires the caller and the end
 * consumer to be tied into KVM's handlers for MMU notifier events!
 *
 * There are several ways to safely use this helper:
 *
 * - Check mmu_invalidate_retry_gfn() after grabbing the mapping level, before
 *   consuming it.  In this case, mmu_lock doesn't need to be held during the
 *   lookup, but it does need to be held while checking the MMU notifier.
 *
 * - Hold mmu_lock AND ensure there is no in-progress MMU notifier invalidation
 *   event for the hva.  This can be done by explicitly checking the MMU notifier
 *   or by ensuring that KVM already has a valid mapping that covers the hva.
 *
 * - Do not use the result to install new mappings, e.g. use the host mapping
 *   level only to decide whether or not to zap an entry.  In this case, it's
 *   not required to hold mmu_lock (though it's highly likely the caller will
 *   want to hold mmu_lock anyways, e.g. to modify SPTEs).
 *
 * Note!  The lookup can still race with modifications to host page tables, but
 * the above "rules" ensure KVM will not _consume_ the result of the walk if a
 * race with the primary MMU occurs.
 */

^ permalink raw reply

