Linux Documentation


* Re: [PATCH v10 2/9] revocable: Add KUnit test cases
From: Tzung-Bi Shih @ 2026-05-12  8:12 UTC
  To: Bartosz Golaszewski
  Cc: Benson Leung, linux-kernel, chrome-platform, driver-core,
	linux-doc, linux-gpio, Rafael J. Wysocki, Danilo Krummrich,
	Jonathan Corbet, Shuah Khan, Laurent Pinchart, Wolfram Sang,
	Jason Gunthorpe, Johan Hovold, Paul E . McKenney, Arnd Bergmann,
	Greg Kroah-Hartman, Linus Walleij
In-Reply-To: <CAMRc=McPez6Ver5NgrDPnM9YDb7cPonWE7BBsS_5AnY9tGf3xQ@mail.gmail.com>

On Mon, May 11, 2026 at 06:10:32AM -0700, Bartosz Golaszewski wrote:
> On Fri, 8 May 2026 12:54:41 +0200, Tzung-Bi Shih <tzungbi@kernel.org> said:
> > diff --git a/drivers/base/revocable_test.c b/drivers/base/revocable_test.c
> 
> Please move this under drivers/base/tests/ where the rest of kunit modules
> live and name it revocable-test.c for consistency with the existing ones.

Ack, I overlooked the folder.  Will move the test to drivers/base/test/ and
rename it in the next version.

> > +#include <kunit/test.h>
> 
> Add a newline here as do other kunit modules.

Ack, will fix it in the next version.


* Re: [PATCH v10 1/9] revocable: Revocable resource management
From: Tzung-Bi Shih @ 2026-05-12  8:12 UTC
  To: Bartosz Golaszewski
  Cc: Benson Leung, linux-kernel, chrome-platform, driver-core,
	linux-doc, linux-gpio, Rafael J. Wysocki, Danilo Krummrich,
	Jonathan Corbet, Shuah Khan, Laurent Pinchart, Wolfram Sang,
	Jason Gunthorpe, Johan Hovold, Paul E . McKenney, Arnd Bergmann,
	Greg Kroah-Hartman, Linus Walleij
In-Reply-To: <CAMRc=McRGoKdbpwyMO5x-Ttyr2n7+Chd8F2jwBF8j33SvNAGcg@mail.gmail.com>

On Mon, May 11, 2026 at 06:16:52AM -0700, Bartosz Golaszewski wrote:
> I have not looked into the implementation details all that much - that can
> always be ironed out later - but for the API part: I quite like it now. The
> resulting GPIO code looks cleaner and I think it's worth adding for v7.2.
> 
> Thanks for addressing the issues and for your perseverance.

Thank you for taking the time to review the APIs.  I'm glad you found the
changes to be an improvement!


* Re: [PATCH] docs: Document panic_on_rcu_stall default behavior
From: Kunwu Chan @ 2026-05-12  8:08 UTC
  To: paulmck; +Cc: corbet, skhan, linux-doc, linux-kernel, gustavold, Kunwu Chan
In-Reply-To: <8134f801-1494-47e1-84b1-7245616231ba@paulmck-laptop>

May 11, 2026 at 3:54 AM, "Paul E. McKenney" <paulmck@kernel.org> wrote:


> 
> On Sat, May 09, 2026 at 05:12:14PM +0800, Kunwu Chan wrote:
> 
> > 
> > From: Kunwu Chan <kunwu.chan@gmail.com>
> >  
> >  Commit ab875b3e179f ("rcu: Add BOOTPARAM_RCU_STALL_PANIC
> >  Kconfig option") made the default value of
> >  kernel.panic_on_rcu_stall depend on
> >  CONFIG_BOOTPARAM_RCU_STALL_PANIC.
> >  
> >  Document this in kernel.rst
> >  
> >  Signed-off-by: Kunwu Chan <kunwu.chan@gmail.com>
> > 
> This commit depends on the commit you call out above, which, given Linus
> Torvalds's reaction, is unlikely to make it into mainline. :-(
> 
> A likely workaround is to use the existing sysctl kernel boot parameter,
> as in: "sysctl.kernel.panic_on_rcu_stall=1".
> 
> This can also be embedded into the kernel image using the bootconfig
> facility. To do this, build your kernel with the following Kconfig
> options:
> 
>  CONFIG_BOOT_CONFIG=y
>  CONFIG_BOOT_CONFIG_FORCE=y
>  CONFIG_BOOT_CONFIG_EMBED=y
>  CONFIG_BOOT_CONFIG_EMBED_FILE=".bootconfig"
> 
Hi Paul,

Thank you for the detailed explanation and the alternative solutions.
I wasn't aware of the feedback from Linus regarding the dependency commit.
Since it's unlikely to be merged into mainline, it makes sense to drop this
documentation patch as well. :-(

> Then create your ".bootconfig" file in the top-level directory of your
> Linux-kernel source tree:
> 
>  kernel {
>  sysctl.kernel.panic_on_rcu_stall=1
>  }
> 
> You can also pass parameters to the "init" process by adding an "init"
> stanza to your .bootconfig file. See the Linux-kernel bootconfig
> documentation for more information:
> 
>  Documentation/admin-guide/bootconfig.rst
> 
I will look into the bootconfig and sysctl approaches you suggested. 
Thanks again for pointing me in the right direction!

>  Thanx, Paul
> 

Best regards,
Kunwu Chan
> > 
> > ---
> >  Documentation/admin-guide/sysctl/kernel.rst | 4 ++++
> >  1 file changed, 4 insertions(+)
> >  
> >  diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> >  index c6994e55d141..99598a83f830 100644
> >  --- a/Documentation/admin-guide/sysctl/kernel.rst
> >  +++ b/Documentation/admin-guide/sysctl/kernel.rst
> >  @@ -948,6 +948,10 @@ panic_on_rcu_stall
> >  When set to 1, calls panic() after RCU stall detection messages. This
> >  is useful to define the root cause of RCU stalls using a vmcore.
> >  
> >  +The default value can be configured at build time via
> >  +``CONFIG_BOOTPARAM_RCU_STALL_PANIC``. Runtime updates to this sysctl
> >  +always override the built-in default.
> >  +
> >  = ============================================================
> >  0 Do not panic() when RCU stall takes place, default behavior.
> >  1 panic() after printing RCU stall messages.
> >  -- 
> >  2.43.0
> >
>


* Re: [RFC PATCH 3/5] mm: zswap: load fully stored large folios
From: Fujunjie @ 2026-05-12  8:05 UTC
  To: Yosry Ahmed
  Cc: Andrew Morton, Johannes Weiner, Chris Li, Kairui Song, Nhat Pham,
	linux-mm, linux-kernel, linux-doc, Jonathan Corbet,
	David Hildenbrand, Ryan Roberts, Barry Song, Baolin Wang,
	Chengming Zhou, Baoquan He, Lorenzo Stoakes, Michal Hocko,
	Roman Gushchin, Shakeel Butt
In-Reply-To: <agJV-73LaAzMqDBg@google.com>



On 5/12/2026 6:38 AM, Yosry Ahmed wrote:
> On Fri, May 08, 2026 at 08:20:31PM +0000, fujunjie wrote:
>> zswap_store() already stores every base page of a large folio as a
>> separate zswap entry and tears the whole folio back down on store
>> failure. The load side still rejects any large folio, which forces the
>> swapin path to avoid mTHP swapin once zswap has ever been enabled.
>>
>> Use zswap_entry_batch() to distinguish three cases: the whole range is
>> absent from zswap and should fall through to the disk backend, the whole
>> range is present and can be decompressed one base page at a time, or the
>> range is mixed and must be treated as an invalid large-folio backend
>> selection.
>>
>> After all entries decompress successfully, mark the folio uptodate and
>> dirty, account the mTHP swpin stat once for the folio, account one ZSWPIN
>> event per base page, and invalidate each zswap entry because the
>> swapcache folio becomes authoritative.
>>
>> Signed-off-by: fujunjie <fujunjie1@qq.com>
>> ---
>>  Documentation/admin-guide/mm/transhuge.rst |  4 +-
>>  mm/zswap.c                                 | 65 ++++++++++++++--------
>>  2 files changed, 45 insertions(+), 24 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
>> index 5fbc3d89bb07..05456906aff6 100644
>> --- a/Documentation/admin-guide/mm/transhuge.rst
>> +++ b/Documentation/admin-guide/mm/transhuge.rst
>> @@ -644,8 +644,8 @@ zswpout
>>  	piece without splitting.
>>  
>>  swpin
>> -	is incremented every time a huge page is swapped in from a non-zswap
>> -	swap device in one piece.
>> +	is incremented every time a huge page is swapped in from swap I/O or
>> +	zswap in one piece.
>>  
>>  swpin_fallback
>>  	is incremented if swapin fails to allocate or charge a huge page
>> diff --git a/mm/zswap.c b/mm/zswap.c
>> index 27c14b8edd15..863ca1e896ed 100644
>> --- a/mm/zswap.c
>> +++ b/mm/zswap.c
>> @@ -28,6 +28,7 @@
>>  #include <crypto/acompress.h>
>>  #include <crypto/scatterwalk.h>
>>  #include <linux/zswap.h>
>> +#include <linux/huge_mm.h>
>>  #include <linux/mm_types.h>
>>  #include <linux/page-flags.h>
>>  #include <linux/swapops.h>
>> @@ -1614,20 +1615,23 @@ bool zswap_store(struct folio *folio)
>>   *  NOT marked up-to-date, so that an IO error is emitted (e.g. do_swap_page()
>>   *  will SIGBUS).
>>   *
>> - *  -EINVAL: if the swapped out content was in zswap, but the page belongs
>> - *  to a large folio, which is not supported by zswap. The folio is unlocked,
>> - *  but NOT marked up-to-date, so that an IO error is emitted (e.g.
>> - *  do_swap_page() will SIGBUS).
>> + *  -EINVAL: if the folio spans a mix of zswap and non-zswap entries. The
>> + *  folio is unlocked, but NOT marked up-to-date, so that an IO error is
>> + *  emitted (e.g. do_swap_page() will SIGBUS). Large folio swapin should
>> + *  reject such ranges before calling zswap_load().
>>   *
>> - *  -ENOENT: if the swapped out content was not in zswap. The folio remains
>> + *  -ENOENT: if the swapped out content was not in zswap. For a large folio,
>> + *  this means the whole folio range was not in zswap. The folio remains
>>   *  locked on return.
>>   */
>>  int zswap_load(struct folio *folio)
>>  {
>>  	swp_entry_t swp = folio->swap;
>>  	pgoff_t offset = swp_offset(swp);
>> -	struct xarray *tree = swap_zswap_tree(swp);
>>  	struct zswap_entry *entry;
>> +	int nr_pages = folio_nr_pages(folio);
>> +	bool is_zswap;
>> +	int index;
>>  
>>  	VM_WARN_ON_ONCE(!folio_test_locked(folio));
>>  	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
>> @@ -1635,30 +1639,36 @@ int zswap_load(struct folio *folio)
>>  	if (zswap_never_enabled())
>>  		return -ENOENT;
>>  
>> -	/*
>> -	 * Large folios should not be swapped in while zswap is being used, as
>> -	 * they are not properly handled. Zswap does not properly load large
>> -	 * folios, and a large folio may only be partially in zswap.
>> -	 */
>> -	if (WARN_ON_ONCE(folio_test_large(folio))) {
>> +	if (zswap_entry_batch(swp, nr_pages, &is_zswap) != nr_pages) {
>> +		WARN_ON_ONCE(folio_test_large(folio));
> 
> IIUC the condition can only be true for large folios, so I think just
> WARN_ON_ONCE() in the if statement itself?
> 
> Taking a step back, this is doing xa_load() lookups that are repeated
> again below. Maybe we should drop this check here and integrate the
> check into the loop below (either all entries exist or don't)?
> 
> We might end up with a case where we decompress some parts of the folio
> then return a failure, but this possibility already exists (e.g.
> decompression failures).
> 
>>  		folio_unlock(folio);
>>  		return -EINVAL;
>>  	}
>>  
>> -	entry = xa_load(tree, offset);
>> -	if (!entry)
>> +	if (!is_zswap)
>>  		return -ENOENT;
>>  
>> -	if (!zswap_decompress(entry, folio, 0)) {
>> -		folio_unlock(folio);
>> -		return -EIO;
>> +	for (index = 0; index < nr_pages; index++) {
>> +		swp_entry_t entry_swp = swp_entry(swp_type(swp),
>> +						  offset + index);
>> +		struct xarray *tree = swap_zswap_tree(entry_swp);
>> +
>> +		entry = xa_load(tree, offset + index);
>> +		if (WARN_ON_ONCE(!entry)) {
>> +			folio_unlock(folio);
>> +			return -EINVAL;
>> +		}
>> +
>> +		if (!zswap_decompress(entry, folio, index)) {
>> +			folio_unlock(folio);
>> +			return -EIO;
>> +		}
>>  	}
>>  
>>  	folio_mark_uptodate(folio);
>>  
>> -	count_vm_event(ZSWPIN);
>> -	if (entry->objcg)
>> -		count_objcg_events(entry->objcg, ZSWPIN, 1);
>> +	count_mthp_stat(folio_order(folio), MTHP_STAT_SWPIN);
> 
> Not a problem with this patch, but I wonder why different swapin
> implementations are making this call instead of putting it in a
> higher-level (e.g. alloc_swap_folio()).
> 
>> +	count_vm_events(ZSWPIN, nr_pages);
>>  
>>  	/*
>>  	 * We are reading into the swapcache, invalidate zswap entry.
>> @@ -1668,8 +1678,19 @@ int zswap_load(struct folio *folio)
>>  	 * compression work.
>>  	 */
>>  	folio_mark_dirty(folio);
>> -	xa_erase(tree, offset);
>> -	zswap_entry_free(entry);
>> +
>> +	for (index = 0; index < nr_pages; index++) {
>> +		swp_entry_t entry_swp = swp_entry(swp_type(swp),
>> +						  offset + index);
>> +		struct xarray *tree = swap_zswap_tree(entry_swp);
>> +
>> +		entry = xa_erase(tree, offset + index);
>> +		if (WARN_ON_ONCE(!entry))
>> +			continue;
>> +		if (entry->objcg)
>> +			count_objcg_events(entry->objcg, ZSWPIN, 1);
> 
> Hmm I was wondering how the stat update should be handled here. It is
> possible that a folio was swapped out from one memcg and is being
> swapped in by a process in a different memcg. For order-0 folios, the
> folio gets charged to the swapout memcg.
> 
> However, looking at alloc_swap_folio() ->
> mem_cgroup_swapin_charge_folio(), it seems like we charge mTHP folios to
> the swapout memcg of the first swap entry. This seems a bit arbitrary.
> 
> Focusing on the code here, seems like we'll count the swapin against the
> swapout memcg, which is good in terms of keeping ZSWPIN and ZSWPOUT
> balanced. However, it seems like it's possible that the folio is being
> charged to a different memcg here than the swapout memcg, so we may end
> up counting ZSWPIN in one memcg but charging the folio in another.
> 
> I am not sure if this is practically a problem, but something doesn't
> sound right. Johannes (and other memcg folks), were these the intended
> charging semantics for mTHP swapin? Is it reasonable to count ZSWPIN in
> one memcg and charge the folio in another?
> 
>> +		zswap_entry_free(entry);
>> +	}
>>  
>>  	folio_unlock(folio);
>>  	return 0;
>> -- 
>> 2.34.1
>>
>>

Agreed on the duplicated xa_load() lookups; that should be folded into
the load loop if this is picked up again.
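
As a rough illustration of what folding the check into the loop could look
like, here is a userspace sketch. It is only a model under stated
assumptions: demo_load_range() and the present[] array are hypothetical
stand-ins for the per-page xa_load() results, and incrementing a counter
stands in for zswap_decompress(). It is not the kernel code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: present[i] says whether base page i of the folio
 * has a zswap entry. The first page decides which case is expected, and
 * every later page must agree with it.
 *
 * Returns 0 if the whole range was in zswap, -ENOENT if none of it was,
 * and -EINVAL for a mixed range. On -EINVAL, *nr_loaded may be nonzero:
 * some pages were already "decompressed" before the mismatch was seen.
 */
int demo_load_range(const bool *present, size_t nr_pages, size_t *nr_loaded)
{
	bool want = nr_pages ? present[0] : false;
	size_t i;

	*nr_loaded = 0;
	for (i = 0; i < nr_pages; i++) {
		if (present[i] != want)
			return -EINVAL;	/* mixed zswap / non-zswap range */
		if (want)
			(*nr_loaded)++;	/* stands in for decompressing page i */
	}

	return want ? 0 : -ENOENT;
}
```

The single pass does each lookup once, at the cost of possibly doing partial
work before reporting a mixed range, which is the trade-off mentioned in the
review.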

The MTHP_STAT_SWPIN and memcg accounting points also make sense. I will
avoid defining those semantics in this RFC and revisit them if there is a
future version based on the common swapin path.

Thanks.



* Re: [RFC PATCH 0/5] mm: support zswap-backed anonymous large folio swapin
From: Fujunjie @ 2026-05-12  8:02 UTC
  To: Yosry Ahmed
  Cc: Andrew Morton, Chris Li, Kairui Song, Johannes Weiner, Nhat Pham,
	linux-mm, linux-kernel, linux-doc, Jonathan Corbet,
	David Hildenbrand, Ryan Roberts, Barry Song, Baolin Wang,
	Chengming Zhou, Baoquan He, Lorenzo Stoakes
In-Reply-To: <agJT6D5zaUD6FpwQ@google.com>



On 5/12/2026 6:13 AM, Yosry Ahmed wrote:
>> Feedback would be especially helpful on:
>>
>> 1. whether it makes sense to support all-zswap large folio swapin first,
>>    while keeping mixed zswap/disk ranges on the order-0 fallback path
> 
> I think so, yes, but based on my read of the code this RFC only affects
> synchronous swapin, which is more-or-less zram+zswap. This is an
> uncommon setup outside of testing.
> 
>> 2. whether a follow-up for mixed zswap/disk large folio swapin would be
>>    useful after this RFC
> 
> That's a heavier lift and I think we should consider this in the
> longer-term, once the virtual swap work settles down. This is
> conceptually not a zswap thing, you can have parts of a folio on disk,
> in zswap, in the zeromap, etc. So it needs to be handled at a higher
> layer (virtual swap for example).
>
Thanks Yosry.

That makes sense. I agree that the mixed zswap/disk/zeromap case is not
really zswap-specific and should be handled at a higher layer, likely
after the virtual swap work settles.

Given the feedback on the swapin path structure and Alexandre's ongoing
work in this area, I will pause this RFC in its current form and follow
those series first.

Thanks



* Re: [PATCH net-next 1/2] net: ti: icssg: Derive stats array lengths from ARRAY_SIZE
From: David CARLIER @ 2026-05-12  7:58 UTC
  To: MD Danish Anwar
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Shuah Khan, Roger Quadros,
	Andrew Lunn, Jacob Keller, Meghana Malladi, Kevin Hao,
	Vadim Fedorenko, netdev, linux-doc, linux-kernel,
	linux-arm-kernel, Vignesh Raghavendra
In-Reply-To: <20260512060627.3781329-2-danishanwar@ti.com>

Hi MD,

On Tue, 12 May 2026 at 07:06, MD Danish Anwar <danishanwar@ti.com> wrote:
>
> Replace the manually maintained ICSSG_NUM_MIIG_STATS and
> ICSSG_NUM_PA_STATS constants with ARRAY_SIZE() expressions derived
> directly from the corresponding stat descriptor arrays, so that adding
> new entries to icssg_all_miig_stats[] or icssg_all_pa_stats[] no longer
> requires a separate update to a numeric constant.
>
> To make this self-contained, break the circular include dependency
> between icssg_stats.h and icssg_prueth.h:
>
>   - icssg_stats.h previously included icssg_prueth.h (transitively
>     pulling in icssg_switch_map.h and ETH_GSTRING_LEN).  Replace that
>     with direct includes of <linux/ethtool.h>, <linux/kernel.h> and
>     "icssg_switch_map.h".
>
>   - icssg_prueth.h now includes icssg_stats.h, giving it access to
>     the ARRAY_SIZE-based ICSSG_NUM_MIIG_STATS and ICSSG_NUM_PA_STATS
>     before they are used in the prueth_emac struct and ICSSG_NUM_STATS.
>
> Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
> ---
>  drivers/net/ethernet/ti/icssg/icssg_prueth.h | 3 +--
>  drivers/net/ethernet/ti/icssg/icssg_stats.h  | 7 ++++++-
>  2 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.h b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
> index df93d15c5b78..e2ccecb0a0dd 100644
> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.h
> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
> @@ -43,6 +43,7 @@
>
>  #include "icssg_config.h"
>  #include "icss_iep.h"
> +#include "icssg_stats.h"
>  #include "icssg_switch_map.h"
>
>  #define PRUETH_MAX_MTU          (2000 - ETH_HLEN - ETH_FCS_LEN)
> @@ -57,8 +58,6 @@
>
>  #define ICSSG_MAX_RFLOWS       8       /* per slice */
>
> -#define ICSSG_NUM_PA_STATS     32
> -#define ICSSG_NUM_MIIG_STATS   60
>  /* Number of ICSSG related stats */
>  #define ICSSG_NUM_STATS (ICSSG_NUM_MIIG_STATS + ICSSG_NUM_PA_STATS)
>  #define ICSSG_NUM_STANDARD_STATS 31
> diff --git a/drivers/net/ethernet/ti/icssg/icssg_stats.h b/drivers/net/ethernet/ti/icssg/icssg_stats.h
> index 5ec0b38e0c67..b854eb587c1e 100644
> --- a/drivers/net/ethernet/ti/icssg/icssg_stats.h
> +++ b/drivers/net/ethernet/ti/icssg/icssg_stats.h
> @@ -8,10 +8,15 @@
>  #ifndef __NET_TI_ICSSG_STATS_H
>  #define __NET_TI_ICSSG_STATS_H
>
> -#include "icssg_prueth.h"
> +#include <linux/ethtool.h>
> +#include <linux/kernel.h>
> +#include "icssg_switch_map.h"
>
>  #define STATS_TIME_LIMIT_1G_MS    25000    /* 25 seconds @ 1G */
>
> +#define ICSSG_NUM_MIIG_STATS   ARRAY_SIZE(icssg_all_miig_stats)
> +#define ICSSG_NUM_PA_STATS     ARRAY_SIZE(icssg_all_pa_stats)
> +
>  struct miig_stats_regs {
>         /* Rx */
>         u32 rx_packets;
> --
> 2.34.1
>

One thing that caught my eye: icssg_all_miig_stats[] and
  icssg_all_pa_stats[] are 'static const' arrays in icssg_stats.h with
  ETH_GSTRING_LEN name buffers per entry. Right now only icssg_stats.c
  and icssg_ethtool.c pull them in. After this patch icssg_prueth.h
  includes icssg_stats.h, so every .c in the driver (classifier,
  common, config, mii_cfg, queues, switchdev, ...) ends up with its own
  static-const copy of both tables.

  Would a static_assert() work for what you're after? Something like:

    static const struct icssg_miig_stats icssg_all_miig_stats[] = {
        ...
    };
    static_assert(ARRAY_SIZE(icssg_all_miig_stats) == ICSSG_NUM_MIIG_STATS);

  next to each array, keeping the numeric #defines as-is. Then 2/2 fails
  to build the moment a new entry is added without bumping the count,
  which is the case you're guarding against — without touching the
  include graph.
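
  A minimal userspace sketch of that pattern, assuming C11 and hypothetical
  demo_* names (the struct layout, the entries and the count below are
  illustrative only and are not taken from the driver):

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hand-maintained count, kept as a plain numeric constant. */
#define DEMO_NUM_STATS 3

/* Hypothetical descriptor: a register offset plus a fixed-size name. */
struct demo_stat {
	unsigned int offset;
	char name[32];
};

static const struct demo_stat demo_all_stats[] = {
	{ 0x00, "rx_packets" },
	{ 0x04, "rx_broadcast_frames" },
	{ 0x08, "rx_multicast_frames" },
};

/* Breaks the build as soon as the table and the constant disagree. */
static_assert(ARRAY_SIZE(demo_all_stats) == DEMO_NUM_STATS,
	      "DEMO_NUM_STATS is out of sync with demo_all_stats[]");

/* Number of entries actually present in the table. */
size_t demo_stats_count(void)
{
	return ARRAY_SIZE(demo_all_stats);
}
```

  Adding a fourth entry to demo_all_stats[] without bumping DEMO_NUM_STATS
  should then be rejected at compile time, and headers that only need the
  count do not have to see the table.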

What do you think?

Cheers.


* Re: [RFC PATCH 4/5] mm: swap: fall back to order-0 after large swapin races
From: Fujunjie @ 2026-05-12  7:57 UTC
  To: Kairui Song, David Hildenbrand (Arm)
  Cc: Andrew Morton, Chris Li, Johannes Weiner, Nhat Pham, Yosry Ahmed,
	linux-mm, linux-kernel, linux-doc, Jonathan Corbet, Ryan Roberts,
	Barry Song, Baolin Wang, Chengming Zhou, Baoquan He,
	Lorenzo Stoakes
In-Reply-To: <CAMgjq7Cokjb4-F9=cvwKmWR0q4==Vd61FHnjKbRdSHKH57erxw@mail.gmail.com>



On 5/11/2026 10:59 PM, Kairui Song wrote:
> On Mon, May 11, 2026 at 9:14 PM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
>>
>> On 5/8/26 22:20, fujunjie wrote:
>>> swapin_folio() documents that a large folio insertion race returns NULL
>>> so the caller can fall back to order-0 swapin. do_swap_page() currently
>>> turns that NULL into VM_FAULT_OOM if the PTE is unchanged, which is
>>> harsher than necessary and gets in the way of rejecting large folio
>>> ranges for backend reasons.
>>>
>>> Move the synchronous swapin sequence into a helper and retry with an
>>> order-0 folio when a large folio cannot be inserted into the swap cache.
>>> Count the event as an mTHP swapin fallback before dropping the failed
>>> large allocation.
>>>
>>> Signed-off-by: fujunjie <fujunjie1@qq.com>
>>> ---
>>>  mm/memory.c | 50 +++++++++++++++++++++++++++++++++++++++-----------
>>>  1 file changed, 39 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index ea6568571131..84e3b77b8293 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -4757,6 +4757,44 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
>>>  }
>>>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>>
>>> +static struct folio *swapin_synchronous_folio(swp_entry_t entry,
>>> +                                           struct vm_fault *vmf)
>>> +{
>>> +     struct folio *swapcache, *folio;
>>> +     bool large;
>>> +     int order;
>>> +
>>> +     folio = alloc_swap_folio(vmf);
>>> +     if (!folio)
>>> +             return NULL;
>>> +
>>> +     large = folio_test_large(folio);
>>> +     order = folio_order(folio);
>>> +
>>> +     /*
>>> +      * folio is charged, so swapin can only fail due to raced swapin and
>>> +      * return NULL.
>>> +      */
>>> +     swapcache = swapin_folio(entry, folio);
>>> +     if (swapcache == folio)
>>> +             return folio;
>>> +
>>> +     if (!swapcache && large)
>>> +             count_mthp_stat(order, MTHP_STAT_SWPIN_FALLBACK);
>>> +     folio_put(folio);
>>> +     if (swapcache || !large)
>>> +             return swapcache;
>>> +
>>> +     folio = __alloc_swap_folio(vmf);
>>> +     if (!folio)
>>> +             return NULL;
>>> +
>>> +     swapcache = swapin_folio(entry, folio);
>>> +     if (swapcache != folio)
>>> +             folio_put(folio);
>>> +     return swapcache;
>>> +}
>>> +
>>>  /* Sanity check that a folio is fully exclusive */
>>>  static void check_swap_exclusive(struct folio *folio, swp_entry_t entry,
>>>                                unsigned int nr_pages)
>>> @@ -4860,17 +4898,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>>               swap_update_readahead(folio, vma, vmf->address);
>>>       if (!folio) {
>>>               if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
>>> -                     folio = alloc_swap_folio(vmf);
>>> -                     if (folio) {
>>> -                             /*
>>> -                              * folio is charged, so swapin can only fail due
>>> -                              * to raced swapin and return NULL.
>>> -                              */
>>> -                             swapcache = swapin_folio(entry, folio);
>>> -                             if (swapcache != folio)
>>> -                                     folio_put(folio);
>>> -                             folio = swapcache;
>>> -                     }
>>> +                     folio = swapin_synchronous_folio(entry, vmf);
>>>               } else {
>>>                       folio = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE, vmf);
>>>               }
>>
>> There are some upcoming changes with:
>>
>> https://lore.kernel.org/r/20260421-swap-table-p4-v3-5-2f23759a76bc@tencent.com
>>
>>
>> All of that logic you have in swapin_synchronous_folio() should ideally not
>> go into memory.c, but into some swap specific code.
>>
>> But
>>
>> https://lore.kernel.org/r/20260421-swap-table-p4-v3-0-2f23759a76bc@tencent.com
> 
> Thanks for mentioning this!
> 
> I think Junjie's change fits better after that change indeed. And I
> checked the code, it should fit easily too.
> 
> It's already strange enough that THP swapin is bundled with
> synchronous swapin, we better not make it more divergent here, and add
> more bits into memory.c.
> 
> And this commit will limit it to anon, no shmem, which is another
> strange detail. Or we'll have to repeat everything and copy this code
> to shmem.c...
> 
> Once all swap-ins use basically the same path as in that series, all
> swap-ins will be able to have similar THP and zswap THP support too.

Thanks David and Kairui.

That makes sense. The helper in memory.c was mainly added to demonstrate
the fallback needed by this RFC, but I agree that growing more large-folio
swapin logic in the anon synchronous swapin path is not the right
direction.
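
For reference, the fallback itself can be modelled in userspace roughly as
below. This is a simplified sketch, not the helper from the patch: the
demo_* names and the insertion hook are hypothetical, and the case where a
racing swapin leaves someone else's folio in the swap cache is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counter standing in for the mTHP swapin fallback stat. */
struct demo_stats {
	unsigned int fallback;
};

/*
 * Hypothetical insertion hook: returns true if a folio of the given
 * order could be inserted into the (modelled) swap cache.
 */
typedef bool (*demo_insert_fn)(unsigned int order, void *ctx);

/*
 * Try the requested order first. If a large insertion fails, count one
 * fallback and retry once with order 0.
 * Returns the order that was inserted, or -1 if nothing could be.
 */
int demo_swapin_with_fallback(unsigned int order, demo_insert_fn insert,
			      void *ctx, struct demo_stats *stats)
{
	if (insert(order, ctx))
		return (int)order;
	if (order == 0)
		return -1;	/* nothing smaller to fall back to */

	stats->fallback++;
	return insert(0, ctx) ? 0 : -1;
}

/* Example hook: rejects any order above the limit that ctx points to. */
bool demo_insert_max_order(unsigned int order, void *ctx)
{
	return order <= *(const unsigned int *)ctx;
}
```

The point of the model is only that a failed large insertion is counted once
and then degrades to order 0 instead of being reported as an allocation
failure.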

I will not carry this patch forward in its current form. If I continue
with this work, I will rebase it on top of Kairui's swap-table or unified
swapin work and keep the allocation and fallback handling in the swap-specific
common path.

I also noticed Alexandre is already working on the large-folio swapin
side, so I will follow that series to avoid duplicating work.

Thanks for the pointers.



* Re: [RFC PATCH 0/5] mm: support zswap-backed anonymous large folio swapin
From: Fujunjie @ 2026-05-12  7:46 UTC
  To: Alexandre Ghiti
  Cc: Andrew Morton, Chris Li, Kairui Song, Johannes Weiner, Nhat Pham,
	Yosry Ahmed, linux-mm, linux-kernel, linux-doc, Jonathan Corbet,
	David Hildenbrand, Ryan Roberts, Barry Song, Baolin Wang,
	Chengming Zhou, Baoquan He, Lorenzo Stoakes
In-Reply-To: <CAEmasaV7ejxqb9-wTT=7xdt+icxj-ZvdSLkSoC6X5i6NMfsKPQ@mail.gmail.com>



On 5/12/2026 12:20 PM, Alexandre Ghiti wrote:
> So I have been working on the exact same thing for some weeks now. My work is based on Usama's series [1].
> 
> The problem with large folio swapin is that it can create swap thrashing: to swap in a large folio, swap out may be necessary, as reported in [2].
> 
> I implemented quite a few throttling algorithms on top to try to avoid this issue and so far, I have had mixed/inconsistent results. 
> 
> How did you test this series? Did you encounter thrashing? Do you have performance numbers?
> 
> Happy to talk more about this, thanks for your series!
> 
> Alex
> 
> [1] https://lore.kernel.org/all/20241018105026.2521366-1-usamaarif642@gmail.com/
> [2] https://lore.kernel.org/all/SJ0PR11MB5678A864244B09FDE4D914EEC9402@SJ0PR11MB5678.namprd11.prod.outlook.com/

Thanks Alexandre.

My RFC only had correctness testing so far. I tested the all-zswap path
and fallback cases under QEMU, but I don't have bare-metal
performance numbers yet.

If you are already actively working on this, I don't want to duplicate the
same effort. I will pause this RFC for now and wait for your series.

After your series is posted, I will take another look and see if there is
anything that still needs follow-up work.

Thanks for letting me know.



* Re: [PATCH mm-unstable v17 05/14] mm/khugepaged: require collapse_huge_page to enter/exit with the lock dropped
From: David Hildenbrand (Arm) @ 2026-05-12  7:42 UTC
  To: Nico Pache, linux-doc, linux-kernel, linux-mm, linux-trace-kernel
  Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
	byungchul, catalin.marinas, cl, corbet, dave.hansen, dev.jain,
	gourry, hannes, hughd, jack, jackmanb, jannh, jglisse,
	joshua.hahnjy, kas, lance.yang, liam, ljs, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe
In-Reply-To: <20260511185817.686831-6-npache@redhat.com>

On 5/11/26 20:58, Nico Pache wrote:
> Currently the collapse_huge_page function must be entered with the
> mmap_read_lock held, and exits with it dropped. This patch moves the
> unlock into the caller, and changes the semantics so that the function
> is always entered and exited with the lock dropped.
> 
> In future patches, we need this expectation because, in mTHP collapse, we
> may have already dropped the lock, and do not want to conditionally
> check for this by passing through the lock_dropped variable.
> 
> No functional change is expected as one of the first things the
> collapse_huge_page function does is drop this lock before allocating the
> hugepage.
> 
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
>  mm/khugepaged.c | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 27465161fa6d..37a5f6791816 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1199,6 +1199,14 @@ static enum scan_result alloc_charge_folio(struct folio **foliop, struct mm_stru
>  	return SCAN_SUCCEED;
>  }
>  
> +/*
> + * collapse_huge_page expects the mmap_read_lock to be dropped before
> + * entering this function. 

"before entering." Talking about "this function" after naming it sounds odd.

Also, there is only an "mmap_lock".

> The function will also always return with the lock
> + * dropped. 

"collapse_huge_page expects the mmap_lock to be unlocked before entering and
will always return with the lock unlocked."

Or something simple like that?

> The function starts by allocation a folio, which can potentially
> + * take a long time if it involves sync compaction, and we do not need to hold
> + * the mmap_lock during that. We must recheck the vma after taking it again in
> + * write mode.
> + */

"... to avoid holding the mmap_lock while allocating a THP, as that could
trigger direct reclaim/compaction. Note that the VMA must be rechecked after
grabbing the mmap_lock again."

?

Ending up with something like

"collapse_huge_page expects the mmap_lock to be unlocked before entering and
will always return with the lock unlocked, to avoid holding the mmap_lock while
allocating a THP, as that could trigger direct reclaim/compaction.  Note that
the VMA must be rechecked after grabbing the mmap_lock again."

>  static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long address,
>  		int referenced, int unmapped, struct collapse_control *cc)
>  {
> @@ -1214,14 +1222,6 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
>  
>  	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
>  
> -	/*
> -	 * Before allocating the hugepage, release the mmap_lock read lock.
> -	 * The allocation can take potentially a long time if it involves
> -	 * sync compaction, and we do not need to hold the mmap_lock during
> -	 * that. We will recheck the vma after taking it again in write mode.
> -	 */
> -	mmap_read_unlock(mm);
> -
>  	result = alloc_charge_folio(&folio, mm, cc, HPAGE_PMD_ORDER);
>  	if (result != SCAN_SUCCEED)
>  		goto out_nolock;
> @@ -1526,6 +1526,8 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
>  out_unmap:
>  	pte_unmap_unlock(pte, ptl);
>  	if (result == SCAN_SUCCEED) {
> +		/* collapse_huge_page expects the lock to be dropped before calling */
> +		mmap_read_unlock(mm);
>  		result = collapse_huge_page(mm, start_addr, referenced,
>  					    unmapped, cc);
>  		/* collapse_huge_page will return with the mmap_lock released */


Acked-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David


* Re: [PATCH mm-unstable v17 04/14] mm/khugepaged: generalize __collapse_huge_page_* for mTHP support
From: Lance Yang @ 2026-05-12  7:42 UTC
  To: npache
  Cc: linux-doc, linux-kernel, linux-mm, linux-trace-kernel, aarcange,
	akpm, anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, david, dev.jain, gourry,
	hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, liam, ljs, mathieu.desnoyers, matthew.brost, mhiramat,
	mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
	richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
	sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe
In-Reply-To: <20260511185817.686831-5-npache@redhat.com>


On Mon, May 11, 2026 at 12:58:04PM -0600, Nico Pache wrote:
>generalize the order of the __collapse_huge_page_* and collapse_max_*
>functions to support future mTHP collapse.
>
>The current mechanism for determining collapse with the
>khugepaged_max_ptes_none value is not designed with mTHP in mind. This
>raises a key design issue: if we support user defined max_pte_none values
>(even those scaled by order), a collapse of a lower order can introduce
>a feedback loop, or "creep", when max_ptes_none is set to a value greater
>than HPAGE_PMD_NR / 2. [1]
>
>With this configuration, a successful collapse to order N will populate
>enough pages to satisfy the collapse condition on order N+1 on the next
>scan. This leads to unnecessary work and memory churn.
>
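
To make the "creep" condition above concrete, here is a minimal userspace
sketch. It is not the kernel code: the sketch_* names are hypothetical, the
order-scaled threshold (max_ptes_none shifted right by the order difference)
is an assumed scaling rule chosen for illustration, and 4K pages with a PMD
order of 9 are assumed. It models the worst case where one half of an
order N+1 range was just collapsed and the other half is still empty.

```c
#include <stdbool.h>

/* Assumption: 4K base pages, so a PMD maps 2^9 = 512 PTEs. */
#define SKETCH_HPAGE_PMD_ORDER 9U

/* Assumed scaling rule: shrink the PMD-level threshold with the order. */
static unsigned int sketch_scaled_max_ptes_none(unsigned int max_ptes_none,
						unsigned int order)
{
	return max_ptes_none >> (SKETCH_HPAGE_PMD_ORDER - order);
}

/*
 * After a collapse to @order (which must be below the PMD order), the
 * enclosing order + 1 range holds 2^order populated PTEs and, in the worst
 * case, 2^order empty ones. Creep happens if that many empty PTEs are
 * still within the threshold for order + 1.
 */
static bool sketch_creeps(unsigned int max_ptes_none, unsigned int order)
{
	unsigned int empty_after_collapse = 1U << order;

	return empty_after_collapse <=
	       sketch_scaled_max_ptes_none(max_ptes_none, order + 1);
}
```

Under these assumptions, sketch_creeps(384, 4) is true (16 empty PTEs
against a scaled limit of 24), while sketch_creeps(128, 4) is false (16
against 8), which matches the "greater than HPAGE_PMD_NR / 2" description.
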
>To fix this issue introduce a helper function that will limit mTHP
>collapse support to two max_ptes_none values, 0 and HPAGE_PMD_NR - 1.
>This effectively supports two modes: [2]
>
>- max_ptes_none=0: never collapses if it encounters an empty PTE or a PTE
>  that maps the shared zeropage. Consequently, no memory bloat.
>- max_ptes_none=511 (on 4k pagesz): Always collapse to the highest
>  available mTHP order.
>
>This removes the possibility of "creep", while not modifying any uAPI
>expectations. A warning will be emitted if any non-supported
>max_ptes_none value is configured with mTHP enabled.
>
>mTHP collapse will not honor the khugepaged_max_ptes_shared or
>khugepaged_max_ptes_swap parameters, and will fail if it encounters a
>shared or swapped entry.
>
>No functional changes in this patch; however it defines future behavior
>for mTHP collapse.
>
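
As a quick sanity check of the "no functional changes" claim, the sketch
below shows that the order-based expressions used in the diff reduce to the
old PMD constants when the order is the PMD order. The SKETCH_* macros and
span_* helpers are hypothetical userspace stand-ins, and 4K pages with a
PMD order of 9 are assumed.

```c
/* Assumptions: 4K base pages and a PMD order of 9 (2M PMD-sized THP). */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_HPAGE_PMD_ORDER	9U
#define SKETCH_HPAGE_PMD_NR	(1UL << SKETCH_HPAGE_PMD_ORDER)
#define SKETCH_HPAGE_PMD_SIZE	(SKETCH_PAGE_SIZE << SKETCH_HPAGE_PMD_ORDER)

/* Number of PTEs covered by a collapse of the given order. */
static unsigned long span_nr_pages(unsigned int order)
{
	return 1UL << order;
}

/* End address of the range covered by a collapse starting at @start. */
static unsigned long span_end(unsigned long start, unsigned int order)
{
	return start + (SKETCH_PAGE_SIZE << order);
}
```

With order set to SKETCH_HPAGE_PMD_ORDER, span_nr_pages() equals
SKETCH_HPAGE_PMD_NR and span_end() equals start + SKETCH_HPAGE_PMD_SIZE, so
the existing PMD callers see the same loop bounds as before.
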
>[1] - https://lore.kernel.org/all/e46ab3ab-a3d7-4fb7-9970-d0704bd5d05a@arm.com
>[2] - https://lore.kernel.org/all/37375ace-5601-4d6c-9dac-d1c8268698e9@redhat.com
>
>Co-developed-by: Dev Jain <dev.jain@arm.com>
>Signed-off-by: Dev Jain <dev.jain@arm.com>
>Signed-off-by: Nico Pache <npache@redhat.com>
>---
> include/trace/events/huge_memory.h |   3 +-
> mm/khugepaged.c                    | 117 ++++++++++++++++++++---------
> 2 files changed, 85 insertions(+), 35 deletions(-)
>
>diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
>index bcdc57eea270..443e0bd13fdb 100644
>--- a/include/trace/events/huge_memory.h
>+++ b/include/trace/events/huge_memory.h
>@@ -39,7 +39,8 @@
> 	EM( SCAN_STORE_FAILED,		"store_failed")			\
> 	EM( SCAN_COPY_MC,		"copy_poisoned_page")		\
> 	EM( SCAN_PAGE_FILLED,		"page_filled")			\
>-	EMe(SCAN_PAGE_DIRTY_OR_WRITEBACK, "page_dirty_or_writeback")
>+	EM(SCAN_PAGE_DIRTY_OR_WRITEBACK, "page_dirty_or_writeback")	\
>+	EMe(SCAN_INVALID_PTES_NONE,	"invalid_ptes_none")
> 
> #undef EM
> #undef EMe
>diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>index f68853b3caa7..27465161fa6d 100644
>--- a/mm/khugepaged.c
>+++ b/mm/khugepaged.c
>@@ -61,6 +61,7 @@ enum scan_result {
> 	SCAN_COPY_MC,
> 	SCAN_PAGE_FILLED,
> 	SCAN_PAGE_DIRTY_OR_WRITEBACK,
>+	SCAN_INVALID_PTES_NONE,
> };
> 
> #define CREATE_TRACE_POINTS
>@@ -353,37 +354,60 @@ static bool pte_none_or_zero(pte_t pte)
>  * PTEs for the given collapse operation.
>  * @cc: The collapse control struct
>  * @vma: The vma to check for userfaultfd
>+ * @order: The folio order being collapsed to
>  *
>  * Return: Maximum number of none-page or zero-page PTEs allowed for the
>  * collapse operation.
>  */
>-static unsigned int collapse_max_ptes_none(struct collapse_control *cc,
>-		struct vm_area_struct *vma)
>+static int collapse_max_ptes_none(struct collapse_control *cc,
>+		struct vm_area_struct *vma, unsigned int order)
> {
>+	unsigned int max_ptes_none = khugepaged_max_ptes_none;
> 	// If the vma is userfaultfd-armed, allow no none-page or zero-page PTEs.

One thing I still want to call out: kernel code usually uses C-style
comments :)

> 	if (vma && userfaultfd_armed(vma))
> 		return 0;
> 	// for MADV_COLLAPSE, allow any none-page or zero-page PTEs.
> 	if (!cc->is_khugepaged)
> 		return HPAGE_PMD_NR;
>-	// For all other cases repect the user defined maximum.
>-	return khugepaged_max_ptes_none;
>+	// for PMD collapse, respect the user defined maximum.
>+	if (is_pmd_order(order))
>+		return max_ptes_none;
>+	/* Zero/non-present collapse disabled. */
>+	if (!max_ptes_none)
>+		return 0;
>+	// for mTHP collapse with the sysctl value set to KHUGEPAGED_MAX_PTES_LIMIT,
>+	// scale the maximum number of PTEs to the order of the collapse.
>+	if (max_ptes_none == KHUGEPAGED_MAX_PTES_LIMIT)
>+		return (1 << order) - 1;
>+
>+	// We currently only support max_ptes_none values of 0 or KHUGEPAGED_MAX_PTES_LIMIT.
>+	// Emit a warning and return -EINVAL.
>+	pr_warn_once("mTHP collapse only supports max_ptes_none values of 0 or %u\n",
>+		      KHUGEPAGED_MAX_PTES_LIMIT);

Maybe fall back to 0 instead, as David suggested earlier?

max_ptes_none is mostly legacy PMD THP behavior. mTHP is new, and any
intermediate value in (0, KHUGEPAGED_MAX_PTES_LIMIT) would implicitly
disable it :(

Treating those values as 0 feels like the least surprising behavior,
IMHO. It also gives mTHP a cleaner starting point, rather than carrying over
all the old PMD knob semantics :)
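
A minimal userspace sketch of the fallback idea, to show the resulting
mapping. It is not a proposed implementation: sketch_max_ptes_none() and
the SKETCH_* macros are hypothetical, 4K pages with a PMD order of 9 are
assumed, and the userfaultfd and MADV_COLLAPSE branches from the patch are
left out for brevity.

```c
/* Assumptions: PMD order 9, and the limit is HPAGE_PMD_NR - 1 = 511. */
#define SKETCH_HPAGE_PMD_ORDER	9U
#define SKETCH_MAX_PTES_LIMIT	((1U << SKETCH_HPAGE_PMD_ORDER) - 1)

static unsigned int sketch_max_ptes_none(unsigned int sysfs_value,
					 unsigned int order)
{
	/* PMD collapse keeps honoring whatever the user configured. */
	if (order == SKETCH_HPAGE_PMD_ORDER)
		return sysfs_value;

	/* mTHP with the maximum setting: scale the limit to the order. */
	if (sysfs_value == SKETCH_MAX_PTES_LIMIT)
		return (1U << order) - 1;

	/* mTHP with 0 or any intermediate value: behave as if it were 0. */
	return 0;
}
```

With this shape the helper never needs an error return: for example,
sketch_max_ptes_none(300, 4) yields 0 instead of failing the collapse.
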

Otherwise, LGTM!
Reviewed-by: Lance Yang <lance.yang@linux.dev>

>+	return -EINVAL;
> }
> 
> /**
>  * collapse_max_ptes_shared - Calculate maximum allowed PTEs that map shared
>  * anonymous pages for the given collapse operation.
>  * @cc: The collapse control struct
>+ * @order: The folio order being collapsed to
>  *
>  * Return: Maximum number of PTEs that map shared anonymous pages for the
>  * collapse operation
>  */
>-static unsigned int collapse_max_ptes_shared(struct collapse_control *cc)
>+static unsigned int collapse_max_ptes_shared(struct collapse_control *cc,
>+		unsigned int order)
> {
> 	// for MADV_COLLAPSE, do not restrict the number of PTEs that map shared
> 	// anonymous pages.
> 	if (!cc->is_khugepaged)
> 		return HPAGE_PMD_NR;
>+	// for mTHP collapse do not allow collapsing anonymous memory pages that
>+	// are shared between processes.
>+	if (!is_pmd_order(order))
>+		return 0;
>+	// for PMD collapse, respect the user defined maximum.
> 	return khugepaged_max_ptes_shared;
> }
> 
>@@ -391,16 +415,22 @@ static unsigned int collapse_max_ptes_shared(struct collapse_control *cc)
>  * collapse_max_ptes_swap - Calculate the maximum allowed non-present PTEs or the
>  * maximum allowed non-present pagecache entries for the given collapse operation.
>  * @cc: The collapse control struct
>+ * @order: The folio order being collapsed to
>  *
>  * Return: Maximum number of non-present PTEs or the maximum allowed non-present
>  * pagecache entries for the collapse operation.
>  */
>-static unsigned int collapse_max_ptes_swap(struct collapse_control *cc)
>+static unsigned int collapse_max_ptes_swap(struct collapse_control *cc,
>+		unsigned int order)
> {
> 	// for MADV_COLLAPSE, do not restrict the number PTEs entries or
> 	// pagecache entries that are non-present.
> 	if (!cc->is_khugepaged)
> 		return HPAGE_PMD_NR;
>+	// for mTHP collapse do not allow any non-present PTEs or pagecache entries.
>+	if (!is_pmd_order(order))
>+		return 0;
>+	// for PMD collapse, respect the user defined maximum.
> 	return khugepaged_max_ptes_swap;
> }
> 
>@@ -594,18 +624,22 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte,
> 
> static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
> 		unsigned long start_addr, pte_t *pte, struct collapse_control *cc,
>-		struct list_head *compound_pagelist)
>+		unsigned int order, struct list_head *compound_pagelist)
> {
>+	const unsigned long nr_pages = 1UL << order;
> 	struct page *page = NULL;
> 	struct folio *folio = NULL;
> 	unsigned long addr = start_addr;
> 	pte_t *_pte;
> 	int none_or_zero = 0, shared = 0, referenced = 0;
> 	enum scan_result result = SCAN_FAIL;
>-	unsigned int max_ptes_none = collapse_max_ptes_none(cc, vma);
>-	unsigned int max_ptes_shared = collapse_max_ptes_shared(cc);
>+	int max_ptes_none = collapse_max_ptes_none(cc, vma, order);
>+	unsigned int max_ptes_shared = collapse_max_ptes_shared(cc, order);
>+
>+	if (max_ptes_none < 0)
>+		return SCAN_INVALID_PTES_NONE;
> 
>-	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
>+	for (_pte = pte; _pte < pte + nr_pages;
> 	     _pte++, addr += PAGE_SIZE) {
> 		pte_t pteval = ptep_get(_pte);
> 		if (pte_none_or_zero(pteval)) {
>@@ -738,18 +772,18 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
> }
> 
> static void __collapse_huge_page_copy_succeeded(pte_t *pte,
>-						struct vm_area_struct *vma,
>-						unsigned long address,
>-						spinlock_t *ptl,
>-						struct list_head *compound_pagelist)
>+		struct vm_area_struct *vma, unsigned long address,
>+		spinlock_t *ptl, unsigned int order,
>+		struct list_head *compound_pagelist)
> {
>-	unsigned long end = address + HPAGE_PMD_SIZE;
>+	const unsigned long nr_pages = 1UL << order;
>+	unsigned long end = address + (PAGE_SIZE << order);
> 	struct folio *src, *tmp;
> 	pte_t pteval;
> 	pte_t *_pte;
> 	unsigned int nr_ptes;
> 
>-	for (_pte = pte; _pte < pte + HPAGE_PMD_NR; _pte += nr_ptes,
>+	for (_pte = pte; _pte < pte + nr_pages; _pte += nr_ptes,
> 	     address += nr_ptes * PAGE_SIZE) {
> 		nr_ptes = 1;
> 		pteval = ptep_get(_pte);
>@@ -802,11 +836,10 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
> }
> 
> static void __collapse_huge_page_copy_failed(pte_t *pte,
>-					     pmd_t *pmd,
>-					     pmd_t orig_pmd,
>-					     struct vm_area_struct *vma,
>-					     struct list_head *compound_pagelist)
>+		pmd_t *pmd, pmd_t orig_pmd, struct vm_area_struct *vma,
>+		unsigned int order, struct list_head *compound_pagelist)
> {
>+	const unsigned long nr_pages = 1UL << order;
> 	spinlock_t *pmd_ptl;
> 
> 	/*
>@@ -822,7 +855,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
> 	 * Release both raw and compound pages isolated
> 	 * in __collapse_huge_page_isolate.
> 	 */
>-	release_pte_pages(pte, pte + HPAGE_PMD_NR, compound_pagelist);
>+	release_pte_pages(pte, pte + nr_pages, compound_pagelist);
> }
> 
> /*
>@@ -842,16 +875,17 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
>  */
> static enum scan_result __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
> 		pmd_t *pmd, pmd_t orig_pmd, struct vm_area_struct *vma,
>-		unsigned long address, spinlock_t *ptl,
>+		unsigned long address, spinlock_t *ptl, unsigned int order,
> 		struct list_head *compound_pagelist)
> {
>+	const unsigned long nr_pages = 1UL << order;
> 	unsigned int i;
> 	enum scan_result result = SCAN_SUCCEED;
> 
> 	/*
> 	 * Copying pages' contents is subject to memory poison at any iteration.
> 	 */
>-	for (i = 0; i < HPAGE_PMD_NR; i++) {
>+	for (i = 0; i < nr_pages; i++) {
> 		pte_t pteval = ptep_get(pte + i);
> 		struct page *page = folio_page(folio, i);
> 		unsigned long src_addr = address + i * PAGE_SIZE;
>@@ -870,10 +904,10 @@ static enum scan_result __collapse_huge_page_copy(pte_t *pte, struct folio *foli
> 
> 	if (likely(result == SCAN_SUCCEED))
> 		__collapse_huge_page_copy_succeeded(pte, vma, address, ptl,
>-						    compound_pagelist);
>+						    order, compound_pagelist);
> 	else
> 		__collapse_huge_page_copy_failed(pte, pmd, orig_pmd, vma,
>-						 compound_pagelist);
>+						 order, compound_pagelist);
> 
> 	return result;
> }
>@@ -1044,12 +1078,12 @@ static enum scan_result check_pmd_still_valid(struct mm_struct *mm,
>  * Returns result: if not SCAN_SUCCEED, mmap_lock has been released.
>  */
> static enum scan_result __collapse_huge_page_swapin(struct mm_struct *mm,
>-		struct vm_area_struct *vma, unsigned long start_addr, pmd_t *pmd,
>-		int referenced)
>+		struct vm_area_struct *vma, unsigned long start_addr,
>+		pmd_t *pmd, int referenced, unsigned int order)
> {
> 	int swapped_in = 0;
> 	vm_fault_t ret = 0;
>-	unsigned long addr, end = start_addr + (HPAGE_PMD_NR * PAGE_SIZE);
>+	unsigned long addr, end = start_addr + (PAGE_SIZE << order);
> 	enum scan_result result;
> 	pte_t *pte = NULL;
> 	spinlock_t *ptl;
>@@ -1081,6 +1115,19 @@ static enum scan_result __collapse_huge_page_swapin(struct mm_struct *mm,
> 		    pte_present(vmf.orig_pte))
> 			continue;
> 
>+		/*
>+		 * TODO: Support swapin without leading to further mTHP
>+		 * collapses. Currently bringing in new pages via swapin may
>+		 * cause a future higher order collapse on a rescan of the same
>+		 * range.
>+		 */
>+		if (!is_pmd_order(order)) {
>+			pte_unmap(pte);
>+			mmap_read_unlock(mm);
>+			result = SCAN_EXCEED_SWAP_PTE;
>+			goto out;
>+		}
>+
> 		vmf.pte = pte;
> 		vmf.ptl = ptl;
> 		ret = do_swap_page(&vmf);
>@@ -1200,7 +1247,7 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
> 		 * that case.  Continuing to collapse causes inconsistency.
> 		 */
> 		result = __collapse_huge_page_swapin(mm, vma, address, pmd,
>-						     referenced);
>+						     referenced, HPAGE_PMD_ORDER);
> 		if (result != SCAN_SUCCEED)
> 			goto out_nolock;
> 	}
>@@ -1248,6 +1295,7 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
> 	pte = pte_offset_map_lock(mm, &_pmd, address, &pte_ptl);
> 	if (pte) {
> 		result = __collapse_huge_page_isolate(vma, address, pte, cc,
>+						      HPAGE_PMD_ORDER,
> 						      &compound_pagelist);
> 		spin_unlock(pte_ptl);
> 	} else {
>@@ -1278,6 +1326,7 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
> 
> 	result = __collapse_huge_page_copy(pte, folio, pmd, _pmd,
> 					   vma, address, pte_ptl,
>+					   HPAGE_PMD_ORDER,
> 					   &compound_pagelist);
> 	pte_unmap(pte);
> 	if (unlikely(result != SCAN_SUCCEED))
>@@ -1313,9 +1362,9 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
> 		struct vm_area_struct *vma, unsigned long start_addr,
> 		bool *lock_dropped, struct collapse_control *cc)
> {
>-	const unsigned int max_ptes_none = collapse_max_ptes_none(cc, vma);
>-	const unsigned int max_ptes_shared = collapse_max_ptes_shared(cc);
>-	const unsigned int max_ptes_swap = collapse_max_ptes_swap(cc);
>+	const int max_ptes_none = collapse_max_ptes_none(cc, vma, HPAGE_PMD_ORDER);
>+	const unsigned int max_ptes_shared = collapse_max_ptes_shared(cc, HPAGE_PMD_ORDER);
>+	const unsigned int max_ptes_swap = collapse_max_ptes_swap(cc, HPAGE_PMD_ORDER);
> 	pmd_t *pmd;
> 	pte_t *pte, *_pte;
> 	int none_or_zero = 0, shared = 0, referenced = 0;
>@@ -2369,8 +2418,8 @@ static enum scan_result collapse_scan_file(struct mm_struct *mm,
> 		unsigned long addr, struct file *file, pgoff_t start,
> 		struct collapse_control *cc)
> {
>-	const unsigned int max_ptes_none = collapse_max_ptes_none(cc, NULL);
>-	const unsigned int max_ptes_swap = collapse_max_ptes_swap(cc);
>+	const int max_ptes_none = collapse_max_ptes_none(cc, NULL, HPAGE_PMD_ORDER);
>+	const unsigned int max_ptes_swap = collapse_max_ptes_swap(cc, HPAGE_PMD_ORDER);
> 	struct folio *folio = NULL;
> 	struct address_space *mapping = file->f_mapping;
> 	XA_STATE(xas, &mapping->i_pages, start);
>-- 
>2.54.0
>
>

^ permalink raw reply

* Re: [PATCH 12/12] swap: move swap_info_struct to mm/swap.h
From: Damien Le Moal @ 2026-05-12  7:41 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton, Chris Li, Kairui Song
  Cc: Christian Brauner, Darrick J . Wong, Jens Axboe, David Sterba,
	Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
	Paulo Alcantara, Carlos Maiolino, Naohiro Aota, linux-xfs,
	linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
	linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512053625.2950900-13-hch@lst.de>

On 5/12/26 14:35, Christoph Hellwig wrote:
> swap_info_struct is now internal to the MM subsystem, so remove it from
> the public header.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply

* Re: [PATCH 11/12] swap: move struct swap_extent to swapfile.c
From: Damien Le Moal @ 2026-05-12  7:36 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton, Chris Li, Kairui Song
  Cc: Christian Brauner, Darrick J . Wong, Jens Axboe, David Sterba,
	Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
	Paulo Alcantara, Carlos Maiolino, Naohiro Aota, linux-xfs,
	linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
	linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512053625.2950900-12-hch@lst.de>

On 5/12/26 14:35, Christoph Hellwig wrote:
> struct swap_extent is only used inside of mm/swapfile.c, so move it
> there.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply

* Re: [PATCH 2/2] scripts: checkpatch.pl: add warning for strlcat()
From: Manuel Ebner @ 2026-05-12  7:36 UTC (permalink / raw)
  To: David Laight, Jonathan Corbet
  Cc: andy.shevchenko, apw, dwaipayanray1, joe, kees, linux-doc,
	linux-kernel, lukas.bulwahn, skhan, workflows
In-Reply-To: <20260511142745.7757b1b2@pumpkin>

On Mon, 2026-05-11 at 14:27 +0100, David Laight wrote:
> On Mon, 11 May 2026 06:12:36 -0600
> Jonathan Corbet <corbet@lwn.net> wrote:
> 
> > Manuel Ebner <manuelebner@mailbox.org> writes:
> > 
> > > add a warning for strlcat()
> > > 
> > > Signed-off-by: Manuel Ebner <manuelebner@mailbox.org>
> > > ---
> > >  scripts/checkpatch.pl | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > index 0492d6afc9a1..ca1a8e67d529 100755
> > > --- a/scripts/checkpatch.pl
> > > +++ b/scripts/checkpatch.pl
> > > @@ -7085,6 +7085,12 @@ sub process {
> > >  			     "Prefer strscpy over strlcpy - see:
> > > https://github.com/KSPP/linux/issues/89\n" . $herecurr);

Here you can see that external URLs are already deployed. There are two
more in the code blocks above.

> > >  		}
> > >  
> > > +# strlcat uses that should likely be
> > > +		if ($line =~ /\bstrlcat\s*\(/ && !is_userspace($realfile)) {
> > > +			WARN("STRLCAT",
> > > +			     "Prefer seq_buf_printf() over strlcat - see:
> > > https://github.com/KSPP/linux/issues/370\n" . $herecurr);
> > > +		}  
> > 
> > Using seq_buf_printf() requires switching over to the seq_buf API in
> > general, it is not just a simple substitution, so this advice may prove
> > unhelpful to many.
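
To illustrate why this is not a drop-in substitution, here is a small
userspace analogue of a seq_buf-style append. It is only a sketch: struct
mini_seq_buf and the mini_seq_buf_*() functions are hypothetical and are
not the kernel's seq_buf API. The point is that the caller has to carry a
state object (buffer, length, overflow flag) through every append, instead
of passing a bare char pointer and size as with strlcat().

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in, not the kernel's struct seq_buf. */
struct mini_seq_buf {
	char *buf;
	size_t size;
	size_t len;
	bool overflowed;
};

static void mini_seq_buf_init(struct mini_seq_buf *s, char *buf, size_t size)
{
	s->buf = buf;
	s->size = size;
	s->len = 0;
	s->overflowed = false;
	if (size)
		buf[0] = '\0';
}

/* Append formatted text; returns 0 on success, -1 once the buffer is full. */
static int mini_seq_buf_printf(struct mini_seq_buf *s, const char *fmt, ...)
{
	va_list ap;
	size_t room;
	int n;

	if (s->overflowed || s->len >= s->size) {
		s->overflowed = true;
		return -1;
	}

	room = s->size - s->len;
	va_start(ap, fmt);
	n = vsnprintf(s->buf + s->len, room, fmt, ap);
	va_end(ap);

	if (n < 0 || (size_t)n >= room) {
		/* Design choice of this sketch: drop the partial append. */
		s->buf[s->len] = '\0';
		s->overflowed = true;
		return -1;
	}

	s->len += (size_t)n;
	return 0;
}
```

Converting a strlcat() call site to this style means introducing and
initializing the state object first, which is the extra work the comment
above refers to.
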
> 
> And I'm not sure the external url is a good idea.

It wasn't my idea originally, but I'm open to suggestions.

Manuel

> > 
> > jon
> > 

^ permalink raw reply

* Re: [PATCH 10/12] swap: add a swap_activate_fs_ops helper
From: Damien Le Moal @ 2026-05-12  7:36 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton, Chris Li, Kairui Song
  Cc: Christian Brauner, Darrick J . Wong, Jens Axboe, David Sterba,
	Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
	Paulo Alcantara, Carlos Maiolino, Naohiro Aota, linux-xfs,
	linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
	linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512053625.2950900-11-hch@lst.de>

On 5/12/26 14:35, Christoph Hellwig wrote:
> Add a helper abstracting away the low-level details of enabling
> fs_ops-based swapping.  This prepares for taking swap_info_struct
> private.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply

* Re: [PATCH 09/12] swap: push down setting sis->bdev into ->swap_activate
From: Damien Le Moal @ 2026-05-12  7:34 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton, Chris Li, Kairui Song
  Cc: Christian Brauner, Darrick J . Wong, Jens Axboe, David Sterba,
	Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
	Paulo Alcantara, Carlos Maiolino, Naohiro Aota, linux-xfs,
	linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
	linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512053625.2950900-10-hch@lst.de>

On 5/12/26 14:35, Christoph Hellwig wrote:
> Only the file operation method knows what block device we'll swap
> to.  So move down setting sis->bdev and the special blockdev flag
> into ->swap_activate.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

One nit below. Otherwise, looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

> @@ -141,7 +141,6 @@ int generic_swap_activate(struct file *swap_file, struct swap_info_struct *sis)
>  		continue;
>  	}
>  	return 0;
> -

Unrelated whitespace change.

>  bad_bmap:
>  	pr_err("swapon: swapfile has holes\n");
>  	return -EINVAL;
-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply

* Re: [PATCH 08/12] swap,iomap: simplify iomap_swapfile_iter
From: Damien Le Moal @ 2026-05-12  7:31 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton, Chris Li, Kairui Song
  Cc: Christian Brauner, Darrick J . Wong, Jens Axboe, David Sterba,
	Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
	Paulo Alcantara, Carlos Maiolino, Naohiro Aota, linux-xfs,
	linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
	linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512053625.2950900-9-hch@lst.de>

On 5/12/26 14:35, Christoph Hellwig wrote:
> add_swap_extent already coalesces multiple extents, no need to duplicate
> that in the caller.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply

* Re: [PATCH mm-unstable v17 03/14] mm/khugepaged: rework max_ptes_* handling with helper functions
From: David Hildenbrand (Arm) @ 2026-05-12  7:29 UTC (permalink / raw)
  To: Nico Pache, linux-doc, linux-kernel, linux-mm, linux-trace-kernel
  Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
	byungchul, catalin.marinas, cl, corbet, dave.hansen, dev.jain,
	gourry, hannes, hughd, jack, jackmanb, jannh, jglisse,
	joshua.hahnjy, kas, lance.yang, liam, ljs, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe, Usama Arif
In-Reply-To: <20260511185817.686831-4-npache@redhat.com>

On 5/11/26 20:58, Nico Pache wrote:
> The following cleanup reworks all the max_ptes_* handling into helper
> functions. This increases the code readability and will later be used to
> implement the mTHP handling of these variables.
> 
> With these changes we abstract all the madvise_collapse() special casing
> (don't respect the sysctls) away from the functions that utilize them. And
> will be used later in this series to cleanly restrict the mTHP collapse
> behavior.
> 
> No functional change is intended; however, we are now only reading the
> sysfs variables once per scan, whereas before these variables were being
> read on each loop iteration.
> 
> Suggested-by: David Hildenbrand <david@kernel.org>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>

Some nits when re-reading:

> Acked-by: Usama Arif <usama.arif@linux.dev>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
>  mm/khugepaged.c | 118 +++++++++++++++++++++++++++++++++---------------
>  1 file changed, 82 insertions(+), 36 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index f0e29d5c7b1f..f68853b3caa7 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -348,6 +348,62 @@ static bool pte_none_or_zero(pte_t pte)
>  	return pte_present(pte) && is_zero_pfn(pte_pfn(pte));
>  }
>  
> +/**
> + * collapse_max_ptes_none - Calculate maximum allowed none-page or zero-page

I know, it's painful, but ...

There is no "none-page".

Calculate maximum allowed empty PTEs or PTEs mapping the shared zeropage ... ?

> + * PTEs for the given collapse operation.

We usually indent here (second line of subject), I think. Same applies to the
other doc below.

> + * @cc: The collapse control struct
> + * @vma: The vma to check for userfaultfd
> + *
> + * Return: Maximum number of none-page or zero-page PTEs allowed for the
> + * collapse operation.

Same here.

> + */
> +static unsigned int collapse_max_ptes_none(struct collapse_control *cc,
> +		struct vm_area_struct *vma)
> +{
> +	// If the vma is userfaultfd-armed, allow no none-page or zero-page PTEs.

Lance commented on the comment style.

Is this comment really required? It's pretty self-documenting already.

> +	if (vma && userfaultfd_armed(vma))
> +		return 0;
> +	// for MADV_COLLAPSE, allow any none-page or zero-page PTEs.
> +	if (!cc->is_khugepaged)
> +		return HPAGE_PMD_NR;
> +	// For all other cases repect the user defined maximum.
> +	return khugepaged_max_ptes_none;
> +}
> +
-- 
Cheers,

David

^ permalink raw reply

* Re: [PATCH 07/12] swap,block: limit swap file size to device size
From: Christoph Hellwig @ 2026-05-12  7:23 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Christoph Hellwig, Andrew Morton, Chris Li, Kairui Song,
	Christian Brauner, Darrick J . Wong, Jens Axboe, David Sterba,
	Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
	Paulo Alcantara, Carlos Maiolino, Naohiro Aota, linux-xfs,
	linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
	linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <217f91d9-4d5f-47a6-ad20-0404968b8e08@kernel.org>

On Tue, May 12, 2026 at 04:21:47PM +0900, Damien Le Moal wrote:
> On 5/12/26 14:35, Christoph Hellwig wrote:
> > Don't blindly pass the value from the swap header to swap_add_extent,
> > but instead the device size rounded down to page granularity.  This
> > activated the sanity checking in the core code that catches a too large
> > value in the swap header.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Looks OK to me, though maybe this could be folded into the previous patch?

I prefer to keep behavior changes as isolated as possible.


^ permalink raw reply

* Re: [PATCH v13 3/4] gpio: rpmsg: add generic rpmsg GPIO driver
From: Arnaud POULIQUEN @ 2026-05-12  7:22 UTC (permalink / raw)
  To: Andrew Lunn, Mathieu Poirier
  Cc: tanmay.shah, Beleswar Prasad Padhi, Shenwei Wang, Linus Walleij,
	Bartosz Golaszewski, Jonathan Corbet, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Bjorn Andersson, Frank Li,
	Sascha Hauer, Shuah Khan, linux-gpio@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Pengutronix Kernel Team, Fabio Estevam, Peng Fan,
	devicetree@vger.kernel.org, linux-remoteproc@vger.kernel.org,
	imx@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	dl-linux-imx, Bartosz Golaszewski
In-Reply-To: <4ae35920-2539-4b12-8dea-efd407b8aaeb@lunn.ch>

Hello Andrew,

On 5/11/26 20:18, Andrew Lunn wrote:
>> Arnaud, Beleswar, Andrew and I are all advocating for one endpoint per
>> GPIO controller.  The remaining issue it about the best way to work
>> out source and destination addresses between Linux and the remote
>> processor.  I'm running out of time for today but I'll return to this
>> thread with a final analysis by the end of the week.
> 
> How many of the participants here will be in Minneapolis next week for
> the Embedded Linux Conference? There is even a talk about this:
> 
> https://osselcna2026.sched.com/event/2JQpx/building-virtual-drivers-with-rpmsg-key-design-principles-challenges-trade-offs-beleswar-prasad-padhi-texas-instruments?iframe=yes&w=100%&sidebar=yes&bg=no
> 
> Maybe we can get together and decide on the final design after the
> session.

I won’t be there, but I can join remotely if a call is scheduled.

Regards,
Arnaud

> 
> 	Andrew


^ permalink raw reply

* Re: [PATCH 07/12] swap,block: limit swap file size to device size
From: Damien Le Moal @ 2026-05-12  7:21 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton, Chris Li, Kairui Song
  Cc: Christian Brauner, Darrick J . Wong, Jens Axboe, David Sterba,
	Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
	Paulo Alcantara, Carlos Maiolino, Naohiro Aota, linux-xfs,
	linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
	linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512053625.2950900-8-hch@lst.de>

On 5/12/26 14:35, Christoph Hellwig wrote:
> Don't blindly pass the value from the swap header to swap_add_extent,
> but instead the device size rounded down to page granularity.  This
> activated the sanity checking in the core code that catches a too large
> value in the swap header.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks OK to me, though maybe this could be folded into the previous patch?

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply

* Re: [PATCH 01/12] swap: remove the maxpages variable in sys_swapon
From: Christoph Hellwig @ 2026-05-12  7:20 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Christoph Hellwig, Andrew Morton, Chris Li, Kairui Song,
	Christian Brauner, Darrick J . Wong, Jens Axboe, David Sterba,
	Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
	Paulo Alcantara, Carlos Maiolino, Naohiro Aota, linux-xfs,
	linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
	linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <03dddf72-8755-4ebf-ba79-456377f0f25d@kernel.org>

On Tue, May 12, 2026 at 04:08:35PM +0900, Damien Le Moal wrote:
> On 5/12/26 14:35, Christoph Hellwig wrote:
> > Always use si->max which is updated setup_swap_extents instead of copying
> > into and out of maxpages.
> 
> Checking mm/swapfile.c, I see s->max being set only in swapon(). Is this a typo
> or am I misunderstanding this sentence ?

It is updated by the file system methods or the generic implementation
called by setup_swap_extents currently.  So the above is a bit imprecise.

The next patch then removes this confusing update.


^ permalink raw reply

* Re: [PATCH 06/12] swap,block: move the block device swapon code into block/fops.c
From: Damien Le Moal @ 2026-05-12  7:20 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton, Chris Li, Kairui Song
  Cc: Christian Brauner, Darrick J . Wong, Jens Axboe, David Sterba,
	Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
	Paulo Alcantara, Carlos Maiolino, Naohiro Aota, linux-xfs,
	linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
	linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512053625.2950900-7-hch@lst.de>

On 5/12/26 14:35, Christoph Hellwig wrote:
> Make use of the abstractions we have.  This is a preparation for
> moving more special casing down into block/.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply

* Re: [PATCH 05/12] swap: cleanup setup_swap_extents
From: Damien Le Moal @ 2026-05-12  7:18 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton, Chris Li, Kairui Song
  Cc: Christian Brauner, Darrick J . Wong, Jens Axboe, David Sterba,
	Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
	Paulo Alcantara, Carlos Maiolino, Naohiro Aota, linux-xfs,
	linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
	linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512053625.2950900-6-hch@lst.de>

On 5/12/26 14:35, Christoph Hellwig wrote:
> Reflow setup_swap_extents so that the flag checking is not conditional on
> a swap_activate method.  This is currently a no-op because the swapoff
> code still checks the presence of a swap_deactivate method, but it
> simplifies adding a new check, and also makes the SWP_ACTIVATED flag
> more consistent.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH 04/12] swap: restrict to regular files or block devices
From: Damien Le Moal @ 2026-05-12  7:17 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton, Chris Li, Kairui Song
  Cc: Christian Brauner, Darrick J . Wong, Jens Axboe, David Sterba,
	Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
	Paulo Alcantara, Carlos Maiolino, Naohiro Aota, linux-xfs,
	linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
	linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512053625.2950900-5-hch@lst.de>

On 5/12/26 14:35, Christoph Hellwig wrote:
> Various swap code assumes it runs either on a block device or on a
> regular file.  Make this restriction explicit using checks right
> after opening the file.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH 03/12] swap,fs: move swapfile operations to struct file_operations
From: Damien Le Moal @ 2026-05-12  7:16 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton, Chris Li, Kairui Song
  Cc: Christian Brauner, Darrick J . Wong, Jens Axboe, David Sterba,
	Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
	Paulo Alcantara, Carlos Maiolino, Naohiro Aota, linux-xfs,
	linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
	linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512053625.2950900-4-hch@lst.de>

On 5/12/26 14:35, Christoph Hellwig wrote:
> The swap operations have nothing to do with the address_space, which is
> used for pagecache operations.  Move them to struct file_operations
> instead.  This will allow moving the block device special cases into
> block/fops.c subsequently.
> 
> Pass struct file first to ->swap_activate as file operations typically
> get the file or iocb as first argument and use swap_activate instead of
> swapfile_activate in all names to be consistent.
> 
> Note that while the trivial iomap wrappers are moved to a new file when
> applicable to keep them local to the file operation instances, complex
> implementations are kept in their existing place.  It might be worth
> moving them in follow-on patches if the maintainers so desire.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

