Linux-HyperV List

* Re: [PATCH 02/61] btrfs: Prefer IS_ERR_OR_NULL over manual NULL check
From: David Sterba @ 2026-03-13 19:22 UTC
  To: Philipp Hahn
  Cc: amd-gfx, apparmor, bpf, ceph-devel, cocci, dm-devel, dri-devel,
	gfs2, intel-gfx, intel-wired-lan, iommu, kvm, linux-arm-kernel,
	linux-block, linux-bluetooth, linux-btrfs, linux-cifs, linux-clk,
	linux-erofs, linux-ext4, linux-fsdevel, linux-gpio, linux-hyperv,
	linux-input, linux-kernel, linux-leds, linux-media, linux-mips,
	linux-mm, linux-modules, linux-mtd, linux-nfs, linux-omap,
	linux-phy, linux-pm, linux-rockchip, linux-s390, linux-scsi,
	linux-sctp, linux-security-module, linux-sh, linux-sound,
	linux-stm32, linux-trace-kernel, linux-usb, linux-wireless,
	netdev, ntfs3, samba-technical, sched-ext, target-devel,
	tipc-discussion, v9fs, Chris Mason, David Sterba
In-Reply-To: <20260310-b4-is_err_or_null-v1-2-bd63b656022d@avm.de>

On Tue, Mar 10, 2026 at 12:48:28PM +0100, Philipp Hahn wrote:
> Prefer using IS_ERR_OR_NULL() over using IS_ERR() and a manual NULL
> check.
> 
> IS_ERR_OR_NULL() already uses unlikely(!ptr) internally. checkpatch does
> not like nesting it:
> > WARNING: nested (un)?likely() calls, IS_ERR_OR_NULL already uses
> > unlikely() internally
> Remove the explicit use of likely().
> 
> Change generated with coccinelle.
> 
> To: Chris Mason <clm@fb.com>
> To: David Sterba <dsterba@suse.com>
> Cc: linux-btrfs@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Philipp Hahn <phahn-oss@avm.de>

Added to for-next. We already seem to be using IS_ERR_OR_NULL() in a
few other places, so this makes sense for consistency. Thanks.
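For readers less familiar with these helpers, the conversion can be illustrated with a small userspace sketch. The ERR_PTR/IS_ERR/IS_ERR_OR_NULL definitions below are simplified stand-ins modelled on <linux/err.h>, not copies of it, and is_bad_before()/is_bad_after() are hypothetical call sites. The point is only that both forms test the same condition, with the NULL branch hint living inside the helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-ins for <linux/compiler.h> and <linux/err.h>;
 * they mirror the idea, not the exact kernel definitions. */
#define unlikely(x) __builtin_expect(!!(x), 0)
#define MAX_ERRNO   4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return unlikely((unsigned long)ptr >= (unsigned long)-MAX_ERRNO);
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	/* The branch hint for the NULL case is already inside the helper. */
	return unlikely(!ptr) || IS_ERR(ptr);
}

/* Hypothetical call site before the conversion: IS_ERR() plus a NULL check. */
bool is_bad_before(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* The same condition after the conversion. */
bool is_bad_after(const void *ptr)
{
	return IS_ERR_OR_NULL(ptr);
}
```

Since both versions evaluate the same predicate, the change is about readability and avoiding a nested branch hint, not behaviour.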

* Re: [PATCH] lib: count_zeros: fix 32/64-bit inconsistency in count_trailing_zeros()
From: Konstantin Ryabitsev @ 2026-03-13 18:41 UTC
  To: Yury Norov
  Cc: Andy Shevchenko, Yury Norov, Rasmus Villemoes, Eric Biggers,
	Jason A. Donenfeld, Ard Biesheuvel, linux-kernel, kexec,
	linux-cifs, linux-spi, linux-hyperv, K. Y. Srinivasan,
	Haiyang Zhang, Jason Gunthorpe, Leon Romanovsky, Mark Brown,
	Steve French, Alexander Graf, Mike Rapoport, Pasha Tatashin
In-Reply-To: <abRN59ST3uRDS5-e@yury>

On Fri, Mar 13, 2026 at 01:48:23PM -0400, Yury Norov wrote:
> (Thanks for b4!)

\o/

> Interesting thread.
> 
> So, my workflow is:
> 
>  1. git format-patch --cover-letter
>  2. # Edit cover letter, add To and CC section
>  3. git send-email 000* --to-cover --cc-cover
>  4. b4 am 
>  5. # Address nits/typos in the mbox
>  6. git am 
>  7. # Address substantial comments in git
>  8. git format-patch -v2 --cover-letter
>  9. # Edit cover letter again to restore body, To and CC sections
> 10. git send-email v2-000* --to-cover --cc-cover

This is doing it the classic way.

> So, yes, I lose recipients on every iteration, together with the whole
> cover letter. But to me it's not a big deal because I can just pull
> them from my mailbox.
> 
> In the better world, I'd like to have:
> git send-email -v2 000* --to-the-same-people-as-in-v1
> 
> In the perfect world, I'd prefer to keep the cover letter under the
> git control, once it is created, together with the recipients, once they
> are added, and be able to edit them just like regular commits.
> 
> There's a 'git am -k', which is seemingly related to the matter, and
> it keeps the [PATCH] prefix. But that is not something that can
> preserve recipients for me.
> 
> I'll try b4 prep and trailers as suggested.

Yes, it was created to simplify this process significantly. It's still mostly
git and email, but at least you won't have to do quite so many manual steps.

Regards,
-- 
KR

* Re: [PATCH] lib: count_zeros: fix 32/64-bit inconsistency in count_trailing_zeros()
From: Yury Norov @ 2026-03-13 18:14 UTC
  To: Jason Gunthorpe
  Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes, Eric Biggers,
	Jason A. Donenfeld, Ard Biesheuvel, linux-kernel, kexec,
	linux-cifs, linux-spi, linux-hyperv, K. Y. Srinivasan,
	Haiyang Zhang, Leon Romanovsky, Mark Brown, Steve French,
	Alexander Graf, Mike Rapoport, Pasha Tatashin
In-Reply-To: <20260313171855.GA1744604@nvidia.com>

On Fri, Mar 13, 2026 at 02:18:55PM -0300, Jason Gunthorpe wrote:
> On Thu, Mar 12, 2026 at 07:08:16PM -0400, Yury Norov wrote:
> > Based on 'sizeof(x) == 4' condition, in 32-bit case the function is wired
> > to ffs(), while in 64-bit case to __ffs(). The difference is substantial:
> > ffs(x) == __ffs(x) + 1. Also, ffs(0) == 0, while __ffs(0) is undefined.
> > 
> > The 32-bit behaviour is inconsistent with the function description, so it
> > needs to get fixed.
> > 
> > There are 9 individual users for the function in 6 different subsystems.
> > Some arches and drivers are 64-bit only:
> >  - arch/loongarch/kvm/intc/eiointc.c;
> >  - drivers/hv/mshv_vtl_main.c;
> >  - kernel/liveupdate/kexec_handover.c;
> > 
> > The others are:
> >  - ib_umem_find_best_pgsz(): as per comment, __ffs() should be correct;
> 
> So long as 32 bit works the same as 64 bit it is correct for ib

This is what the patch does, except that it doesn't account for the
word length. In your case, 'mask' is dma_addr_t, which is u32 or u64
depending on ARCH_DMA_ADDR_T_64BIT.

This config is:

        config ARCH_DMA_ADDR_T_64BIT
                def_bool 64BIT || PHYS_ADDR_T_64BIT

And PHYS_ADDR_T_64BIT is simply def_bool 64BIT. So, at least currently,
dma_addr_t simply follows unsigned long, and thus the patch is
correct. But IDK the history behind these configurations.

Anyways, the patch aligns 32-bit count_trailing_zeros() with the
64-bit one. If you're OK with that, as you said, can you please send
an explicit ack?

Thanks,
Yury
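The off-by-one described in the patch (ffs(x) == __ffs(x) + 1, plus the different results for x == 0) can be made concrete in userspace. This sketch assumes GCC/Clang's __builtin_ctz*() as a substitute for the kernel's arch-specific ffs()/__ffs(), and assumes -1 for x == 0 on the 64-bit path based on the COUNT_TRAILING_ZEROS_0 define quoted from the patch. The ctz_* function names are hypothetical.

```c
#include <assert.h>

/* Userspace models built on GCC/Clang builtins (an assumption; the kernel
 * uses arch-specific implementations):
 *   ffs():   1-based index of the lowest set bit, 0 when x == 0
 *   __ffs(): 0-based index of the lowest set bit, undefined when x == 0 */
static inline int model_ffs(unsigned int x)
{
	return x ? __builtin_ctz(x) + 1 : 0;
}

static inline unsigned int model___ffs(unsigned long long x)
{
	return (unsigned int)__builtin_ctzll(x);	/* x must be non-zero */
}

/* 32-bit path as described in the thread: wired to ffs(). */
int ctz_32bit_old(unsigned int x)
{
	return model_ffs(x);
}

/* 64-bit path: __ffs(), with x == 0 mapped to -1 (assumed, following the
 * COUNT_TRAILING_ZEROS_0 value quoted from the patch). */
int ctz_64bit(unsigned long long x)
{
	return x ? (int)model___ffs(x) : -1;
}

/* 32-bit path after the patch: same rule as the 64-bit one. */
int ctz_32bit_new(unsigned int x)
{
	return x ? (int)model___ffs(x) : -1;
}
```

For x = 8 (binary 1000) the old 32-bit path yields 4 while the 64-bit path yields 3; after the change both yield 3.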

* Re: [PATCH] lib: count_zeros: fix 32/64-bit inconsistency in count_trailing_zeros()
From: Yury Norov @ 2026-03-13 17:48 UTC
  To: Andy Shevchenko
  Cc: Yury Norov, Rasmus Villemoes, Eric Biggers, Jason A. Donenfeld,
	Ard Biesheuvel, linux-kernel, kexec, linux-cifs, linux-spi,
	linux-hyperv, K. Y. Srinivasan, Haiyang Zhang, Jason Gunthorpe,
	Leon Romanovsky, Mark Brown, Steve French, Alexander Graf,
	Mike Rapoport, Konstantin Ryabitsev, Pasha Tatashin
In-Reply-To: <abPdItJ152oMzGd6@ashevche-desk.local>

On Fri, Mar 13, 2026 at 11:47:14AM +0200, Andy Shevchenko wrote:
> On Thu, Mar 12, 2026 at 07:08:16PM -0400, Yury Norov wrote:
> > Based on 'sizeof(x) == 4' condition, in 32-bit case the function is wired
> > to ffs(), while in 64-bit case to __ffs(). The difference is substantial:
> > ffs(x) == __ffs(x) + 1. Also, ffs(0) == 0, while __ffs(0) is undefined.
> > 
> > The 32-bit behaviour is inconsistent with the function description, so it
> > needs to get fixed.
> > 
> > There are 9 individual users for the function in 6 different subsystems.
> > Some arches and drivers are 64-bit only:
> >  - arch/loongarch/kvm/intc/eiointc.c;
> >  - drivers/hv/mshv_vtl_main.c;
> >  - kernel/liveupdate/kexec_handover.c;
> > 
> > The others are:
> >  - ib_umem_find_best_pgsz(): as per comment, __ffs() should be correct;
> >  - rzv2m_csi_reg_write_bit(): ARCH_RENESAS only, unclear;
> >  - lz77_match_len(): CIFS_COMPRESSION only, unclear, experimental;
> > 
> > None of them explicitly tweak their code for a word length, or x == 0.
> > 
> > Requesting comments from the corresponding maintainers on how to proceed
> > with this.
> > 
> > The attached patch gets rid of 32-bit explicit support, so that both
> > 32- and 64-bit versions rely on __ffs().
> 
> > CC: "K. Y. Srinivasan" <kys@microsoft.com> (hyperv)
> > CC: Haiyang Zhang <haiyangz@microsoft.com> (hyperv)
> > CC: Jason Gunthorpe <jgg@ziepe.ca> (infiniband)
> > CC: Leon Romanovsky <leon@kernel.org> (infiniband)
> > CC: Mark Brown <broonie@kernel.org> (spi)
> > CC: Steve French <sfrench@samba.org> (smb)
> > CC: Alexander Graf <graf@amazon.com> (kexec)
> > CC: Mike Rapoport <rppt@kernel.org> (kexec)
> > CC: Pasha Tatashin <pasha.tatashin@soleen.com> (kexec)
> 
> Please, move the Cc: list to the...
> 
> > Signed-off-by: Yury Norov <ynorov@nvidia.com>
> > ---
> 
> ...comments block. It will have the same effect on emails, but drastically
> reduces unneeded noise in the commit message in the Git history.

In the general case, I agree. In this particular case, I want the CCs to
be in the main block, so that they eventually get replaced with ACKs from
the proper maintainers.
 
> You may also read this subthread (patch 18) on how to handle it locally:
> https://lore.kernel.org/linux-iio/20260123113708.416727-19-bigeasy@linutronix.de/

+ Konstantin Ryabitsev <mricon@kernel.org>

(Thanks for b4!)

Interesting thread.

So, my workflow is:

 1. git format-patch --cover-letter
 2. # Edit cover letter, add To and CC section
 3. git send-email 000* --to-cover --cc-cover
 4. b4 am 
 5. # Address nits/typos in the mbox
 6. git am 
 7. # Address substantial comments in git
 8. git format-patch -v2 --cover-letter
 9. # Edit cover letter again to restore body, To and CC sections
10. git send-email v2-000* --to-cover --cc-cover

So, yes, I lose recipients on every iteration, together with the whole
cover letter. But to me it's not a big deal because I can just pull
them from my mailbox.

In the better world, I'd like to have:
git send-email -v2 000* --to-the-same-people-as-in-v1

In the perfect world, I'd prefer to keep the cover letter under the
git control, once it is created, together with the recipients, once they
are added, and be able to edit them just like regular commits.

There's a 'git am -k', which is seemingly related to the matter, and
it keeps the [PATCH] prefix. But that is not something that can
preserve recipients for me.

I'll try b4 prep and trailers as suggested.

Thanks,
Yury

> >  include/linux/count_zeros.h | 9 +++------
> 
> ...
> 
> > +#define COUNT_TRAILING_ZEROS_0 (-1)
> 
> Shouldn't we also saturate this to BITS_PER_LONG?
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
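One possible reading of Andy's question is to report BITS_PER_LONG instead of -1 for x == 0. A minimal sketch comparing the two conventions, again using GCC/Clang builtins and hypothetical function names rather than the actual count_zeros.h code:

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Sentinel flavour: x == 0 reported as -1 (the COUNT_TRAILING_ZEROS_0
 * value quoted above). */
static inline int ctz_sentinel(unsigned long x)
{
	return x ? __builtin_ctzl(x) : -1;
}

/* Hypothetical saturating flavour: x == 0 reported as BITS_PER_LONG,
 * i.e. "every bit of the word is a trailing zero". */
static inline int ctz_saturating(unsigned long x)
{
	return x ? __builtin_ctzl(x) : BITS_PER_LONG;
}
```

A saturated result stays within [0, BITS_PER_LONG], so a caller can compare it against a width without special-casing zero. It still cannot be used directly as a shift count, because shifting an unsigned long by BITS_PER_LONG is undefined in C.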

* Re: [PATCH] lib: count_zeros: fix 32/64-bit inconsistency in count_trailing_zeros()
From: Jason Gunthorpe @ 2026-03-13 17:18 UTC
  To: Yury Norov
  Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes, Eric Biggers,
	Jason A. Donenfeld, Ard Biesheuvel, linux-kernel, kexec,
	linux-cifs, linux-spi, linux-hyperv, K. Y. Srinivasan,
	Haiyang Zhang, Leon Romanovsky, Mark Brown, Steve French,
	Alexander Graf, Mike Rapoport, Pasha Tatashin
In-Reply-To: <20260312230817.372878-1-ynorov@nvidia.com>

On Thu, Mar 12, 2026 at 07:08:16PM -0400, Yury Norov wrote:
> Based on 'sizeof(x) == 4' condition, in 32-bit case the function is wired
> to ffs(), while in 64-bit case to __ffs(). The difference is substantial:
> ffs(x) == __ffs(x) + 1. Also, ffs(0) == 0, while __ffs(0) is undefined.
> 
> The 32-bit behaviour is inconsistent with the function description, so it
> needs to get fixed.
> 
> There are 9 individual users for the function in 6 different subsystems.
> Some arches and drivers are 64-bit only:
>  - arch/loongarch/kvm/intc/eiointc.c;
>  - drivers/hv/mshv_vtl_main.c;
>  - kernel/liveupdate/kexec_handover.c;
> 
> The others are:
>  - ib_umem_find_best_pgsz(): as per comment, __ffs() should be correct;

So long as 32 bit works the same as 64 bit it is correct for ib

Jason

* Re: [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources
From: Jason Gunthorpe @ 2026-03-13 16:59 UTC
  To: Leon Romanovsky
  Cc: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
	Paolo Abeni, Eric Dumazet, Andrew Lunn, Haiyang Zhang,
	K . Y . Srinivasan, Wei Liu, Dexuan Cui, Simon Horman, netdev,
	linux-rdma, linux-hyperv, linux-kernel
In-Reply-To: <20260307173814.GN12611@unreal>

On Sat, Mar 07, 2026 at 07:38:14PM +0200, Leon Romanovsky wrote:
> On Fri, Mar 06, 2026 at 05:47:14PM -0800, Long Li wrote:
> > When the MANA hardware undergoes a service reset, the ETH auxiliary device
> > (mana.eth) used by DPDK persists across the reset cycle — it is not removed
> > and re-added like RC/UD/GSI QPs. This means userspace RDMA consumers such
> > as DPDK have no way of knowing that firmware handles for their PD, CQ, WQ,
> > QP and MR resources have become stale.
> 
> NAK to any of this.
> 
> In case of hardware reset, mana_ib AUX device needs to be destroyed and
> recreated later.

Yeah, that is our general model for any serious RAS event where the
driver's view of resources becomes out of sync with the HW.

You have to tear down the ib_device by removing the aux and then bring
back a new one.

There is an IB_EVENT_DEVICE_FATAL, but the purpose of that event is to
tell userspace to close and re-open their uverbs FD.

We don't have a model where a uverbs FD in userspace can continue to
work after the device has a catastrophic RAS event.

There may be room to have a model where the ib device doesn't fully
unplug/replug so it retains its name and things, but that is core code
not driver stuff.

Jason

* Re: [PATCH] MAINTAINERS: Update maintainers for Hyper-V DRM driver
From: Easwar Hariharan @ 2026-03-13 16:48 UTC
  To: Saurabh Sengar
  Cc: linux-kernel, linux-hyperv, wei.liu, easwar.hariharan, decui,
	longli, drawat.floss, ssengar
In-Reply-To: <20260313042148.1021099-1-ssengar@linux.microsoft.com>

On 3/12/2026 9:21 PM, Saurabh Sengar wrote:
> Add myself, Dexuana, and Long as maintainers. Deepak is stepping down
> from these responsibilities.

Minor typo in Dexuan's name. Probably something Wei can fix when he picks it up.

- Easwar

> 
> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> ---
>  MAINTAINERS | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 6358dd7f1632..d67afcb0acc3 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8028,7 +8028,9 @@ F:	Documentation/devicetree/bindings/display/himax,hx8357.yaml
>  F:	drivers/gpu/drm/tiny/hx8357d.c
>  
>  DRM DRIVER FOR HYPERV SYNTHETIC VIDEO DEVICE
> -M:	Deepak Rawat <drawat.floss@gmail.com>
> +M:	Dexuan Cui <decui@microsoft.com>
> +M:	Long Li <longli@microsoft.com>
> +M:	Saurabh Sengar <ssengar@linux.microsoft.com>
>  L:	linux-hyperv@vger.kernel.org
>  L:	dri-devel@lists.freedesktop.org
>  S:	Maintained


* Re: [PATCH] lib: count_zeros: fix 32/64-bit inconsistency in count_trailing_zeros()
From: Yury Norov @ 2026-03-13 16:31 UTC
  To: Enzo Matsumiya
  Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes, Eric Biggers,
	Jason A. Donenfeld, Ard Biesheuvel, linux-kernel, kexec,
	linux-cifs, linux-spi, linux-hyperv, K. Y. Srinivasan,
	Haiyang Zhang, Jason Gunthorpe, Leon Romanovsky, Mark Brown,
	Steve French, Alexander Graf, Mike Rapoport, Pasha Tatashin
In-Reply-To: <atnytehtvt6h6kp2ndxsa3x257usezp3bk5hp4ch7gf5w2zake@omihp5zbio3l>

On Thu, Mar 12, 2026 at 08:54:29PM -0300, Enzo Matsumiya wrote:
> Hi Yury,
> 
> On 03/12, Yury Norov wrote:
> > Based on 'sizeof(x) == 4' condition, in 32-bit case the function is wired
> > to ffs(), while in 64-bit case to __ffs(). The difference is substantial:
> > ffs(x) == __ffs(x) + 1. Also, ffs(0) == 0, while __ffs(0) is undefined.
> > 
> > The 32-bit behaviour is inconsistent with the function description, so it
> > needs to get fixed.
> > 
> > There are 9 individual users for the function in 6 different subsystems.
> > Some arches and drivers are 64-bit only:
> > - arch/loongarch/kvm/intc/eiointc.c;
> > - drivers/hv/mshv_vtl_main.c;
> > - kernel/liveupdate/kexec_handover.c;
> > 
> > The others are:
> > - ib_umem_find_best_pgsz(): as per comment, __ffs() should be correct;
> > - rzv2m_csi_reg_write_bit(): ARCH_RENESAS only, unclear;
> > - lz77_match_len(): CIFS_COMPRESSION only, unclear, experimental;
> > 
> > None of them explicitly tweak their code for a word length, or x == 0.
> 
> Context for lz77_match_len() case:
> 
> 	const u64 diff = lz77_read64(cur) ^ lz77_read64(wnd);
> 
> 	if (!diff) {
> 	...
> 	}
> 
> 	cur += count_trailing_zeros(diff) >> 3;
> 
> So x == 0 is checked, however it does assume that
> sizeof(unsigned long) == sizeof(u64).  I'll have to fix it for when
> that's not the case (even with your patch in, as __ffs() casts x to
> unsigned long down the line).  Thanks for the heads up.
 
Yes, in your case you need __ffs64() (which doesn't exist). Or simply
leverage bitmaps API:

        DECLARE_BITMAP(bitmap, 64);

        ...

        bitmap_from_u64(bitmap, diff);
        cur += find_first_bit(bitmap, 64) >> 3;
> 
> Cheers,
> 
> Enzo

So, if you're not objecting to wiring 32-bit count_trailing_zeros() to
__ffs(), can you please send your ack?

Thanks,
Yury
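Besides the bitmap approach suggested above, a 64-bit trailing-zero count can also be built from two 32-bit scans, which works whatever the width of unsigned long. The helper below is a hypothetical sketch of that alternative technique, not an existing kernel function. The byte-offset wrapper assumes little-endian loads, which is what makes ctz(diff) >> 3 the index of the first differing byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a kernel API): trailing zeros of a 64-bit value
 * using only 32-bit operations, so it does not depend on the width of
 * unsigned long. Like __ffs(), the result is undefined for x == 0. */
static inline unsigned int ctz64_split(uint64_t x)
{
	uint32_t lo = (uint32_t)x;

	if (lo)
		return (unsigned int)__builtin_ctz(lo);
	return 32u + (unsigned int)__builtin_ctz((uint32_t)(x >> 32));
}

/* Index of the first differing byte between two 8-byte words, assuming
 * little-endian loads. The caller must handle a == b (diff == 0)
 * separately, as the lz77_match_len() excerpt already does. */
static inline unsigned int first_diff_byte(uint64_t a, uint64_t b)
{
	return ctz64_split(a ^ b) >> 3;
}
```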

* Re: [PATCH net,v2] net: mana: fix use-after-free in mana_hwc_destroy_channel() by reordering teardown
From: Simon Horman @ 2026-03-13 13:40 UTC
  To: Dipayaan Roy
  Cc: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	kuba, pabeni, leon, longli, kotaranov, shradhagupta, ssengar,
	ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, stephen, dipayanroy
In-Reply-To: <abHA3AjNtqa1nx9k@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>

On Wed, Mar 11, 2026 at 12:22:04PM -0700, Dipayaan Roy wrote:
> A potential race condition exists in mana_hwc_destroy_channel() where
> hwc->caller_ctx is freed before the HWC's Completion Queue (CQ) and
> Event Queue (EQ) are destroyed. This allows an in-flight CQ interrupt
> handler to dereference freed memory, leading to a use-after-free or
> NULL pointer dereference in mana_hwc_handle_resp().
> 
> mana_smc_teardown_hwc() signals the hardware to stop but does not
> synchronize against IRQ handlers already executing on other CPUs. The
> IRQ synchronization only happens in mana_hwc_destroy_cq() via
> mana_gd_destroy_eq() -> mana_gd_deregister_irq(). Since this runs
> after kfree(hwc->caller_ctx), a concurrent mana_hwc_rx_event_handler()
> can dereference freed caller_ctx (and rxq->msg_buf) in
> mana_hwc_handle_resp().
> 
> Fix this by reordering teardown to reverse-of-creation order: destroy
> the TX/RX work queues and CQ/EQ before freeing hwc->caller_ctx. This
> ensures all in-flight interrupt handlers complete before the memory they
> access is freed.
> 
> Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
> ---
> Changes in v2:
>   - Added maintainers missed in v1.

Reviewed-by: Simon Horman <horms@kernel.org>
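The ordering argument in the patch (stop and synchronize every consumer before freeing what it reads) can be modelled in userspace. In this hypothetical sketch a thread stands in for the CQ/EQ interrupt handler and pthread_join() stands in for the IRQ synchronization done via mana_gd_deregister_irq(); none of the names are MANA code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model, not MANA code: a thread plays the interrupt handler,
 * and pthread_join() plays the IRQ synchronization. */
struct hwc_model {
	atomic_bool stop;
	pthread_t handler;
	long *caller_ctx;	/* memory the "handler" dereferences */
};

static void *handler_fn(void *arg)
{
	struct hwc_model *hwc = arg;

	while (!atomic_load(&hwc->stop))
		hwc->caller_ctx[0]++;	/* only safe while caller_ctx is live */
	return NULL;
}

int hwc_create(struct hwc_model *hwc)
{
	hwc->caller_ctx = calloc(1, sizeof(*hwc->caller_ctx));
	if (!hwc->caller_ctx)
		return -1;
	atomic_init(&hwc->stop, false);
	if (pthread_create(&hwc->handler, NULL, handler_fn, hwc) != 0) {
		free(hwc->caller_ctx);
		hwc->caller_ctx = NULL;
		return -1;
	}
	return 0;
}

/* Reverse-of-creation teardown: stop the consumer and wait for it to
 * finish before freeing what it touches. Calling free() before
 * pthread_join() would be the use-after-free pattern described above. */
void hwc_destroy(struct hwc_model *hwc)
{
	atomic_store(&hwc->stop, true);
	pthread_join(hwc->handler, NULL);
	free(hwc->caller_ctx);
	hwc->caller_ctx = NULL;
}
```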


* Re: [PATCH 01/15] mm: various small mmap_prepare cleanups
From: Lorenzo Stoakes (Oracle) @ 2026-03-13 12:13 UTC
  To: Andrew Morton
  Cc: Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
	Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
	Alexandre Torgue, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
	David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
	Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
	Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
	linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
	target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
In-Reply-To: <20260312141425.1837736829210f2d0b00cac6@linux-foundation.org>

On Thu, Mar 12, 2026 at 02:14:25PM -0700, Andrew Morton wrote:
> On Thu, 12 Mar 2026 20:27:16 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
>
> > +int mmap_action_prepare(struct vm_area_desc *desc,
> > +			struct mmap_action *action)
> > +
> >  {
> >  	switch (action->type) {
> >  	case MMAP_NOTHING:
> > -		break;
> > +		return 0;
> >  	case MMAP_REMAP_PFN:
> > -		remap_pfn_range_prepare(desc, action->remap.start_pfn);
> > -		break;
> > +		return remap_pfn_range_prepare(desc, action);
> >  	case MMAP_IO_REMAP_PFN:
> > -		io_remap_pfn_range_prepare(desc, action->remap.start_pfn,
> > -					   action->remap.size);
> > -		break;
> > +		return io_remap_pfn_range_prepare(desc, action);
> >  	}
> >  }
> >  EXPORT_SYMBOL(mmap_action_prepare);
>
> hm, was this the correct version?
>
> mm/util.c: In function 'mmap_action_prepare':
> mm/util.c:1451:1: error: control reaches end of non-void function [-Werror=return-type]
>  1451 | }

Seems different compiler versions do different things :)

In theory we should never hit that but memory corruption and err... rogue
drivers? could cause it ofc :)

Will fix on respin.

>
> --- a/mm/util.c~mm-various-small-mmap_prepare-cleanups-fix
> +++ a/mm/util.c
> @@ -1356,6 +1356,8 @@ int mmap_action_prepare(struct vm_area_d
>  		return remap_pfn_range_prepare(desc, action);
>  	case MMAP_IO_REMAP_PFN:
>  		return io_remap_pfn_range_prepare(desc, action);
> +	default:
> +		BUG();

I'd probably prefer a WARN_ON_ONCE(1) followed by return -EBLAH; will
think about it on respin.

>  	}
>  }
>  EXPORT_SYMBOL(mmap_action_prepare);
> _
>

Cheers, Lorenzo
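A userspace sketch of the alternative Lorenzo mentions: warn once and return an error for an out-of-range action type instead of BUG(). The enum, the warn_once() stand-in for WARN_ON_ONCE(1) and the choice of -EINVAL (the thread leaves the errno as -EBLAH) are all assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Stand-ins for the MMAP_* action types in the quoted hunk (renamed to make
 * clear they are not the kernel's definitions). */
enum action_type {
	ACTION_NOTHING,
	ACTION_REMAP_PFN,
	ACTION_IO_REMAP_PFN,
};

/* Minimal userspace stand-in for WARN_ON_ONCE(1). */
static int warned_once;

static void warn_once(const char *msg)
{
	if (!warned_once) {
		warned_once = 1;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
}

int action_prepare(enum action_type type)
{
	switch (type) {
	case ACTION_NOTHING:
		return 0;
	case ACTION_REMAP_PFN:
		return 0;	/* real code would call the remap prepare helper */
	case ACTION_IO_REMAP_PFN:
		return 0;
	}

	/* Only reachable with a corrupted or out-of-range value: warn once
	 * and fail instead of BUG() or falling off the end of the function. */
	warn_once("unknown mmap action type");
	return -EINVAL;
}
```

Handling the fallthrough after the switch, rather than with a default: label, keeps the compiler's -Wswitch warning if a new enum value is later added without a matching case.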

* Re: [PATCH 05/15] fs: afs: correctly drop reference count on mapping failure
From: Lorenzo Stoakes (Oracle) @ 2026-03-13 12:00 UTC
  To: Usama Arif
  Cc: Andrew Morton, Clemens Ladisch, Arnd Bergmann, Greg Kroah-Hartman,
	K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Alexander Shishkin, Maxime Coquelin, Alexandre Torgue,
	Miquel Raynal, Richard Weinberger, Vignesh Raghavendra,
	Bodo Stroesser, Martin K . Petersen, David Howells, Marc Dionne,
	Alexander Viro, Christian Brauner, Jan Kara, David Hildenbrand,
	Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Jann Horn, Pedro Falcato,
	linux-kernel, linux-doc, linux-hyperv, linux-stm32,
	linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
	target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
In-Reply-To: <20260313110745.2573005-1-usama.arif@linux.dev>

On Fri, Mar 13, 2026 at 04:07:43AM -0700, Usama Arif wrote:
> On Thu, 12 Mar 2026 20:27:20 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
>
> > Commit 9d5403b1036c ("fs: convert most other generic_file_*mmap() users to
> > .mmap_prepare()") updated AFS to use the mmap_prepare callback in favour of
> > the deprecated mmap callback.
> >
> > However, it did not account for the fact that mmap_prepare can fail to map
> > due to an out of memory error, and thus should not be incrementing a
> > reference count on mmap_prepare.
> >
> > With the newly added vm_ops->mapped callback available, we can simply defer
> > this operation to that callback which is only invoked once the mapping is
> > successfully in place (but not yet visible to userspace as the mmap and VMA
> > write locks are held).
> >
> > Therefore add afs_mapped() to implement this callback for AFS.
> >
> > In practice the mapping allocations are 'too small to fail', so this is
> > something that should realistically never happen (or would only happen
> > when the process is about to die anyway), but we should still handle
> > it.
> >
> > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > ---
> >  fs/afs/file.c | 20 ++++++++++++++++----
> >  1 file changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/afs/file.c b/fs/afs/file.c
> > index f609366fd2ac..69ef86f5e274 100644
> > --- a/fs/afs/file.c
> > +++ b/fs/afs/file.c
> > @@ -28,6 +28,8 @@ static ssize_t afs_file_splice_read(struct file *in, loff_t *ppos,
> >  static void afs_vm_open(struct vm_area_struct *area);
> >  static void afs_vm_close(struct vm_area_struct *area);
> >  static vm_fault_t afs_vm_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff);
> > +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> > +		      const struct file *file, void **vm_private_data);
> >
> >  const struct file_operations afs_file_operations = {
> >  	.open		= afs_open,
> > @@ -61,6 +63,7 @@ const struct address_space_operations afs_file_aops = {
> >  };
> >
> >  static const struct vm_operations_struct afs_vm_ops = {
> > +	.mapped		= afs_mapped,
> >  	.open		= afs_vm_open,
> >  	.close		= afs_vm_close,
> >  	.fault		= filemap_fault,
> > @@ -500,13 +503,22 @@ static int afs_file_mmap_prepare(struct vm_area_desc *desc)
> >  	afs_add_open_mmap(vnode);
>
> Is the above afs_add_open_mmap an additional one, which could cause a reference
> leak? Does the above one need to be removed and only the one in afs_mapped()
> needs to be kept?

Ah yeah good spot, will fix thanks!

>
> >
> >  	ret = generic_file_mmap_prepare(desc);
> > -	if (ret == 0)
> > -		desc->vm_ops = &afs_vm_ops;
> > -	else
> > -		afs_drop_open_mmap(vnode);
> > +	if (ret)
> > +		return ret;
> > +
> > +	desc->vm_ops = &afs_vm_ops;
> >  	return ret;
> >  }
> >
> > +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> > +		      const struct file *file, void **vm_private_data)
> > +{
> > +	struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
> > +
> > +	afs_add_open_mmap(vnode);
> > +	return 0;
> > +}
> > +
> >  static void afs_vm_open(struct vm_area_struct *vma)
> >  {
> >  	afs_add_open_mmap(AFS_FS_I(file_inode(vma->vm_file)));
> > --
> > 2.53.0
> >
> >

Cheers, Lorenzo
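The refcounting rule the patch and Usama's review converge on (take the reference only once the mapping exists, and in exactly one place) can be modelled as follows. The struct and function names are hypothetical and only mimic the roles of afs_file_mmap_prepare() and afs_mapped(); if the increment stayed in the prepare step as well, each successful mapping would raise the count by two.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model, not the AFS or mm API: a counter of open mmaps, a
 * prepare step that may fail, and a mapped step that only runs once the
 * mapping exists. */
struct vnode_model {
	int open_mmaps;
};

static int prepare_step(struct vnode_model *v, int fail)
{
	(void)v;		/* no reference is taken here any more */
	return fail ? -ENOMEM : 0;
}

static int mapped_step(struct vnode_model *v)
{
	v->open_mmaps++;	/* the one and only reference per mapping */
	return 0;
}

int mmap_model(struct vnode_model *v, int fail_prepare)
{
	int err = prepare_step(v, fail_prepare);

	if (err)
		return err;	/* nothing to undo: no reference was taken */
	return mapped_step(v);
}
```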

* Re: [PATCH 04/15] mm: add vm_ops->mapped hook
From: Lorenzo Stoakes (Oracle) @ 2026-03-13 11:58 UTC
  To: Usama Arif
  Cc: Andrew Morton, Clemens Ladisch, Arnd Bergmann, Greg Kroah-Hartman,
	K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Alexander Shishkin, Maxime Coquelin, Alexandre Torgue,
	Miquel Raynal, Richard Weinberger, Vignesh Raghavendra,
	Bodo Stroesser, Martin K . Petersen, David Howells, Marc Dionne,
	Alexander Viro, Christian Brauner, Jan Kara, David Hildenbrand,
	Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Jann Horn, Pedro Falcato,
	linux-kernel, linux-doc, linux-hyperv, linux-stm32,
	linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
	target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
In-Reply-To: <20260313110238.2500603-1-usama.arif@linux.dev>

On Fri, Mar 13, 2026 at 04:02:36AM -0700, Usama Arif wrote:
> On Thu, 12 Mar 2026 20:27:19 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
>
> > Previously, when a driver needed to do something like establish a reference
> > count, it could do so in the mmap hook in the knowledge that the mapping
> > would succeed.
> >
> > With the introduction of f_op->mmap_prepare this is no longer the case, as
> > it is invoked prior to actually establishing the mapping.
> >
> > To take this into account, introduce a new vm_ops->mapped callback which is
> > invoked when the VMA is first mapped (though notably - not when it is
> > merged - which is correct and mirrors existing mmap/open/close behaviour).
> >
> > We do better than vm_ops->open() here, as this callback can return an
> > error, at which point the VMA will be unmapped.
> >
> > Note that vm_ops->mapped() is invoked after any mmap action is
> > complete (such as I/O remapping).
> >
> > We intentionally do not expose the VMA at this point, exposing only the
> > fields that could be used, and an output parameter in case the operation
> > needs to update the vma->vm_private_data field.
> >
> > In order to deal with stacked filesystems which invoke an inner
> > filesystem's mmap(), add __compat_vma_mapped() and invoke it on
> > vfs_mmap() (via compat_vma_mmap()) to ensure that the mapped callback is
> > handled when an mmap() caller invokes a nested filesystem's mmap_prepare()
> > callback.
> >
> > We can now also remove call_action_complete() and invoke
> > mmap_action_complete() directly, as we separate out the rmap lock logic to
> > be called in __mmap_region() instead via maybe_drop_file_rmap_lock().
> >
> > We also abstract unmapping of a VMA on mmap action completion into its own
> > helper function, unmap_vma_locked().
> >
> > Additionally, update VMA userland test headers to reflect the change.
> >
> > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > ---
> >  include/linux/fs.h              |  9 +++-
> >  include/linux/mm.h              | 17 +++++++
> >  mm/internal.h                   | 10 ++++
> >  mm/util.c                       | 86 ++++++++++++++++++++++++---------
> >  mm/vma.c                        | 41 +++++++++++-----
> >  tools/testing/vma/include/dup.h | 34 ++++++++++++-
> >  6 files changed, 158 insertions(+), 39 deletions(-)
> >
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index a2628a12bd2b..c390f5c667e3 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2059,13 +2059,20 @@ static inline bool can_mmap_file(struct file *file)
> >  }
> >
> >  int compat_vma_mmap(struct file *file, struct vm_area_struct *vma);
> > +int __vma_check_mmap_hook(struct vm_area_struct *vma);
> >
> >  static inline int vfs_mmap(struct file *file, struct vm_area_struct *vma)
> >  {
> > +	int err;
> > +
> >  	if (file->f_op->mmap_prepare)
> >  		return compat_vma_mmap(file, vma);
> >
> > -	return file->f_op->mmap(file, vma);
> > +	err = file->f_op->mmap(file, vma);
> > +	if (err)
> > +		return err;
> > +
> > +	return __vma_check_mmap_hook(vma);
> >  }
> >
> >  static inline int vfs_mmap_prepare(struct file *file, struct vm_area_desc *desc)
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 12a0b4c63736..7333d5db1221 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -759,6 +759,23 @@ struct vm_operations_struct {
> >  	 * Context: User context.  May sleep.  Caller holds mmap_lock.
> >  	 */
> >  	void (*close)(struct vm_area_struct *vma);
> > +	/**
> > +	 * @mapped: Called when the VMA is first mapped in the MM. Not called if
> > +	 * the new VMA is merged with an adjacent VMA.
> > +	 *
> > +	 * The @vm_private_data field is an output field allowing the user to
> > +	 * modify vma->vm_private_data as necessary.
> > +	 *
> > +	 * ONLY valid if set from f_op->mmap_prepare. Will result in an error if
> > +	 * set from f_op->mmap.
> > +	 *
> > +	 * Returns %0 on success, or an error otherwise. On error, the VMA will
> > +	 * be unmapped.
> > +	 *
> > +	 * Context: User context.  May sleep.  Caller holds mmap_lock.
> > +	 */
> > +	int (*mapped)(unsigned long start, unsigned long end, pgoff_t pgoff,
> > +		      const struct file *file, void **vm_private_data);
> >  	/* Called any time before splitting to check if it's allowed */
> >  	int (*may_split)(struct vm_area_struct *vma, unsigned long addr);
> >  	int (*mremap)(struct vm_area_struct *vma);
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 7bfa85b5e78b..f0f2cf1caa36 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -158,6 +158,8 @@ static inline void *folio_raw_mapping(const struct folio *folio)
> >   * mmap hook and safely handle error conditions. On error, VMA hooks will be
> >   * mutated.
> >   *
> > + * IMPORTANT: f_op->mmap() is deprecated, prefer f_op->mmap_prepare().
> > + *
> >   * @file: File which backs the mapping.
> >   * @vma:  VMA which we are mapping.
> >   *
> > @@ -201,6 +203,14 @@ static inline void vma_close(struct vm_area_struct *vma)
> >  /* unmap_vmas is in mm/memory.c */
> >  void unmap_vmas(struct mmu_gather *tlb, struct unmap_desc *unmap);
> >
> > +static inline void unmap_vma_locked(struct vm_area_struct *vma)
> > +{
> > +	const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > +
> > +	mmap_assert_locked(vma->vm_mm);
> > +	do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
> > +}
> > +
> >  #ifdef CONFIG_MMU
> >
> >  static inline void get_anon_vma(struct anon_vma *anon_vma)
> > diff --git a/mm/util.c b/mm/util.c
> > index dba1191725b6..2b0ed54008d6 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -1163,6 +1163,55 @@ void flush_dcache_folio(struct folio *folio)
> >  EXPORT_SYMBOL(flush_dcache_folio);
> >  #endif
> >
> > +static int __compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > +	struct vm_area_desc desc = {
> > +		.mm = vma->vm_mm,
> > +		.file = file,
> > +		.start = vma->vm_start,
> > +		.end = vma->vm_end,
> > +
> > +		.pgoff = vma->vm_pgoff,
> > +		.vm_file = vma->vm_file,
> > +		.vma_flags = vma->flags,
> > +		.page_prot = vma->vm_page_prot,
> > +
> > +		.action.type = MMAP_NOTHING, /* Default */
> > +	};
> > +	int err;
> > +
> > +	err = vfs_mmap_prepare(file, &desc);
> > +	if (err)
> > +		return err;
> > +
> > +	err = mmap_action_prepare(&desc, &desc.action);
> > +	if (err)
> > +		return err;
> > +
> > +	set_vma_from_desc(vma, &desc);
> > +	return mmap_action_complete(vma, &desc.action);
> > +}
> > +
> > +static int __compat_vma_mapped(struct file *file, struct vm_area_struct *vma)
> > +{
> > +	const struct vm_operations_struct *vm_ops = vma->vm_ops;
> > +	void *vm_private_data = vma->vm_private_data;
> > +	int err;
> > +
> > +	if (!vm_ops->mapped)
> > +		return 0;
> > +
>
> Hello!
>
> Can vm_ops be NULL here?  __compat_vma_mapped() is called from
> compat_vma_mmap(), which is reached when a filesystem provides
> mmap_prepare.  If the mmap_prepare hook does not set desc->vm_ops,
> vma->vm_ops will be NULL and this dereferences a NULL pointer.

I _think_ for this to ever be invoked, you would need to be dealing with a
file-backed VMA, so vm_ops->fault would HAVE to be defined.

But you're right anyway: as a matter of principle we should check it! Will fix.

>
> For e.g. drivers/char/mem.c, mmap_zero_prepare() would trigger
> a NULL pointer dereference here.
>
> Would need to do
> 	if (!vm_ops || !vm_ops->mapped)
> 		return 0;
>
> here

Yes.

>
>
> > +	err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff, file,
> > +			     &vm_private_data);
> > +	if (err)
> > +		unmap_vma_locked(vma);
>
> when mapped() returns an error, unmap_vma_locked(vma) is called
> but execution continues into the vm_private_data update below.  After
> unmap_vma_locked() the VMA may be freed (do_munmap can remove the VMA
> entirely), so accessing vma->vm_private_data after that is a
> use-after-free.

Very good point :) Will fix, thanks!

Probably:

	if (err)
		unmap_vma_locked(vma);
	else if (vm_private_data != vma->vm_private_data)
		vma->vm_private_data = vm_private_data;

	return err;

Would be fine.
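For illustration only, a minimal userspace model of that invocation order. The model_* types, hooks and functions are hypothetical stand-ins, not kernel API, and model_unmap() merely sets a flag where unmap_vma_locked() may actually free the VMA. The model returns early for a NULL vm_ops and never writes vm_private_data back after an error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel types. */
struct model_vm_ops {
	int (*mapped)(unsigned long start, unsigned long end,
		      unsigned long pgoff, void **vm_private_data);
};

struct model_vma {
	unsigned long vm_start, vm_end, vm_pgoff;
	const struct model_vm_ops *vm_ops;
	void *vm_private_data;
	bool unmapped;	/* a flag instead of a free, so the model can be inspected */
};

/* Stand-in for unmap_vma_locked(); in the kernel the VMA may be freed here. */
static void model_unmap(struct model_vma *vma)
{
	vma->unmapped = true;
}

static int model_call_mapped(struct model_vma *vma)
{
	const struct model_vm_ops *vm_ops = vma->vm_ops;
	void *vm_private_data = vma->vm_private_data;
	int err;

	/* vm_ops may be NULL, e.g. if mmap_prepare never set desc->vm_ops. */
	if (!vm_ops || !vm_ops->mapped)
		return 0;

	err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff,
			     &vm_private_data);
	if (err)
		model_unmap(vma);	/* vma must not be touched after this */
	else if (vm_private_data != vma->vm_private_data)
		vma->vm_private_data = vm_private_data;

	return err;
}

/* Example hooks: both write *vm_private_data, only one succeeds. */
static int cookie;

static int ok_hook(unsigned long start, unsigned long end,
		   unsigned long pgoff, void **vm_private_data)
{
	(void)start; (void)end; (void)pgoff;
	*vm_private_data = &cookie;
	return 0;
}

static int fail_hook(unsigned long start, unsigned long end,
		     unsigned long pgoff, void **vm_private_data)
{
	(void)start; (void)end; (void)pgoff;
	*vm_private_data = &cookie;	/* must be discarded on error */
	return -ENOMEM;
}

static const struct model_vm_ops ok_ops = { .mapped = ok_hook };
static const struct model_vm_ops fail_ops = { .mapped = fail_hook };
```

With fail_ops the model reports -ENOMEM, marks the VMA unmapped and leaves vm_private_data untouched; with vm_ops == NULL it simply returns 0.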

>
> Probably need to do:
> 	if (err) {
> 		unmap_vma_locked(vma);
> 		return err;
> 	}
>
> > +	/* Update private data if changed. */
> > +	if (vm_private_data != vma->vm_private_data)
> > +		vma->vm_private_data = vm_private_data;
> > +
> > +	return err;
> > +}
> > +
> >  /**
> >   * compat_vma_mmap() - Apply the file's .mmap_prepare() hook to an
> >   * existing VMA and execute any requested actions.
> > @@ -1191,34 +1240,26 @@ EXPORT_SYMBOL(flush_dcache_folio);
> >   */
> >  int compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
> >  {
> > -	struct vm_area_desc desc = {
> > -		.mm = vma->vm_mm,
> > -		.file = file,
> > -		.start = vma->vm_start,
> > -		.end = vma->vm_end,
> > -
> > -		.pgoff = vma->vm_pgoff,
> > -		.vm_file = vma->vm_file,
> > -		.vma_flags = vma->flags,
> > -		.page_prot = vma->vm_page_prot,
> > -
> > -		.action.type = MMAP_NOTHING, /* Default */
> > -	};
> >  	int err;
> >
> > -	err = vfs_mmap_prepare(file, &desc);
> > -	if (err)
> > -		return err;
> > -
> > -	err = mmap_action_prepare(&desc, &desc.action);
> > +	err = __compat_vma_mmap(file, vma);
> >  	if (err)
> >  		return err;
> >
> > -	set_vma_from_desc(vma, &desc);
> > -	return mmap_action_complete(vma, &desc.action);
> > +	return __compat_vma_mapped(file, vma);
> >  }
> >  EXPORT_SYMBOL(compat_vma_mmap);
> >
> > +int __vma_check_mmap_hook(struct vm_area_struct *vma)
> > +{
> > +	/* vm_ops->mapped is not valid if mmap() is specified. */
> > +	if (WARN_ON_ONCE(vma->vm_ops->mapped))
> > +		return -EINVAL;
>
> I think vma->vm_ops can be NULL here. Should be:
>
> 	if (vma->vm_ops && WARN_ON_ONCE(vma->vm_ops->mapped))
> 		return -EINVAL;

Again, I think you'd probably only invoke this on file-backed VMAs, so it should
be OK, but as a matter of principle we should check it, so will fix, thanks!

>
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL(__vma_check_mmap_hook);
> > +
> >  static void set_ps_flags(struct page_snapshot *ps, const struct folio *folio,
> >  			 const struct page *page)
> >  {
> > @@ -1316,10 +1357,7 @@ static int mmap_action_finish(struct vm_area_struct *vma,
> >  	 * invoked if we do NOT merge, so we only clean up the VMA we created.
> >  	 */
> >  	if (err) {
> > -		const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > -
> > -		do_munmap(current->mm, vma->vm_start, len, NULL);
> > -
> > +		unmap_vma_locked(vma);
> >  		if (action->error_hook) {
> >  			/* We may want to filter the error. */
> >  			err = action->error_hook(err);
> > diff --git a/mm/vma.c b/mm/vma.c
> > index 054cf1d262fb..ef9f5a5365d1 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
> > @@ -2705,21 +2705,35 @@ static bool can_set_ksm_flags_early(struct mmap_state *map)
> >  	return false;
> >  }
> >
> > -static int call_action_complete(struct mmap_state *map,
> > -				struct mmap_action *action,
> > -				struct vm_area_struct *vma)
> > +static int call_mapped_hook(struct vm_area_struct *vma)
> >  {
> > -	int ret;
> > +	const struct vm_operations_struct *vm_ops = vma->vm_ops;
> > +	void *vm_private_data = vma->vm_private_data;
> > +	int err;
> >
> > -	ret = mmap_action_complete(vma, action);
> > +	if (!vm_ops || !vm_ops->mapped)
> > +		return 0;
> > +	err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff,
> > +			     vma->vm_file, &vm_private_data);
> > +	if (err) {
> > +		unmap_vma_locked(vma);
> > +		return err;
> > +	}
> > +	/* Update private data if changed. */
> > +	if (vm_private_data != vma->vm_private_data)
> > +		vma->vm_private_data = vm_private_data;
> > +	return 0;
> > +}
> >
> > -	/* If we held the file rmap we need to release it. */
> > -	if (map->hold_file_rmap_lock) {
> > -		struct file *file = vma->vm_file;
> > +static void maybe_drop_file_rmap_lock(struct mmap_state *map,
> > +				      struct vm_area_struct *vma)
> > +{
> > +	struct file *file;
> >
> > -		i_mmap_unlock_write(file->f_mapping);
> > -	}
> > -	return ret;
> > +	if (!map->hold_file_rmap_lock)
> > +		return;
> > +	file = vma->vm_file;
> > +	i_mmap_unlock_write(file->f_mapping);
> >  }
> >
> >  static unsigned long __mmap_region(struct file *file, unsigned long addr,
> > @@ -2773,8 +2787,11 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
> >  	__mmap_complete(&map, vma);
> >
> >  	if (have_mmap_prepare && allocated_new) {
> > -		error = call_action_complete(&map, &desc.action, vma);
> > +		error = mmap_action_complete(vma, &desc.action);
> > +		if (!error)
> > +			error = call_mapped_hook(vma);
> >
> > +		maybe_drop_file_rmap_lock(&map, vma);
> >  		if (error)
> >  			return error;
> >  	}
> > diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
> > index 908beb263307..47d8db809f31 100644
> > --- a/tools/testing/vma/include/dup.h
> > +++ b/tools/testing/vma/include/dup.h
> > @@ -606,12 +606,34 @@ struct vm_area_struct {
> >  } __randomize_layout;
> >
> >  struct vm_operations_struct {
> > -	void (*open)(struct vm_area_struct * area);
> > +	/**
> > +	 * @open: Called when a VMA is remapped or split. Not called upon first
> > +	 * mapping a VMA.
> > +	 * Context: User context.  May sleep.  Caller holds mmap_lock.
> > +	 */
> > +	void (*open)(struct vm_area_struct *vma);
> >  	/**
> >  	 * @close: Called when the VMA is being removed from the MM.
> >  	 * Context: User context.  May sleep.  Caller holds mmap_lock.
> >  	 */
> > -	void (*close)(struct vm_area_struct * area);
> > +	void (*close)(struct vm_area_struct *vma);
> > +	/**
> > +	 * @mapped: Called when the VMA is first mapped in the MM. Not called if
> > +	 * the new VMA is merged with an adjacent VMA.
> > +	 *
> > +	 * The @vm_private_data field is an output field allowing the user to
> > +	 * modify vma->vm_private_data as necessary.
> > +	 *
> > +	 * ONLY valid if set from f_op->mmap_prepare. Will result in an error if
> > +	 * set from f_op->mmap.
> > +	 *
> > +	 * Returns %0 on success, or an error otherwise. On error, the VMA will
> > +	 * be unmapped.
> > +	 *
> > +	 * Context: User context.  May sleep.  Caller holds mmap_lock.
> > +	 */
> > +	int (*mapped)(unsigned long start, unsigned long end, pgoff_t pgoff,
> > +		      const struct file *file, void **vm_private_data);
> >  	/* Called any time before splitting to check if it's allowed */
> >  	int (*may_split)(struct vm_area_struct *area, unsigned long addr);
> >  	int (*mremap)(struct vm_area_struct *area);
> > @@ -1345,3 +1367,11 @@ static inline void vma_set_file(struct vm_area_struct *vma, struct file *file)
> >  	swap(vma->vm_file, file);
> >  	fput(file);
> >  }
> > +
> > +static inline void unmap_vma_locked(struct vm_area_struct *vma)
> > +{
> > +	const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > +
> > +	mmap_assert_locked(vma->vm_mm);
> > +	do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
> > +}
> > --
> > 2.53.0
> >
> >

Cheers, Lorenzo

* Re: [PATCH 05/15] fs: afs: correctly drop reference count on mapping failure
From: Usama Arif @ 2026-03-13 11:07 UTC
  To: Lorenzo Stoakes (Oracle)
  Cc: Usama Arif, Andrew Morton, Clemens Ladisch, Arnd Bergmann,
	Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
	Alexandre Torgue, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
	David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
	Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
	Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
	linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
	target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
In-Reply-To: <4a5fa45119220b9d99ed72a36308aed01a30d2c1.1773346620.git.ljs@kernel.org>

On Thu, 12 Mar 2026 20:27:20 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:

> Commit 9d5403b1036c ("fs: convert most other generic_file_*mmap() users to
> .mmap_prepare()") updated AFS to use the mmap_prepare callback in favour of
> the deprecated mmap callback.
> 
> However, it did not account for the fact that mmap_prepare can fail to map
> due to an out of memory error, and thus should not be incrementing a
> reference count on mmap_prepare.
> 
> With the newly added vm_ops->mapped callback available, we can simply defer
> this operation to that callback which is only invoked once the mapping is
> successfully in place (but not yet visible to userspace as the mmap and VMA
> write locks are held).
> 
> Therefore add afs_mapped() to implement this callback for AFS.
> 
> In practice the mapping allocations are 'too small to fail' so this is
> something that realistically should never happen in practice (or would do
> so in a case where the process is about to die anyway), but we should still
> handle this.
> 
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---
>  fs/afs/file.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/afs/file.c b/fs/afs/file.c
> index f609366fd2ac..69ef86f5e274 100644
> --- a/fs/afs/file.c
> +++ b/fs/afs/file.c
> @@ -28,6 +28,8 @@ static ssize_t afs_file_splice_read(struct file *in, loff_t *ppos,
>  static void afs_vm_open(struct vm_area_struct *area);
>  static void afs_vm_close(struct vm_area_struct *area);
>  static vm_fault_t afs_vm_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff);
> +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> +		      const struct file *file, void **vm_private_data);
>  
>  const struct file_operations afs_file_operations = {
>  	.open		= afs_open,
> @@ -61,6 +63,7 @@ const struct address_space_operations afs_file_aops = {
>  };
>  
>  static const struct vm_operations_struct afs_vm_ops = {
> +	.mapped		= afs_mapped,
>  	.open		= afs_vm_open,
>  	.close		= afs_vm_close,
>  	.fault		= filemap_fault,
> @@ -500,13 +503,22 @@ static int afs_file_mmap_prepare(struct vm_area_desc *desc)
>  	afs_add_open_mmap(vnode);

Is the above afs_add_open_mmap() call now an additional one, which could cause
a reference leak? Should it be removed, keeping only the one in
afs_mapped()?
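A rough counting model of the concern, assuming afs_vm_close() drops exactly one open-mmap reference per VMA (the model_* names are hypothetical, not AFS code). If both afs_file_mmap_prepare() and afs_mapped() take a reference, one is left over after unmap. Since the patch also removes the afs_drop_open_mmap() error path, a failing generic_file_mmap_prepare() would leak the early reference as well.

```c
#include <assert.h>

/* Hypothetical model of the per-vnode open-mmap count. */
struct model_vnode {
	int open_mmaps;
};

static void model_add_open_mmap(struct model_vnode *v)
{
	v->open_mmaps++;
}

static void model_drop_open_mmap(struct model_vnode *v)
{
	assert(v->open_mmaps > 0);
	v->open_mmaps--;
}

/* As posted: a reference in mmap_prepare AND another in mapped. */
static void model_map_as_posted(struct model_vnode *v)
{
	model_add_open_mmap(v);	/* afs_file_mmap_prepare() */
	model_add_open_mmap(v);	/* afs_mapped() */
}

/* Suggested shape: only afs_mapped() takes the reference. */
static void model_map_suggested(struct model_vnode *v)
{
	model_add_open_mmap(v);	/* afs_mapped() */
}

/* Assumed: afs_vm_close() drops one reference per VMA. */
static void model_unmap(struct model_vnode *v)
{
	model_drop_open_mmap(v);
}
```

After one map/unmap cycle the "as posted" vnode still holds one reference, while the suggested variant returns to zero.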

>  
>  	ret = generic_file_mmap_prepare(desc);
> -	if (ret == 0)
> -		desc->vm_ops = &afs_vm_ops;
> -	else
> -		afs_drop_open_mmap(vnode);
> +	if (ret)
> +		return ret;
> +
> +	desc->vm_ops = &afs_vm_ops;
>  	return ret;
>  }
>  
> +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> +		      const struct file *file, void **vm_private_data)
> +{
> +	struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
> +
> +	afs_add_open_mmap(vnode);
> +	return 0;
> +}
> +
>  static void afs_vm_open(struct vm_area_struct *vma)
>  {
>  	afs_add_open_mmap(AFS_FS_I(file_inode(vma->vm_file)));
> -- 
> 2.53.0
> 
> 

* Re: [PATCH 04/15] mm: add vm_ops->mapped hook
From: Usama Arif @ 2026-03-13 11:02 UTC
  To: Lorenzo Stoakes (Oracle)
  Cc: Usama Arif, Andrew Morton, Clemens Ladisch, Arnd Bergmann,
	Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
	Alexandre Torgue, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
	David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
	Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
	Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
	linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
	target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
In-Reply-To: <0e0fe47852e6009f662b1fa42f836447b8d1283a.1773346620.git.ljs@kernel.org>

On Thu, 12 Mar 2026 20:27:19 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:

> Previously, when a driver needed to do something like establish a reference
> count, it could do so in the mmap hook in the knowledge that the mapping
> would succeed.
> 
> With the introduction of f_op->mmap_prepare this is no longer the case, as
> it is invoked prior to actually establishing the mapping.
> 
> To take this into account, introduce a new vm_ops->mapped callback which is
> invoked when the VMA is first mapped (though notably - not when it is
> merged - which is correct and mirrors existing mmap/open/close behaviour).
> 
> We do better than vm_ops->open() here, as this callback can return an
> error, at which point the VMA will be unmapped.
> 
> Note that vm_ops->mapped() is invoked after any mmap action is
> complete (such as I/O remapping).
> 
> We intentionally do not expose the VMA at this point, exposing only the
> fields that could be used, and an output parameter in case the operation
> needs to update the vma->vm_private_data field.
> 
> In order to deal with stacked filesystems which invoke inner filesystem's
> mmap() invocations, add __compat_vma_mapped() and invoke it on
> vfs_mmap() (via compat_vma_mmap()) to ensure that the mapped callback is
> handled when an mmap() caller invokes a nested filesystem's mmap_prepare()
> callback.
> 
> We can now also remove call_action_complete() and invoke
> mmap_action_complete() directly, as we separate out the rmap lock logic to
> be called in __mmap_region() instead via maybe_drop_file_rmap_lock().
> 
> We also abstract unmapping of a VMA on mmap action completion into its own
> helper function, unmap_vma_locked().
> 
> Additionally, update VMA userland test headers to reflect the change.
> 
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---
>  include/linux/fs.h              |  9 +++-
>  include/linux/mm.h              | 17 +++++++
>  mm/internal.h                   | 10 ++++
>  mm/util.c                       | 86 ++++++++++++++++++++++++---------
>  mm/vma.c                        | 41 +++++++++++-----
>  tools/testing/vma/include/dup.h | 34 ++++++++++++-
>  6 files changed, 158 insertions(+), 39 deletions(-)
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index a2628a12bd2b..c390f5c667e3 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2059,13 +2059,20 @@ static inline bool can_mmap_file(struct file *file)
>  }
>  
>  int compat_vma_mmap(struct file *file, struct vm_area_struct *vma);
> +int __vma_check_mmap_hook(struct vm_area_struct *vma);
>  
>  static inline int vfs_mmap(struct file *file, struct vm_area_struct *vma)
>  {
> +	int err;
> +
>  	if (file->f_op->mmap_prepare)
>  		return compat_vma_mmap(file, vma);
>  
> -	return file->f_op->mmap(file, vma);
> +	err = file->f_op->mmap(file, vma);
> +	if (err)
> +		return err;
> +
> +	return __vma_check_mmap_hook(vma);
>  }
>  
>  static inline int vfs_mmap_prepare(struct file *file, struct vm_area_desc *desc)
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 12a0b4c63736..7333d5db1221 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -759,6 +759,23 @@ struct vm_operations_struct {
>  	 * Context: User context.  May sleep.  Caller holds mmap_lock.
>  	 */
>  	void (*close)(struct vm_area_struct *vma);
> +	/**
> +	 * @mapped: Called when the VMA is first mapped in the MM. Not called if
> +	 * the new VMA is merged with an adjacent VMA.
> +	 *
> +	 * The @vm_private_data field is an output field allowing the user to
> +	 * modify vma->vm_private_data as necessary.
> +	 *
> +	 * ONLY valid if set from f_op->mmap_prepare. Will result in an error if
> +	 * set from f_op->mmap.
> +	 *
> +	 * Returns %0 on success, or an error otherwise. On error, the VMA will
> +	 * be unmapped.
> +	 *
> +	 * Context: User context.  May sleep.  Caller holds mmap_lock.
> +	 */
> +	int (*mapped)(unsigned long start, unsigned long end, pgoff_t pgoff,
> +		      const struct file *file, void **vm_private_data);
>  	/* Called any time before splitting to check if it's allowed */
>  	int (*may_split)(struct vm_area_struct *vma, unsigned long addr);
>  	int (*mremap)(struct vm_area_struct *vma);
> diff --git a/mm/internal.h b/mm/internal.h
> index 7bfa85b5e78b..f0f2cf1caa36 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -158,6 +158,8 @@ static inline void *folio_raw_mapping(const struct folio *folio)
>   * mmap hook and safely handle error conditions. On error, VMA hooks will be
>   * mutated.
>   *
> + * IMPORTANT: f_op->mmap() is deprecated, prefer f_op->mmap_prepare().
> + *
>   * @file: File which backs the mapping.
>   * @vma:  VMA which we are mapping.
>   *
> @@ -201,6 +203,14 @@ static inline void vma_close(struct vm_area_struct *vma)
>  /* unmap_vmas is in mm/memory.c */
>  void unmap_vmas(struct mmu_gather *tlb, struct unmap_desc *unmap);
>  
> +static inline void unmap_vma_locked(struct vm_area_struct *vma)
> +{
> +	const size_t len = vma_pages(vma) << PAGE_SHIFT;
> +
> +	mmap_assert_locked(vma->vm_mm);
> +	do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
> +}
> +
>  #ifdef CONFIG_MMU
>  
>  static inline void get_anon_vma(struct anon_vma *anon_vma)
> diff --git a/mm/util.c b/mm/util.c
> index dba1191725b6..2b0ed54008d6 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -1163,6 +1163,55 @@ void flush_dcache_folio(struct folio *folio)
>  EXPORT_SYMBOL(flush_dcache_folio);
>  #endif
>  
> +static int __compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct vm_area_desc desc = {
> +		.mm = vma->vm_mm,
> +		.file = file,
> +		.start = vma->vm_start,
> +		.end = vma->vm_end,
> +
> +		.pgoff = vma->vm_pgoff,
> +		.vm_file = vma->vm_file,
> +		.vma_flags = vma->flags,
> +		.page_prot = vma->vm_page_prot,
> +
> +		.action.type = MMAP_NOTHING, /* Default */
> +	};
> +	int err;
> +
> +	err = vfs_mmap_prepare(file, &desc);
> +	if (err)
> +		return err;
> +
> +	err = mmap_action_prepare(&desc, &desc.action);
> +	if (err)
> +		return err;
> +
> +	set_vma_from_desc(vma, &desc);
> +	return mmap_action_complete(vma, &desc.action);
> +}
> +
> +static int __compat_vma_mapped(struct file *file, struct vm_area_struct *vma)
> +{
> +	const struct vm_operations_struct *vm_ops = vma->vm_ops;
> +	void *vm_private_data = vma->vm_private_data;
> +	int err;
> +
> +	if (!vm_ops->mapped)
> +		return 0;
> +

Hello!

Can vm_ops be NULL here?  __compat_vma_mapped() is called from
compat_vma_mmap(), which is reached when a filesystem provides
mmap_prepare.  If the mmap_prepare hook does not set desc->vm_ops,
vma->vm_ops will be NULL and this dereferences a NULL pointer.

For e.g. drivers/char/mem.c, mmap_zero_prepare() would trigger
a NULL pointer dereference here.

Would need to do
	if (!vm_ops || !vm_ops->mapped)
		return 0;

here


> +	err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff, file,
> +			     &vm_private_data);
> +	if (err)
> +		unmap_vma_locked(vma);

when mapped() returns an error, unmap_vma_locked(vma) is called
but execution continues into the vm_private_data update below.  After
unmap_vma_locked() the VMA may be freed (do_munmap can remove the VMA
entirely), so accessing vma->vm_private_data after that is a
use-after-free.

Probably need to do:
	if (err) {
		unmap_vma_locked(vma);
		return err;
	}

> +	/* Update private data if changed. */
> +	if (vm_private_data != vma->vm_private_data)
> +		vma->vm_private_data = vm_private_data;
> +
> +	return err;
> +}
> +
>  /**
>   * compat_vma_mmap() - Apply the file's .mmap_prepare() hook to an
>   * existing VMA and execute any requested actions.
> @@ -1191,34 +1240,26 @@ EXPORT_SYMBOL(flush_dcache_folio);
>   */
>  int compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
>  {
> -	struct vm_area_desc desc = {
> -		.mm = vma->vm_mm,
> -		.file = file,
> -		.start = vma->vm_start,
> -		.end = vma->vm_end,
> -
> -		.pgoff = vma->vm_pgoff,
> -		.vm_file = vma->vm_file,
> -		.vma_flags = vma->flags,
> -		.page_prot = vma->vm_page_prot,
> -
> -		.action.type = MMAP_NOTHING, /* Default */
> -	};
>  	int err;
>  
> -	err = vfs_mmap_prepare(file, &desc);
> -	if (err)
> -		return err;
> -
> -	err = mmap_action_prepare(&desc, &desc.action);
> +	err = __compat_vma_mmap(file, vma);
>  	if (err)
>  		return err;
>  
> -	set_vma_from_desc(vma, &desc);
> -	return mmap_action_complete(vma, &desc.action);
> +	return __compat_vma_mapped(file, vma);
>  }
>  EXPORT_SYMBOL(compat_vma_mmap);
>  
> +int __vma_check_mmap_hook(struct vm_area_struct *vma)
> +{
> +	/* vm_ops->mapped is not valid if mmap() is specified. */
> +	if (WARN_ON_ONCE(vma->vm_ops->mapped))
> +		return -EINVAL;

I think vma->vm_ops can be NULL here. Should be:

	if (vma->vm_ops && WARN_ON_ONCE(vma->vm_ops->mapped))
		return -EINVAL;
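A userspace sketch of the suggested check, for illustration. model_warn_on_once() only approximates WARN_ON_ONCE() (it warns once but returns the condition on every call), and the model_* types are hypothetical stand-ins. Because && short-circuits, ->mapped is only read when vm_ops is non-NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins: only the field the check reads. */
struct model_vm_ops {
	int (*mapped)(void);
};

struct model_vma {
	const struct model_vm_ops *vm_ops;	/* may be NULL */
};

static int warned;

/* Rough WARN_ON_ONCE() approximation: warns once, returns cond every time. */
static int model_warn_on_once(int cond)
{
	if (cond && !warned) {
		warned = 1;
		fprintf(stderr, "WARNING: ->mapped set from f_op->mmap\n");
	}
	return cond;
}

static int model_check_mmap_hook(const struct model_vma *vma)
{
	/* && short-circuits, so ->mapped is only read if vm_ops != NULL. */
	if (vma->vm_ops && model_warn_on_once(vma->vm_ops->mapped != NULL))
		return -EINVAL;
	return 0;
}

static int dummy_mapped(void)
{
	return 0;
}

static const struct model_vm_ops ops_with_mapped = { .mapped = dummy_mapped };
static const struct model_vm_ops ops_without_mapped = { .mapped = NULL };
```

A VMA with no vm_ops passes, as does one whose vm_ops lacks ->mapped. Setting ->mapped from f_op->mmap yields -EINVAL every time but warns only once.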

> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(__vma_check_mmap_hook);
> +
>  static void set_ps_flags(struct page_snapshot *ps, const struct folio *folio,
>  			 const struct page *page)
>  {
> @@ -1316,10 +1357,7 @@ static int mmap_action_finish(struct vm_area_struct *vma,
>  	 * invoked if we do NOT merge, so we only clean up the VMA we created.
>  	 */
>  	if (err) {
> -		const size_t len = vma_pages(vma) << PAGE_SHIFT;
> -
> -		do_munmap(current->mm, vma->vm_start, len, NULL);
> -
> +		unmap_vma_locked(vma);
>  		if (action->error_hook) {
>  			/* We may want to filter the error. */
>  			err = action->error_hook(err);
> diff --git a/mm/vma.c b/mm/vma.c
> index 054cf1d262fb..ef9f5a5365d1 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -2705,21 +2705,35 @@ static bool can_set_ksm_flags_early(struct mmap_state *map)
>  	return false;
>  }
>  
> -static int call_action_complete(struct mmap_state *map,
> -				struct mmap_action *action,
> -				struct vm_area_struct *vma)
> +static int call_mapped_hook(struct vm_area_struct *vma)
>  {
> -	int ret;
> +	const struct vm_operations_struct *vm_ops = vma->vm_ops;
> +	void *vm_private_data = vma->vm_private_data;
> +	int err;
>  
> -	ret = mmap_action_complete(vma, action);
> +	if (!vm_ops || !vm_ops->mapped)
> +		return 0;
> +	err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff,
> +			     vma->vm_file, &vm_private_data);
> +	if (err) {
> +		unmap_vma_locked(vma);
> +		return err;
> +	}
> +	/* Update private data if changed. */
> +	if (vm_private_data != vma->vm_private_data)
> +		vma->vm_private_data = vm_private_data;
> +	return 0;
> +}
>  
> -	/* If we held the file rmap we need to release it. */
> -	if (map->hold_file_rmap_lock) {
> -		struct file *file = vma->vm_file;
> +static void maybe_drop_file_rmap_lock(struct mmap_state *map,
> +				      struct vm_area_struct *vma)
> +{
> +	struct file *file;
>  
> -		i_mmap_unlock_write(file->f_mapping);
> -	}
> -	return ret;
> +	if (!map->hold_file_rmap_lock)
> +		return;
> +	file = vma->vm_file;
> +	i_mmap_unlock_write(file->f_mapping);
>  }
>  
>  static unsigned long __mmap_region(struct file *file, unsigned long addr,
> @@ -2773,8 +2787,11 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
>  	__mmap_complete(&map, vma);
>  
>  	if (have_mmap_prepare && allocated_new) {
> -		error = call_action_complete(&map, &desc.action, vma);
> +		error = mmap_action_complete(vma, &desc.action);
> +		if (!error)
> +			error = call_mapped_hook(vma);
>  
> +		maybe_drop_file_rmap_lock(&map, vma);
>  		if (error)
>  			return error;
>  	}
> diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
> index 908beb263307..47d8db809f31 100644
> --- a/tools/testing/vma/include/dup.h
> +++ b/tools/testing/vma/include/dup.h
> @@ -606,12 +606,34 @@ struct vm_area_struct {
>  } __randomize_layout;
>  
>  struct vm_operations_struct {
> -	void (*open)(struct vm_area_struct * area);
> +	/**
> +	 * @open: Called when a VMA is remapped or split. Not called upon first
> +	 * mapping a VMA.
> +	 * Context: User context.  May sleep.  Caller holds mmap_lock.
> +	 */
> +	void (*open)(struct vm_area_struct *vma);
>  	/**
>  	 * @close: Called when the VMA is being removed from the MM.
>  	 * Context: User context.  May sleep.  Caller holds mmap_lock.
>  	 */
> -	void (*close)(struct vm_area_struct * area);
> +	void (*close)(struct vm_area_struct *vma);
> +	/**
> +	 * @mapped: Called when the VMA is first mapped in the MM. Not called if
> +	 * the new VMA is merged with an adjacent VMA.
> +	 *
> +	 * The @vm_private_data field is an output field allowing the user to
> +	 * modify vma->vm_private_data as necessary.
> +	 *
> +	 * ONLY valid if set from f_op->mmap_prepare. Will result in an error if
> +	 * set from f_op->mmap.
> +	 *
> +	 * Returns %0 on success, or an error otherwise. On error, the VMA will
> +	 * be unmapped.
> +	 *
> +	 * Context: User context.  May sleep.  Caller holds mmap_lock.
> +	 */
> +	int (*mapped)(unsigned long start, unsigned long end, pgoff_t pgoff,
> +		      const struct file *file, void **vm_private_data);
>  	/* Called any time before splitting to check if it's allowed */
>  	int (*may_split)(struct vm_area_struct *area, unsigned long addr);
>  	int (*mremap)(struct vm_area_struct *area);
> @@ -1345,3 +1367,11 @@ static inline void vma_set_file(struct vm_area_struct *vma, struct file *file)
>  	swap(vma->vm_file, file);
>  	fput(file);
>  }
> +
> +static inline void unmap_vma_locked(struct vm_area_struct *vma)
> +{
> +	const size_t len = vma_pages(vma) << PAGE_SHIFT;
> +
> +	mmap_assert_locked(vma->vm_mm);
> +	do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
> +}
> -- 
> 2.53.0
> 
> 

* Re: [PATCH] lib: count_zeros: fix 32/64-bit inconsistency in count_trailing_zeros()
From: Andy Shevchenko @ 2026-03-13  9:47 UTC
  To: Yury Norov
  Cc: Yury Norov, Rasmus Villemoes, Eric Biggers, Jason A. Donenfeld,
	Ard Biesheuvel, linux-kernel, kexec, linux-cifs, linux-spi,
	linux-hyperv, K. Y. Srinivasan, Haiyang Zhang, Jason Gunthorpe,
	Leon Romanovsky, Mark Brown, Steve French, Alexander Graf,
	Mike Rapoport, Pasha Tatashin
In-Reply-To: <20260312230817.372878-1-ynorov@nvidia.com>

On Thu, Mar 12, 2026 at 07:08:16PM -0400, Yury Norov wrote:
> Based on 'sizeof(x) == 4' condition, in 32-bit case the function is wired
> to ffs(), while in 64-bit case to __ffs(). The difference is substantial:
> ffs(x) == __ffs(x) + 1. Also, ffs(0) == 0, while __ffs(0) is undefined.
> 
> The 32-bit behaviour is inconsistent with the function description, so it
> needs to get fixed.
> 
> There are 9 individual users for the function in 6 different subsystems.
> Some arches and drivers are 64-bit only:
>  - arch/loongarch/kvm/intc/eiointc.c;
>  - drivers/hv/mshv_vtl_main.c;
>  - kernel/liveupdate/kexec_handover.c;
> 
> The others are:
>  - ib_umem_find_best_pgsz(): as per comment, __ffs() should be correct;
>  - rzv2m_csi_reg_write_bit(): ARCH_RENESAS only, unclear;
>  - lz77_match_len(): CIFS_COMPRESSION only, unclear, experimental;
> 
> None of them explicitly tweak their code for a word length, or x == 0.
> 
> Requesting comments from the corresponding maintainers on how to proceed
> with this.
> 
> The attached patch gets rid of 32-bit explicit support, so that both
> 32- and 64-bit versions rely on __ffs().

> CC: "K. Y. Srinivasan" <kys@microsoft.com> (hyperv)
> CC: Haiyang Zhang <haiyangz@microsoft.com> (hyperv)
> CC: Jason Gunthorpe <jgg@ziepe.ca> (infiniband)
> CC: Leon Romanovsky <leon@kernel.org> (infiniband)
> CC: Mark Brown <broonie@kernel.org> (spi)
> CC: Steve French <sfrench@samba.org> (smb)
> CC: Alexander Graf <graf@amazon.com> (kexec)
> CC: Mike Rapoport <rppt@kernel.org> (kexec)
> CC: Pasha Tatashin <pasha.tatashin@soleen.com> (kexec)

Please move the Cc: list to the...

> Signed-off-by: Yury Norov <ynorov@nvidia.com>
> ---

...comments block. It will have the same effect on emails, but drastically
reduces unneeded noise in the commit message in the Git history.

You may also read this subthread (patch 18) on how to handle it locally:
https://lore.kernel.org/linux-iio/20260123113708.416727-19-bigeasy@linutronix.de/

>  include/linux/count_zeros.h | 9 +++------

...

> +#define COUNT_TRAILING_ZEROS_0 (-1)

Shouldn't we also saturate this to BITS_PER_LONG?
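To make the semantics concrete, here is a userspace sketch that uses the GCC/Clang builtins __builtin_ffsl() and __builtin_ctzl() as stand-ins for ffs() and __ffs(). This is an assumption, since the kernel versions are arch-specific. model_ctz_saturating() is a hypothetical illustration of the saturation idea, returning BITS_PER_LONG for zero. It is not what the patch does.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* ffs()-like: 1-based index of the lowest set bit, 0 when x == 0. */
static int model_ffs(unsigned long x)
{
	return __builtin_ffsl((long)x);
}

/* __ffs()-like: 0-based index of the lowest set bit; undefined for 0. */
static int model_ffs0(unsigned long x)
{
	assert(x != 0);	/* __builtin_ctzl(0) is undefined */
	return __builtin_ctzl(x);
}

/* Hypothetical saturating count: BITS_PER_LONG when no bit is set. */
static int model_ctz_saturating(unsigned long x)
{
	return x ? __builtin_ctzl(x) : MODEL_BITS_PER_LONG;
}
```

For x = 8 (binary 1000), model_ffs() gives 4 and model_ffs0() gives 3, matching ffs(x) == __ffs(x) + 1. Only the saturating variant has a defined, word-size-consistent result for 0.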

-- 
With Best Regards,
Andy Shevchenko



* [PATCH] MAINTAINERS: Update maintainers for Hyper-V DRM driver
From: Saurabh Sengar @ 2026-03-13  4:21 UTC
  To: linux-kernel, linux-hyperv, wei.liu
  Cc: decui, longli, drawat.floss, ssengar, Saurabh Sengar

Add myself, Dexuan, and Long as maintainers. Deepak is stepping down
from these responsibilities.

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
---
 MAINTAINERS | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6358dd7f1632..d67afcb0acc3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8028,7 +8028,9 @@ F:	Documentation/devicetree/bindings/display/himax,hx8357.yaml
 F:	drivers/gpu/drm/tiny/hx8357d.c
 
 DRM DRIVER FOR HYPERV SYNTHETIC VIDEO DEVICE
-M:	Deepak Rawat <drawat.floss@gmail.com>
+M:	Dexuan Cui <decui@microsoft.com>
+M:	Long Li <longli@microsoft.com>
+M:	Saurabh Sengar <ssengar@linux.microsoft.com>
 L:	linux-hyperv@vger.kernel.org
 L:	dri-devel@lists.freedesktop.org
 S:	Maintained
-- 
2.43.0


* Re: [PATCH 02/15] mm: add documentation for the mmap_prepare file operation callback
From: Randy Dunlap @ 2026-03-13  0:12 UTC
  To: Lorenzo Stoakes (Oracle), Andrew Morton
  Cc: Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
	Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
	Alexandre Torgue, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
	David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
	Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
	Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
	linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
	target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
In-Reply-To: <c5bb61cf789df1ecb32facc29df9749987c7ddfc.1773346620.git.ljs@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 6544 bytes --]

(Andrew: patch attached)


On 3/12/26 1:27 PM, Lorenzo Stoakes (Oracle) wrote:

Documentation/filesystems/mmap_prepare.rst: WARNING: document isn't included in any toctree [toc.not_included]

Should be in some index.rst file. In filesystems I suppose.

> ---
>  Documentation/filesystems/mmap_prepare.rst | 131 +++++++++++++++++++++
>  1 file changed, 131 insertions(+)
>  create mode 100644 Documentation/filesystems/mmap_prepare.rst
> 
> diff --git a/Documentation/filesystems/mmap_prepare.rst b/Documentation/filesystems/mmap_prepare.rst
> new file mode 100644
> index 000000000000..76908200f3a1
> --- /dev/null
> +++ b/Documentation/filesystems/mmap_prepare.rst
> @@ -0,0 +1,131 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===========================
> +mmap_prepare callback HOWTO
> +===========================
> +
> +Introduction
> +############

Kernel style is "=============" above instead of "############".

> +
> +The `struct file->f_op->mmap()` callback has been deprecated as it is both a
> +stability and security risk, and doesn't always permit the merging of adjacent
> +mappings resulting in unnecessary memory fragmentation.
> +
> +It has been replaced with the `file->f_op->mmap_prepare()` callback which solves
> +these problems.
> +
> +## How To Use
> +
> +In your driver's `struct file_operations` struct, specify an `mmap_prepare`
> +callback rather than an `mmap` one, e.g. for ext4:
> +
> +
> +.. code-block:: C
> +
> +    const struct file_operations ext4_file_operations = {
> +        ...
> +        .mmap_prepare    = ext4_file_mmap_prepare,
> +    };
> +
> +This has a signature of `int (*mmap_prepare)(struct vm_area_desc *)`.
> +
> +Examining the `struct vm_area_desc` type:
> +
> +.. code-block:: C
> +
> +    struct vm_area_desc {
> +        /* Immutable state. */
> +        const struct mm_struct *const mm;
> +        struct file *const file; /* May vary from vm_file in stacked callers. */
> +        unsigned long start;
> +        unsigned long end;
> +
> +        /* Mutable fields. Populated with initial state. */
> +        pgoff_t pgoff;
> +        struct file *vm_file;
> +        vma_flags_t vma_flags;
> +        pgprot_t page_prot;
> +
> +        /* Write-only fields. */
> +        const struct vm_operations_struct *vm_ops;
> +        void *private_data;
> +
> +        /* Take further action? */
> +        struct mmap_action action;
> +    };
> +
> +This is straightforward - you have all the fields you need to set up the
> +mapping, and you can update the mutable and writable fields, for instance:
> +
> +.. code-block:: Cw

   .. code-block:: C

Documentation/filesystems/mmap_prepare.rst:60: WARNING: Pygments lexer name 'Cw' is not known [misc.highlighting_failure]

Maybe a typo?

> +
> +    static int ext4_file_mmap_prepare(struct vm_area_desc *desc)
> +    {
> +        int ret;
> +        struct file *file = desc->file;
> +        struct inode *inode = file->f_mapping->host;
> +
> +        ...
> +
> +        file_accessed(file);
> +        if (IS_DAX(file_inode(file))) {
> +            desc->vm_ops = &ext4_dax_vm_ops;
> +            vma_desc_set_flags(desc, VMA_HUGEPAGE_BIT);
> +        } else {
> +            desc->vm_ops = &ext4_file_vm_ops;
> +        }
> +        return 0;
> +    }
> +
> +Importantly, you no longer have to dance around with reference counts or locks
> +when updating these fields - __you can simply go ahead and change them__.
> +
> +Everything is taken care of by the mapping code.
> +
> +VMA Flags
> +=========

and then use "---------------" here instead of "==============".

(from Documentation/doc-guide/sphinx.rst)

> +
> +Along with `mmap_prepare`, VMA flags have undergone an overhaul. Where before
> +you would invoke one of `vm_flags_init()`, `vm_flags_reset()`, `vm_flags_set()`,
> +`vm_flags_clear()`, or `vm_flags_mod()` to modify flags (and to have the
> +locking done correctly for you), this is no longer necessary.
> +
> +Also, the legacy approach of specifying VMA flags via `VM_READ`, `VM_WRITE`,
> +etc., i.e. using a `VM_xxx` macro, has changed too.
> +
> +When implementing `mmap_prepare()`, reference flags by their bit number, defined
> +as a `VMA_xxx_BIT` macro, e.g. `VMA_READ_BIT`, `VMA_WRITE_BIT` etc., and use one
> +of (where `desc` is a pointer to `struct vm_area_desc`):
> +
> +* `vma_desc_test_flags(desc, ...)` - Specify a comma-separated list of flags you
> +  wish to test for (whether _any_ are set), e.g. - `vma_desc_test_flags(desc,
> +  VMA_WRITE_BIT, VMA_MAYWRITE_BIT)` - returns `true` if either are set,
> +  otherwise `false`.
> +* `vma_desc_set_flags(desc, ...)` - Update the VMA descriptor flags to set
> +  additional flags specified by a comma-separated list,
> +  e.g. - `vma_desc_set_flags(desc, VMA_PFNMAP_BIT, VMA_IO_BIT)`.
> +* `vma_desc_clear_flags(desc, ...)` - Update the VMA descriptor flags to clear
> +  flags specified by a comma-separated list, e.g. - `vma_desc_clear_flags(desc,
> +  VMA_WRITE_BIT, VMA_MAYWRITE_BIT)`.
> +
> +Actions
> +=======
> +
> +You can now very easily have actions be performed upon a mapping once set up by
> +utilising simple helper functions invoked upon the `struct vm_area_desc`
> +pointer. These are:
> +
> +* `mmap_action_remap()` - Remaps a range consisting only of PFNs for a specific
> +  range starting at a virtual address and PFN, of a set size.
> +
> +* `mmap_action_remap_full()` - Same as `mmap_action_remap()`, only remaps the
> +  entire mapping from `start_pfn` onward.
> +
> +* `mmap_action_ioremap()` - Same as `mmap_action_remap()`, only performs an I/O
> +  remap.
> +
> +* `mmap_action_ioremap_full()` - Same as `mmap_action_ioremap()`, only remaps
> +  the entire mapping from `start_pfn` onward.
> +
> +**NOTE:** The 'action' field should never normally be manipulated directly,
> +rather you ought to use one of these helpers.

I also see this warning, but I don't know what it is referring to:

Documentation/filesystems/mmap_prepare.rst:132: ERROR: Anonymous hyperlink mismatch: 1 references but 0 targets.
See "backrefs" attribute for IDs. [docutils]

(OK, I found/fixed that also.)

There are also lots of single ` marks which mean italics. I thought those were
not what was intended, so I changed (most of) them to `` marks, which mean
"inline literal / monospace". I can fix those if needed.

from the patch file:
@Lorenzo: ISTR that you prefer explicit quoting on structs and
functions. I didn't do that here since kernel automarkup does that,
but if you prefer, I can redo the patch with those changes.

HTH.
-- 
~Randy

[-- Attachment #2: mmap-prepare-docs-fixes.patch --]
[-- Type: text/x-patch, Size: 7252 bytes --]

From: Randy Dunlap <rdunlap@infradead.org>
Subject: [PATCH] Docs: mmap_prepare: fix sphinx warnings and format

Fix 'make htmldocs' build warnings, headings style, and quoting
style.

Documentation/filesystems/mmap_prepare.rst: WARNING: document isn't included in any toctree [toc.not_included]
Documentation/filesystems/mmap_prepare.rst:60: WARNING: Pygments lexer name 'Cw' is not known [misc.highlighting_failure]
Documentation/filesystems/mmap_prepare.rst:132: ERROR: Anonymous hyperlink mismatch: 1 references but 0 targets.
See "backrefs" attribute for IDs. [docutils]

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
---
@Lorenzo: ISTR that you prefer explicit quoting on structs and
functions. I didn't do that here since kernel automarkup does that,
but if you prefer, I can redo the patch with those changes.

 Documentation/filesystems/index.rst        |    1 
 Documentation/filesystems/mmap_prepare.rst |   74 +++++++++----------
 2 files changed, 38 insertions(+), 37 deletions(-)

--- linux-next.orig/Documentation/filesystems/index.rst
+++ linux-next/Documentation/filesystems/index.rst
@@ -29,6 +29,7 @@ algorithms work.
    fiemap
    files
    locks
+   mmap_prepare
    multigrain-ts
    mount_api
    quota
--- linux-next.orig/Documentation/filesystems/mmap_prepare.rst
+++ linux-next/Documentation/filesystems/mmap_prepare.rst
@@ -5,19 +5,19 @@ mmap_prepare callback HOWTO
 ===========================
 
 Introduction
-############
+============
 
-The `struct file->f_op->mmap()` callback has been deprecated as it is both a
+The ``struct file->f_op->mmap()`` callback has been deprecated as it is both a
 stability and security risk, and doesn't always permit the merging of adjacent
 mappings resulting in unnecessary memory fragmentation.
 
-It has been replaced with the `file->f_op->mmap_prepare()` callback which solves
-these problems.
+It has been replaced with the ``file->f_op->mmap_prepare()`` callback which
+solves these problems.
 
 ## How To Use
 
-In your driver's `struct file_operations` struct, specify an `mmap_prepare`
-callback rather than an `mmap` one, e.g. for ext4:
+In your driver's struct file_operations struct, specify an ``mmap_prepare``
+callback rather than an ``mmap`` one, e.g. for ext4:
 
 
 .. code-block:: C
@@ -27,9 +27,9 @@ callback rather than an `mmap` one, e.g.
         .mmap_prepare    = ext4_file_mmap_prepare,
     };
 
-This has a signature of `int (*mmap_prepare)(struct vm_area_desc *)`.
+This has a signature of ``int (*mmap_prepare)(struct vm_area_desc *)``.
 
-Examining the `struct vm_area_desc` type:
+Examining the struct vm_area_desc type:
 
 .. code-block:: C
 
@@ -57,7 +57,7 @@ Examining the `struct vm_area_desc` type
 This is straightforward - you have all the fields you need to set up the
 mapping, and you can update the mutable and writable fields, for instance:
 
-.. code-block:: Cw
+.. code-block:: C
 
     static int ext4_file_mmap_prepare(struct vm_area_desc *desc)
     {
@@ -78,54 +78,54 @@ mapping, and you can update the mutable
     }
 
 Importantly, you no longer have to dance around with reference counts or locks
-when updating these fields - __you can simply go ahead and change them__.
+when updating these fields - **you can simply go ahead and change them**.
 
 Everything is taken care of by the mapping code.
 
 VMA Flags
-=========
+---------
 
-Along with `mmap_prepare`, VMA flags have undergone an overhaul. Where before
-you would invoke one of `vm_flags_init()`, `vm_flags_reset()`, `vm_flags_set()`,
-`vm_flags_clear()`, and `vm_flags_mod()` to modify flags (and to have the
+Along with ``mmap_prepare``, VMA flags have undergone an overhaul. Where before
+you would invoke one of vm_flags_init(), vm_flags_reset(), vm_flags_set(),
+vm_flags_clear(), and vm_flags_mod() to modify flags (and to have the
 locking done correctly for you, this is no longer necessary.
 
-Also, the legacy approach of specifying VMA flags via `VM_READ`, `VM_WRITE`,
-etc. - i.e. using a `VM_xxx` macro has changed too.
+Also, the legacy approach of specifying VMA flags via ``VM_READ``, ``VM_WRITE``,
+etc. - i.e. using a ``VM_xxx`` macro has changed too.
 
-When implementing `mmap_prepare()`, reference flags by their bit number, defined
-as a `VMA_xxx_BIT` macro, e.g. `VMA_READ_BIT`, `VMA_WRITE_BIT` etc., and use one
-of (where `desc` is a pointer to `struct vma_area_desc`):
-
-* `vma_desc_test_flags(desc, ...)` - Specify a comma-separated list of flags you
-  wish to test for (whether _any_ are set), e.g. - `vma_desc_test_flags(desc,
-  VMA_WRITE_BIT, VMA_MAYWRITE_BIT)` - returns `true` if either are set,
-  otherwise `false`.
-* `vma_desc_set_flags(desc, ...)` - Update the VMA descriptor flags to set
+When implementing mmap_prepare(), reference flags by their bit number, defined
+as a ``VMA_xxx_BIT`` macro, e.g. ``VMA_READ_BIT``, ``VMA_WRITE_BIT`` etc.,
+and use one of (where ``desc`` is a pointer to struct vm_area_desc):
+
+* ``vma_desc_test_flags(desc, ...)`` - Specify a comma-separated list of flags
+  you wish to test for (whether _any_ are set), e.g. - ``vma_desc_test_flags(
+  desc, VMA_WRITE_BIT, VMA_MAYWRITE_BIT)`` - returns ``true`` if either are set,
+  otherwise ``false``.
+* ``vma_desc_set_flags(desc, ...)`` - Update the VMA descriptor flags to set
   additional flags specified by a comma-separated list,
-  e.g. - `vma_desc_set_flags(desc, VMA_PFNMAP_BIT, VMA_IO_BIT)`.
-* `vma_desc_clear_flags(desc, ...)` - Update the VMA descriptor flags to clear
-  flags specified by a comma-separated list, e.g. - `vma_desc_clear_flags(desc,
-  VMA_WRITE_BIT, VMA_MAYWRITE_BIT)`.
+  e.g. - ``vma_desc_set_flags(desc, VMA_PFNMAP_BIT, VMA_IO_BIT)``.
+* ``vma_desc_clear_flags(desc, ...)`` - Update the VMA descriptor flags to clear
+  flags specified by a comma-separated list, e.g. - ``vma_desc_clear_flags(
+  desc, VMA_WRITE_BIT, VMA_MAYWRITE_BIT)``.
 
 Actions
 =======
 
 You can now very easily have actions be performed upon a mapping once set up by
-utilising simple helper functions invoked upon the `struct vm_area_desc`
+utilising simple helper functions invoked upon the struct vm_area_desc
 pointer. These are:
 
-* `mmap_action_remap()` - Remaps a range consisting only of PFNs for a specific
+* mmap_action_remap() - Remaps a range consisting only of PFNs for a specific
   range starting a virtual address and PFN number of a set size.
 
-* `mmap_action_remap_full()` - Same as `mmap_action_remap()`, only remaps the
-  entire mapping from `start_pfn` onward.
+* mmap_action_remap_full() - Same as mmap_action_remap(), only remaps the
+  entire mapping from ``start_pfn`` onward.
 
-* `mmap_action_ioremap()` - Same as `mmap_action_remap()`, only performs an I/O
+* mmap_action_ioremap() - Same as mmap_action_remap(), only performs an I/O
   remap.
 
-* `mmap_action_ioremap_full()` - Same as `mmap_action_ioremap()`, only remaps
-  the entire mapping from `start_pfn` onward.
+* mmap_action_ioremap_full() - Same as mmap_action_ioremap(), only remaps
+  the entire mapping from ``start_pfn`` onward.
 
-**NOTE:** The 'action' field should never normally be manipulated directly,
+**NOTE:** The ``action`` field should never normally be manipulated directly,
 rather you ought to use one of these helpers.

^ permalink raw reply

* Re: [PATCH] lib: count_zeros: fix 32/64-bit inconsistency in count_trailing_zeros()
From: Enzo Matsumiya @ 2026-03-12 23:54 UTC (permalink / raw)
  To: Yury Norov
  Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes, Eric Biggers,
	Jason A. Donenfeld, Ard Biesheuvel, linux-kernel, kexec,
	linux-cifs, linux-spi, linux-hyperv, K. Y. Srinivasan,
	Haiyang Zhang, Jason Gunthorpe, Leon Romanovsky, Mark Brown,
	Steve French, Alexander Graf, Mike Rapoport, Pasha Tatashin
In-Reply-To: <20260312230817.372878-1-ynorov@nvidia.com>

Hi Yury,

On 03/12, Yury Norov wrote:
>Based on 'sizeof(x) == 4' condition, in 32-bit case the function is wired
>to ffs(), while in 64-bit case to __ffs(). The difference is substantial:
>ffs(x) == __ffs(x) + 1. Also, ffs(0) == 0, while __ffs(0) is undefined.
>
>The 32-bit behaviour is inconsistent with the function description, so it
>needs to get fixed.
>
>There are 9 individual users for the function in 6 different subsystems.
>Some arches and drivers are 64-bit only:
> - arch/loongarch/kvm/intc/eiointc.c;
> - drivers/hv/mshv_vtl_main.c;
> - kernel/liveupdate/kexec_handover.c;
>
>The others are:
> - ib_umem_find_best_pgsz(): as per comment, __ffs() should be correct;
> - rzv2m_csi_reg_write_bit(): ARCH_RENESAS only, unclear;
> - lz77_match_len(): CIFS_COMPRESSION only, unclear, experimental;
>
>None of them explicitly tweak their code for a word length, or x == 0.

Context for lz77_match_len() case:

	const u64 diff = lz77_read64(cur) ^ lz77_read64(wnd);

	if (!diff) {
	...
	}

	cur += count_trailing_zeros(diff) >> 3;

So x == 0 is checked, however it does assume that
sizeof(unsigned long) == sizeof(u64).  I'll have to fix it for when
that's not the case (even with your patch in, as __ffs() casts x to
unsigned long down the line).  Thanks for the heads up.
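A sketch of a word-size-independent version of that step follows. It assumes lz77_read64() returns little-endian values, so the lowest-order differing byte is the first mismatching byte in memory. __builtin_ctzll() operates on unsigned long long, so nothing is truncated to a 32-bit unsigned long (in-tree, __ffs64() may be the closer equivalent). The helper name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of leading bytes that match, given the XOR
 * of two 64-bit little-endian loads. __builtin_ctzll() takes an unsigned
 * long long (at least 64 bits) on both 32- and 64-bit targets. The caller
 * must ensure diff != 0, because the builtin is undefined for 0.
 */
static inline unsigned int matching_bytes_u64(uint64_t diff)
{
	return (unsigned int)__builtin_ctzll(diff) >> 3;
}
```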


Cheers,

Enzo

>Requesting comments from the corresponding maintainers on how to proceed
>with this.
>
>The attached patch gets rid of 32-bit explicit support, so that both
>32- and 64-bit versions rely on __ffs().
>
>CC: "K. Y. Srinivasan" <kys@microsoft.com> (hyperv)
>CC: Haiyang Zhang <haiyangz@microsoft.com> (hyperv)
>CC: Jason Gunthorpe <jgg@ziepe.ca> (infiniband)
>CC: Leon Romanovsky <leon@kernel.org> (infiniband)
>CC: Mark Brown <broonie@kernel.org> (spi)
>CC: Steve French <sfrench@samba.org> (smb)
>CC: Alexander Graf <graf@amazon.com> (kexec)
>CC: Mike Rapoport <rppt@kernel.org> (kexec)
>CC: Pasha Tatashin <pasha.tatashin@soleen.com> (kexec)
>Signed-off-by: Yury Norov <ynorov@nvidia.com>
>---
> include/linux/count_zeros.h | 9 +++------
> 1 file changed, 3 insertions(+), 6 deletions(-)
>
>diff --git a/include/linux/count_zeros.h b/include/linux/count_zeros.h
>index 4e5680327ece..5034a30b5c7c 100644
>--- a/include/linux/count_zeros.h
>+++ b/include/linux/count_zeros.h
>@@ -10,6 +10,8 @@
>
> #include <asm/bitops.h>
>
>+#define COUNT_TRAILING_ZEROS_0 (-1)
>+
> /**
>  * count_leading_zeros - Count the number of zeros from the MSB back
>  * @x: The value
>@@ -40,12 +42,7 @@ static inline int count_leading_zeros(unsigned long x)
>  */
> static inline int count_trailing_zeros(unsigned long x)
> {
>-#define COUNT_TRAILING_ZEROS_0 (-1)
>-
>-	if (sizeof(x) == 4)
>-		return ffs(x);
>-	else
>-		return (x != 0) ? __ffs(x) : COUNT_TRAILING_ZEROS_0;
>+	return (x != 0) ? __ffs(x) : COUNT_TRAILING_ZEROS_0;
> }
>
> #endif /* _LINUX_BITOPS_COUNT_ZEROS_H_ */
>-- 
>2.43.0
>
>

^ permalink raw reply

* [PATCH] lib: count_zeros: fix 32/64-bit inconsistency in count_trailing_zeros()
From: Yury Norov @ 2026-03-12 23:08 UTC (permalink / raw)
  To: Yury Norov, Andy Shevchenko, Rasmus Villemoes, Eric Biggers,
	Jason A. Donenfeld, Ard Biesheuvel
  Cc: Yury Norov, linux-kernel, kexec, linux-cifs, linux-spi,
	linux-hyperv, K. Y. Srinivasan, Haiyang Zhang, Jason Gunthorpe,
	Leon Romanovsky, Mark Brown, Steve French, Alexander Graf,
	Mike Rapoport, Pasha Tatashin

Based on 'sizeof(x) == 4' condition, in 32-bit case the function is wired
to ffs(), while in 64-bit case to __ffs(). The difference is substantial:
ffs(x) == __ffs(x) + 1. Also, ffs(0) == 0, while __ffs(0) is undefined.

The 32-bit behaviour is inconsistent with the function description, so it
needs to get fixed.

There are 9 individual users for the function in 6 different subsystems.
Some arches and drivers are 64-bit only:
 - arch/loongarch/kvm/intc/eiointc.c;
 - drivers/hv/mshv_vtl_main.c;
 - kernel/liveupdate/kexec_handover.c;

The others are:
 - ib_umem_find_best_pgsz(): as per comment, __ffs() should be correct;
 - rzv2m_csi_reg_write_bit(): ARCH_RENESAS only, unclear;
 - lz77_match_len(): CIFS_COMPRESSION only, unclear, experimental;

None of them explicitly tweak their code for a word length, or x == 0.

Requesting comments from the corresponding maintainers on how to proceed
with this.

The attached patch gets rid of 32-bit explicit support, so that both
32- and 64-bit versions rely on __ffs().

CC: "K. Y. Srinivasan" <kys@microsoft.com> (hyperv)
CC: Haiyang Zhang <haiyangz@microsoft.com> (hyperv)
CC: Jason Gunthorpe <jgg@ziepe.ca> (infiniband)
CC: Leon Romanovsky <leon@kernel.org> (infiniband)
CC: Mark Brown <broonie@kernel.org> (spi)
CC: Steve French <sfrench@samba.org> (smb)
CC: Alexander Graf <graf@amazon.com> (kexec)
CC: Mike Rapoport <rppt@kernel.org> (kexec)
CC: Pasha Tatashin <pasha.tatashin@soleen.com> (kexec)
Signed-off-by: Yury Norov <ynorov@nvidia.com>
---
 include/linux/count_zeros.h | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/include/linux/count_zeros.h b/include/linux/count_zeros.h
index 4e5680327ece..5034a30b5c7c 100644
--- a/include/linux/count_zeros.h
+++ b/include/linux/count_zeros.h
@@ -10,6 +10,8 @@
 
 #include <asm/bitops.h>
 
+#define COUNT_TRAILING_ZEROS_0 (-1)
+
 /**
  * count_leading_zeros - Count the number of zeros from the MSB back
  * @x: The value
@@ -40,12 +42,7 @@ static inline int count_leading_zeros(unsigned long x)
  */
 static inline int count_trailing_zeros(unsigned long x)
 {
-#define COUNT_TRAILING_ZEROS_0 (-1)
-
-	if (sizeof(x) == 4)
-		return ffs(x);
-	else
-		return (x != 0) ? __ffs(x) : COUNT_TRAILING_ZEROS_0;
+	return (x != 0) ? __ffs(x) : COUNT_TRAILING_ZEROS_0;
 }
 
 #endif /* _LINUX_BITOPS_COUNT_ZEROS_H_ */
-- 
2.43.0

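The off-by-one between the two old branches can be seen with userspace models of the two contracts. These use GCC/Clang builtins, and the model_* names are hypothetical. They are not the kernel's per-arch implementations:

```c
#include <assert.h>

/* 1-based index of the lowest set bit, 0 for x == 0 (the ffs() contract). */
static inline int model_ffs(unsigned int x)
{
	return x ? __builtin_ctz(x) + 1 : 0;
}

/* 0-based index of the lowest set bit, undefined for x == 0 (the __ffs() contract). */
static inline unsigned long model___ffs(unsigned long x)
{
	return (unsigned long)__builtin_ctzl(x); /* caller guarantees x != 0 */
}

/* count_trailing_zeros() as it looks after the patch, on either word size. */
#define COUNT_TRAILING_ZEROS_0 (-1)
static inline int model_count_trailing_zeros(unsigned long x)
{
	return (x != 0) ? (int)model___ffs(x) : COUNT_TRAILING_ZEROS_0;
}
```

For x = 8 (binary 1000), model_ffs() gives 4 while model___ffs() gives 3. This is the 32/64-bit discrepancy the patch removes.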

^ permalink raw reply related

* Re: [PATCH 15/15] mm: add mmap_action_map_kernel_pages[_full]()
From: Randy Dunlap @ 2026-03-12 23:15 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle), Andrew Morton
  Cc: Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
	Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
	Alexandre Torgue, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
	David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
	Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
	Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
	linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
	target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
In-Reply-To: <21d8899bb1f4db61203072fb3a56a6c98a61e23d.1773346620.git.ljs@kernel.org>


On 3/12/26 1:27 PM, Lorenzo Stoakes (Oracle) wrote:

> Finally, we update the VMA tests accordingly to reflect the changes.

IMO we could omit the word "we" 5 times above.
(but no change is required)

> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 88f42faeb377..88ad5649c02d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h

> +/**
> + * range_is_subset - Is the specified inner range a subset of the outer range?
> + * @outer_start: The start of the outer range.
> + * @outer_end: The exclusive end of the outer range.
> + * @inner_start: The start of the inner range.
> + * @inner_end: The exclusive end of the inner range.
> + *
> + * Returns %true if [inner_start, inner_end) is a subset of [outer_start,

    * Returns:
(for kernel-doc)

> + * outer_end), otherwise %false.
> + */
> +static inline bool range_is_subset(unsigned long outer_start,
> +				   unsigned long outer_end,
> +				   unsigned long inner_start,
> +				   unsigned long inner_end)
> +{
> +	return outer_start <= inner_start && inner_end <= outer_end;
> +}
> +
> +/**
> + * range_in_vma - is the specified [@start, @end) range a subset of the VMA?
> + * @vma: The VMA against which we want to check [@start, @end).
> + * @start: The start of the range we wish to check.
> + * @end: The exclusive end of the range we wish to check.
> + *
> + * Returns %true if [@start, @end) is a subset of [@vma->vm_start,

    * Returns:

> + * @vma->vm_end), %false otherwise.
> + */
>  static inline bool range_in_vma(const struct vm_area_struct *vma,
>  				unsigned long start, unsigned long end)
>  {
> -	return (vma && vma->vm_start <= start && end <= vma->vm_end);
> +	if (!vma)
> +		return false;
> +
> +	return range_is_subset(vma->vm_start, vma->vm_end, start, end);
> +}
> +
> +/**
> + * range_in_vma_desc - is the specified [@start, @end) range a subset of the VMA
> + * described by @desc, a VMA descriptor?
> + * @desc: The VMA descriptor against which we want to check [@start, @end).
> + * @start: The start of the range we wish to check.
> + * @end: The exclusive end of the range we wish to check.
> + *
> + * Returns %true if [@start, @end) is a subset of [@desc->start, @desc->end),

    * Returns:

> + * %false otherwise.
> + */
> +static inline bool range_in_vma_desc(const struct vm_area_desc *desc,
> +				     unsigned long start, unsigned long end)
> +{
> +	if (!desc)
> +		return false;
> +
> +	return range_is_subset(desc->start, desc->end, start, end);
>  }

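To illustrate the "Returns:" form, here is range_is_subset() lifted into userspace with its kernel-doc written that way. It comes with a hypothetical pages_fit() check of the kind mmap_action_map_kernel_pages() might perform early on. MODEL_PAGE_SIZE (4 KiB) and pages_fit() are assumptions, not part of the patch:

```c
#include <assert.h>
#include <stdbool.h>

/**
 * range_is_subset - Is the specified inner range a subset of the outer range?
 * @outer_start: The start of the outer range.
 * @outer_end: The exclusive end of the outer range.
 * @inner_start: The start of the inner range.
 * @inner_end: The exclusive end of the inner range.
 *
 * Returns: %true if [inner_start, inner_end) is a subset of
 * [outer_start, outer_end), otherwise %false.
 */
static inline bool range_is_subset(unsigned long outer_start,
				   unsigned long outer_end,
				   unsigned long inner_start,
				   unsigned long inner_end)
{
	return outer_start <= inner_start && inner_end <= outer_end;
}

#define MODEL_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/*
 * Hypothetical early validation: do @nr_pages pages starting at @start fit
 * inside [desc_start, desc_end)? Rejects multiplication overflow and
 * address wraparound before the subset test.
 */
static bool pages_fit(unsigned long desc_start, unsigned long desc_end,
		      unsigned long start, unsigned long nr_pages)
{
	unsigned long len = nr_pages * MODEL_PAGE_SIZE;

	if (nr_pages && len / nr_pages != MODEL_PAGE_SIZE)
		return false;	/* nr_pages * page size overflowed */
	if (start + len < start)
		return false;	/* range wraps past the top of the address space */
	return range_is_subset(desc_start, desc_end, start, start + len);
}
```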
-- 
~Randy


^ permalink raw reply

* Re: [PATCH rdma-next v2] RDMA/mana_ib: hardening: Clamp adapter capability values from MANA_IB_GET_ADAPTER_CAP
From: Jason Gunthorpe @ 2026-03-12 22:48 UTC (permalink / raw)
  To: Erni Sri Satya Vennela
  Cc: longli, kotaranov, Leon Romanovsky, linux-rdma, linux-hyperv,
	linux-kernel
In-Reply-To: <20260312181642.989735-1-ernis@linux.microsoft.com>

On Thu, Mar 12, 2026 at 11:16:41AM -0700, Erni Sri Satya Vennela wrote:
> The response fields (max_qp_count, max_cq_count, max_mr_count,
> max_pd_count, max_inbound_read_limit, max_outbound_read_limit,
> max_qp_wr, max_send_sge_count, max_recv_sge_count) are u32 but are
> assigned to signed int members in struct ib_device_attr.

There is no reason they should be signed, you should just fix the
type.

I'm also not convinced clamping to such a high value has any value
whatsoever, as it probably still triggers maths overflows elsewhere. I
think you should clamp to reasonable limits for your device if you
want to do this.

Jason
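A minimal sketch of the approach suggested above, which keeps the value unsigned and clamps it to a device-chosen ceiling. MODEL_MAX_QP and clamp_cap() are hypothetical. In-kernel code would typically use min() for the same thing:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-device ceiling; a real driver would choose its own. */
#define MODEL_MAX_QP 65536u

/* Clamp a host-reported u32 capability to a driver-chosen limit. */
static inline uint32_t clamp_cap(uint32_t reported, uint32_t dev_limit)
{
	return reported < dev_limit ? reported : dev_limit;
}
```

A ceiling that reflects what the device can actually support keeps downstream arithmetic (for example, counts multiplied by per-object sizes) well inside range. Clamping to INT_MAX alone does not guarantee this.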

^ permalink raw reply

* [PATCH] PCI: hv: Set default NUMA node to 0 for devices without affinity info
From: Long Li @ 2026-03-12 22:32 UTC (permalink / raw)
  To: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Lorenzo Pieralisi, Krzysztof Wilczyński,
	Manivannan Sadhasivam, Bjorn Helgaas
  Cc: Long Li, Rob Herring, Michael Kelley, linux-hyperv, linux-pci,
	linux-kernel

When a Hyper-V PCI device does not have
HV_PCI_DEVICE_FLAG_NUMA_AFFINITY set or has an out-of-range
virtual_numa_node, hv_pci_assign_numa_node() leaves the device
NUMA node unset. On x86_64, the default NUMA node happens to be
0, but on ARM64 it is NUMA_NO_NODE (-1), leading to inconsistent
behavior across architectures.

In Azure, when no NUMA information is available from the host,
devices perform best when assigned to node 0. Set the device NUMA
node to 0 unconditionally before the conditional NUMA affinity
check, so that devices always get a valid default and behavior is
consistent on both x86_64 and ARM64.

Fixes: 999dd956d838 ("PCI: hv: Add support for protocol 1.3 and support PCI_BUS_RELATIONS2")
Signed-off-by: Long Li <longli@microsoft.com>
---
 drivers/pci/controller/pci-hyperv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 2c7a406b4ba8..5c03b6e4cdab 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -2485,6 +2485,9 @@ static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus)
 		if (!hv_dev)
 			continue;

+		/* Default to node 0 for consistent behavior across architectures */
+		set_dev_node(&dev->dev, 0);
+
 		if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY &&
 		    hv_dev->desc.virtual_numa_node < num_possible_nodes())
 			/*
-- 
2.43.0

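The resulting node choice can be modelled in userspace as below. struct model_desc and model_assign_node() are hypothetical simplifications. The in-kernel override path may adjust the host-provided node further (that part is not shown in the hunk), so the model passes it through unchanged:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the fields of hv_dev->desc used here. */
struct model_desc {
	bool has_numa_affinity;		/* HV_PCI_DEVICE_FLAG_NUMA_AFFINITY set? */
	unsigned int virtual_numa_node;	/* host-provided node */
};

/*
 * Node the patched logic would leave the device on: 0 by default,
 * overridden only by an in-range host-provided node.
 */
static int model_assign_node(const struct model_desc *d,
			     unsigned int nr_possible_nodes)
{
	int node = 0;	/* mirrors set_dev_node(&dev->dev, 0) */

	if (d->has_numa_affinity && d->virtual_numa_node < nr_possible_nodes)
		node = (int)d->virtual_numa_node;
	return node;
}
```

Before the patch, the "no flag" and "out-of-range node" cases would have kept the architecture default. Per the commit message, that is 0 on x86_64 and NUMA_NO_NODE on ARM64.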
^ permalink raw reply related

* Re: [PATCH 00/15] mm: expand mmap_prepare functionality and usage
From: Andrew Morton @ 2026-03-12 21:23 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
	Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
	Alexandre Torgue, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
	David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
	Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
	Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
	linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
	target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
In-Reply-To: <cover.1773346620.git.ljs@kernel.org>

On Thu, 12 Mar 2026 20:27:15 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:

> This series expands the mmap_prepare functionality, which is intended to
> replace the deprecated f_op->mmap hook which has been the source of bugs
> and security issues for some time.

Thanks, I've added this to mm.git's mm-new branch.

^ permalink raw reply

* Re: [PATCH 01/15] mm: various small mmap_prepare cleanups
From: Andrew Morton @ 2026-03-12 21:14 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
	Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
	Alexandre Torgue, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
	David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
	Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
	Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
	linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
	target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
In-Reply-To: <56372fe273f775b26675a04652c1229e14680741.1773346620.git.ljs@kernel.org>

On Thu, 12 Mar 2026 20:27:16 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:

> +int mmap_action_prepare(struct vm_area_desc *desc,
> +			struct mmap_action *action)
> +
>  {
>  	switch (action->type) {
>  	case MMAP_NOTHING:
> -		break;
> +		return 0;
>  	case MMAP_REMAP_PFN:
> -		remap_pfn_range_prepare(desc, action->remap.start_pfn);
> -		break;
> +		return remap_pfn_range_prepare(desc, action);
>  	case MMAP_IO_REMAP_PFN:
> -		io_remap_pfn_range_prepare(desc, action->remap.start_pfn,
> -					   action->remap.size);
> -		break;
> +		return io_remap_pfn_range_prepare(desc, action);
>  	}
>  }
>  EXPORT_SYMBOL(mmap_action_prepare);

hm, was this the correct version?

mm/util.c: In function 'mmap_action_prepare':
mm/util.c:1451:1: error: control reaches end of non-void function [-Werror=return-type]
 1451 | }

--- a/mm/util.c~mm-various-small-mmap_prepare-cleanups-fix
+++ a/mm/util.c
@@ -1356,6 +1356,8 @@ int mmap_action_prepare(struct vm_area_d
 		return remap_pfn_range_prepare(desc, action);
 	case MMAP_IO_REMAP_PFN:
 		return io_remap_pfn_range_prepare(desc, action);
+	default:
+		BUG();
 	}
 }
 EXPORT_SYMBOL(mmap_action_prepare);
_
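The same pattern can be reproduced in userspace. GCC's -Wreturn-type still fires when every enumerator has a case, because an enum object can hold other values. The sketch below uses a hypothetical -EINVAL default rather than BUG(). Since mmap_action_prepare() now returns int, that is one possible alternative, and checkpatch generally steers away from BUG() in favour of WARN_ON_ONCE() plus recovery. All names are made up:

```c
#include <assert.h>
#include <errno.h>

enum model_action_type { MODEL_NOTHING, MODEL_REMAP_PFN, MODEL_IO_REMAP_PFN };

/* Placeholder for the per-type prepare helpers. */
static int model_remap_prepare(void)
{
	return 0;
}

/*
 * Without the default case, GCC reports "control reaches end of non-void
 * function" here, as in the build error above.
 */
static int model_action_prepare(enum model_action_type type)
{
	switch (type) {
	case MODEL_NOTHING:
		return 0;
	case MODEL_REMAP_PFN:
	case MODEL_IO_REMAP_PFN:
		return model_remap_prepare();
	default:
		return -EINVAL;	/* report an unknown action type */
	}
}
```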


^ permalink raw reply

* [PATCH 15/15] mm: add mmap_action_map_kernel_pages[_full]()
From: Lorenzo Stoakes (Oracle) @ 2026-03-12 20:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jonathan Corbet, Clemens Ladisch, Arnd Bergmann,
	Greg Kroah-Hartman, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Alexander Shishkin, Maxime Coquelin,
	Alexandre Torgue, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Bodo Stroesser, Martin K . Petersen,
	David Howells, Marc Dionne, Alexander Viro, Christian Brauner,
	Jan Kara, David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
	Pedro Falcato, linux-kernel, linux-doc, linux-hyperv, linux-stm32,
	linux-arm-kernel, linux-mtd, linux-staging, linux-scsi,
	target-devel, linux-afs, linux-fsdevel, linux-mm, Ryan Roberts
In-Reply-To: <cover.1773346620.git.ljs@kernel.org>

A user can invoke mmap_action_map_kernel_pages() to specify that the
mapping should map a specified number of kernel pages, supplied in an
array, starting from desc->start.

In order to implement this, adjust mmap_action_prepare() so that it can
return an error code, since it makes sense to validate the specified
parameters as early as possible. mmap_action_prepare() also updates the
VMA flags to include VMA_MIXEDMAP_BIT as necessary.

This provides an mmap_prepare equivalent of vm_insert_pages().

We additionally update the existing vm_insert_pages() code to use
range_in_vma() and add a new range_in_vma_desc() helper function for the
mmap_prepare case, sharing the code between the two in range_is_subset().

We add both mmap_action_map_kernel_pages() and
mmap_action_map_kernel_pages_full() to allow for both partial and full VMA
mappings.

We also add mmap_action_map_kernel_pages_discontig() to allow for
discontiguous mapping of kernel pages should the need arise.

We update the documentation to reflect the new features.

Finally, we update the VMA tests accordingly to reflect the changes.

Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 Documentation/filesystems/mmap_prepare.rst |  8 ++
 include/linux/mm.h                         | 94 +++++++++++++++++++++-
 include/linux/mm_types.h                   |  7 ++
 mm/memory.c                                | 42 +++++++++-
 mm/util.c                                  |  6 ++
 tools/testing/vma/include/dup.h            |  7 ++
 6 files changed, 159 insertions(+), 5 deletions(-)

diff --git a/Documentation/filesystems/mmap_prepare.rst b/Documentation/filesystems/mmap_prepare.rst
index d21406848bca..f89718285869 100644
--- a/Documentation/filesystems/mmap_prepare.rst
+++ b/Documentation/filesystems/mmap_prepare.rst
@@ -129,5 +129,13 @@ pointer. These are:
 * `mmap_action_simple_ioremap()` - Sets up an I/O remap from a specified
   physical address and over a specified length.
 
+* `mmap_action_map_kernel_pages()` - Maps a specified array of `struct page`
+  pointers in the VMA from a specific offset.
+
+* `mmap_action_map_kernel_pages_full()` - Maps a specified array of `struct
+  page` pointers over the entire VMA. The caller must ensure there are
+  sufficient entries in the page array to cover the entire range of the
+  described VMA.
+
 **NOTE:** The 'action' field should never normally be manipulated directly,
 rather you ought to use one of these helpers.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 88f42faeb377..88ad5649c02d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4160,6 +4160,45 @@ static inline void mmap_action_simple_ioremap(struct vm_area_desc *desc,
 	action->type = MMAP_SIMPLE_IO_REMAP;
 }
 
+/**
+ * mmap_action_map_kernel_pages - helper for mmap_prepare hook to specify that
+ * @nr_pages kernel pages contained in the @pages array should be mapped to
+ * userland starting at virtual address @start.
+ * @desc: The VMA descriptor for the VMA requiring kernel pages to be mapped.
+ * @start: The virtual address from which to map them.
+ * @pages: An array of struct page pointers describing the memory to map.
+ * @nr_pages: The number of entries in the @pages array.
+ */
+static inline void mmap_action_map_kernel_pages(struct vm_area_desc *desc,
+		unsigned long start, struct page **pages,
+		unsigned long nr_pages)
+{
+	struct mmap_action *action = &desc->action;
+
+	action->type = MMAP_MAP_KERNEL_PAGES;
+	action->map_kernel.start = start;
+	action->map_kernel.pages = pages;
+	action->map_kernel.nr_pages = nr_pages;
+	action->map_kernel.pgoff = desc->pgoff;
+}
+
+/**
+ * mmap_action_map_kernel_pages_full - helper for mmap_prepare hook to specify that
+ * kernel pages contained in the @pages array should be mapped to userland
+ * from @desc->start to @desc->end.
+ * @desc: The VMA descriptor for the VMA requiring kernel pages to be mapped.
+ * @pages: An array of struct page pointers describing the memory to map.
+ *
+ * The caller must ensure that @pages contains sufficient entries to cover the
+ * entire range described by @desc.
+ */
+static inline void mmap_action_map_kernel_pages_full(struct vm_area_desc *desc,
+		struct page **pages)
+{
+	mmap_action_map_kernel_pages(desc, desc->start, pages,
+				     vma_desc_pages(desc));
+}
+
 int mmap_action_prepare(struct vm_area_desc *desc,
 			struct mmap_action *action);
 int mmap_action_complete(struct vm_area_struct *vma,
@@ -4177,10 +4216,59 @@ static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm,
 	return vma;
 }
 
+/**
+ * range_is_subset - Is the specified inner range a subset of the outer range?
+ * @outer_start: The start of the outer range.
+ * @outer_end: The exclusive end of the outer range.
+ * @inner_start: The start of the inner range.
+ * @inner_end: The exclusive end of the inner range.
+ *
+ * Returns %true if [inner_start, inner_end) is a subset of [outer_start,
+ * outer_end), otherwise %false.
+ */
+static inline bool range_is_subset(unsigned long outer_start,
+				   unsigned long outer_end,
+				   unsigned long inner_start,
+				   unsigned long inner_end)
+{
+	return outer_start <= inner_start && inner_end <= outer_end;
+}
+
+/**
+ * range_in_vma - is the specified [@start, @end) range a subset of the VMA?
+ * @vma: The VMA against which we want to check [@start, @end).
+ * @start: The start of the range we wish to check.
+ * @end: The exclusive end of the range we wish to check.
+ *
+ * Returns %true if [@start, @end) is a subset of [@vma->vm_start,
+ * @vma->vm_end), %false otherwise.
+ */
 static inline bool range_in_vma(const struct vm_area_struct *vma,
 				unsigned long start, unsigned long end)
 {
-	return (vma && vma->vm_start <= start && end <= vma->vm_end);
+	if (!vma)
+		return false;
+
+	return range_is_subset(vma->vm_start, vma->vm_end, start, end);
+}
+
+/**
+ * range_in_vma_desc - is the specified [@start, @end) range a subset of the VMA
+ * described by @desc, a VMA descriptor?
+ * @desc: The VMA descriptor against which we want to check [@start, @end).
+ * @start: The start of the range we wish to check.
+ * @end: The exclusive end of the range we wish to check.
+ *
+ * Returns %true if [@start, @end) is a subset of [@desc->start, @desc->end),
+ * %false otherwise.
+ */
+static inline bool range_in_vma_desc(const struct vm_area_desc *desc,
+				     unsigned long start, unsigned long end)
+{
+	if (!desc)
+		return false;
+
+	return range_is_subset(desc->start, desc->end, start, end);
 }
 
 #ifdef CONFIG_MMU
@@ -4212,6 +4300,10 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
 int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
 int vm_insert_pages(struct vm_area_struct *vma, unsigned long addr,
 			struct page **pages, unsigned long *num);
+int map_kernel_pages_prepare(struct vm_area_desc *desc,
+			     struct mmap_action *action);
+int map_kernel_pages_complete(struct vm_area_struct *vma,
+			      struct mmap_action *action);
 int vm_map_pages(struct vm_area_struct *vma, struct page **pages,
 				unsigned long num);
 int vm_map_pages_zero(struct vm_area_struct *vma, struct page **pages,
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 316bb0adf91d..6e7a399f0724 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -815,6 +815,7 @@ enum mmap_action_type {
 	MMAP_REMAP_PFN,		/* Remap PFN range. */
 	MMAP_IO_REMAP_PFN,	/* I/O remap PFN range. */
 	MMAP_SIMPLE_IO_REMAP,	/* I/O remap with guardrails. */
+	MMAP_MAP_KERNEL_PAGES,	/* Map kernel page range from array. */
 };
 
 /*
@@ -833,6 +834,12 @@ struct mmap_action {
 			phys_addr_t start_phys_addr;
 			unsigned long size;
 		} simple_ioremap;
+		struct {
+			unsigned long start;
+			struct page **pages;
+			unsigned long nr_pages;
+			pgoff_t pgoff;
+		} map_kernel;
 	};
 	enum mmap_action_type type;
 
diff --git a/mm/memory.c b/mm/memory.c
index 351cc917b7aa..608a98c4c947 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2484,13 +2484,14 @@ static int insert_pages(struct vm_area_struct *vma, unsigned long addr,
 int vm_insert_pages(struct vm_area_struct *vma, unsigned long addr,
 			struct page **pages, unsigned long *num)
 {
-	const unsigned long end_addr = addr + (*num * PAGE_SIZE) - 1;
+	const unsigned long nr_pages = *num;
+	const unsigned long end = addr + PAGE_SIZE * nr_pages;
 
-	if (addr < vma->vm_start || end_addr >= vma->vm_end)
+	if (!range_in_vma(vma, addr, end))
 		return -EFAULT;
 	if (!(vma->vm_flags & VM_MIXEDMAP)) {
-		BUG_ON(mmap_read_trylock(vma->vm_mm));
-		BUG_ON(vma->vm_flags & VM_PFNMAP);
+		VM_WARN_ON_ONCE(mmap_read_trylock(vma->vm_mm));
+		VM_WARN_ON_ONCE(vma->vm_flags & VM_PFNMAP);
 		vm_flags_set(vma, VM_MIXEDMAP);
 	}
 	/* Defer page refcount checking till we're about to map that page. */
@@ -2498,6 +2499,39 @@ int vm_insert_pages(struct vm_area_struct *vma, unsigned long addr,
 }
 EXPORT_SYMBOL(vm_insert_pages);
 
+int map_kernel_pages_prepare(struct vm_area_desc *desc,
+			     struct mmap_action *action)
+{
+	const unsigned long addr = action->map_kernel.start;
+	unsigned long nr_pages, end;
+
+	if (!vma_desc_test(desc, VMA_MIXEDMAP_BIT)) {
+		VM_WARN_ON_ONCE(mmap_read_trylock(desc->mm));
+		VM_WARN_ON_ONCE(vma_desc_test(desc, VMA_PFNMAP_BIT));
+		vma_desc_set_flags(desc, VMA_MIXEDMAP_BIT);
+	}
+
+	nr_pages = action->map_kernel.nr_pages;
+	end = addr + PAGE_SIZE * nr_pages;
+	if (!range_in_vma_desc(desc, addr, end))
+		return -EFAULT;
+
+	return 0;
+}
+EXPORT_SYMBOL(map_kernel_pages_prepare);
+
+int map_kernel_pages_complete(struct vm_area_struct *vma,
+			      struct mmap_action *action)
+{
+	unsigned long nr_pages;
+
+	nr_pages = action->map_kernel.nr_pages;
+	return insert_pages(vma, action->map_kernel.start,
+			    action->map_kernel.pages,
+			    &nr_pages, vma->vm_page_prot);
+}
+EXPORT_SYMBOL(map_kernel_pages_complete);
+
 /**
  * vm_insert_page - insert single page into user vma
  * @vma: user vma to map to
diff --git a/mm/util.c b/mm/util.c
index e739d7c0311c..7934e303b230 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1445,6 +1445,8 @@ int mmap_action_prepare(struct vm_area_desc *desc,
 		return io_remap_pfn_range_prepare(desc, action);
 	case MMAP_SIMPLE_IO_REMAP:
 		return simple_ioremap_prepare(desc, action);
+	case MMAP_MAP_KERNEL_PAGES:
+		return map_kernel_pages_prepare(desc, action);
 	}
 }
 EXPORT_SYMBOL(mmap_action_prepare);
@@ -1473,6 +1475,9 @@ int mmap_action_complete(struct vm_area_struct *vma,
 	case MMAP_IO_REMAP_PFN:
 		err = io_remap_pfn_range_complete(vma, action);
 		break;
+	case MMAP_MAP_KERNEL_PAGES:
+		err = map_kernel_pages_complete(vma, action);
+		break;
 	case MMAP_SIMPLE_IO_REMAP:
 		/*
 		 * The simple I/O remap should have been delegated to an I/O
@@ -1496,6 +1501,7 @@ int mmap_action_prepare(struct vm_area_desc *desc,
 	case MMAP_REMAP_PFN:
 	case MMAP_IO_REMAP_PFN:
 	case MMAP_SIMPLE_IO_REMAP:
+	case MMAP_MAP_KERNEL_PAGES:
 		WARN_ON_ONCE(1); /* nommu cannot handle these. */
 		break;
 	}
diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h
index 4f2c9bb6b1ea..50ef2f62150d 100644
--- a/tools/testing/vma/include/dup.h
+++ b/tools/testing/vma/include/dup.h
@@ -425,6 +425,7 @@ enum mmap_action_type {
 	MMAP_REMAP_PFN,		/* Remap PFN range. */
 	MMAP_IO_REMAP_PFN,	/* I/O remap PFN range. */
 	MMAP_SIMPLE_IO_REMAP,	/* I/O remap with guardrails. */
+	MMAP_MAP_KERNEL_PAGES,	/* Map kernel page range from an array. */
 };
 
 /*
@@ -443,6 +444,12 @@ struct mmap_action {
 			phys_addr_t start;
 			unsigned long len;
 		} simple_ioremap;
+		struct {
+			unsigned long start;
+			struct page **pages;
+			unsigned long num;
+			pgoff_t pgoff;
+		} map_kernel;
 	};
 	enum mmap_action_type type;
 
-- 
2.53.0

