From: Simona Vetter <simona.vetter@ffwll.ch>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Jaya Kumar <jayakumar.lkml@gmail.com>,
Simona Vetter <simona@ffwll.ch>, Helge Deller <deller@gmx.de>,
linux-fbdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Matthew Wilcox <willy@infradead.org>,
David Hildenbrand <david@redhat.com>,
Kajtar Zsolt <soci@c64.rulez.org>,
Maira Canal <mcanal@igalia.com>
Subject: Re: [PATCH 2/3] mm: provide mapping_wrprotect_page() function
Date: Tue, 4 Feb 2025 11:19:35 +0100 [thread overview]
Message-ID: <Z6Hptwe_Ugo9Qwl8@phenom.ffwll.local> (raw)
In-Reply-To: <655f318b-d883-4ddd-9301-53a05ab06bc0@lucifer.local>
On Mon, Feb 03, 2025 at 04:30:04PM +0000, Lorenzo Stoakes wrote:
> On Mon, Feb 03, 2025 at 04:49:34PM +0100, Simona Vetter wrote:
> > On Fri, Jan 31, 2025 at 06:28:57PM +0000, Lorenzo Stoakes wrote:
> > > In the fb_defio video driver, page dirty state is used to determine when
> > > frame buffer pages have been changed, allowing for batched, deferred I/O to
> > > be performed for efficiency.
> > >
> > > This implementation had only one means of doing so effectively - the use of
> > > the folio_mkclean() function.
> > >
> > > However, this use of the function is inappropriate, as the fb_defio
> > > implementation allocates kernel memory to back the framebuffer, and then is
> > > forced to specify page->index, mapping fields in order to permit the
> > > folio_mkclean() rmap traversal to proceed correctly.
> > >
> > > It is not correct to specify these fields on kernel-allocated memory, and
> > > moreover since these are not folios, page->index, mapping are deprecated
> > > fields, soon to be removed.
> > >
> > > We therefore need to provide a means by which we can correctly traverse the
> > > reverse mapping and write-protect mappings for a page backing an
> > > address_space page cache object at a given offset.
> > >
> > > This patch provides this - mapping_wrprotect_page() allows for this
> > > operation to be performed for a specified address_space, offset and page,
> > > without requiring a folio nor, of course, an inappropriate use of
> > > page->index, mapping.
> > >
> > > With this provided, we can subsequently adjust the fb_defio implementation
> > > to make use of this function and avoid incorrect invocation of
> > > folio_mkclean() and more importantly, incorrect manipulation of
> > > page->index, mapping fields.
> > >
> > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > > ---
> > > include/linux/rmap.h | 3 ++
> > > mm/rmap.c | 73 ++++++++++++++++++++++++++++++++++++++++++++
> > > 2 files changed, 76 insertions(+)
> > >
> > > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > > index 683a04088f3f..0bf5f64884df 100644
> > > --- a/include/linux/rmap.h
> > > +++ b/include/linux/rmap.h
> > > @@ -739,6 +739,9 @@ unsigned long page_address_in_vma(const struct folio *folio,
> > > */
> > > int folio_mkclean(struct folio *);
> > >
> > > +int mapping_wrprotect_page(struct address_space *mapping, pgoff_t pgoff,
> > > + unsigned long nr_pages, struct page *page);
> > > +
> > > int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff,
> > > struct vm_area_struct *vma);
> > >
> > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > index a2ff20c2eccd..bb5a42d95c48 100644
> > > --- a/mm/rmap.c
> > > +++ b/mm/rmap.c
> > > @@ -1127,6 +1127,79 @@ int folio_mkclean(struct folio *folio)
> > > }
> > > EXPORT_SYMBOL_GPL(folio_mkclean);
> > >
> > > +struct wrprotect_file_state {
> > > + int cleaned;
> > > + pgoff_t pgoff;
> > > + unsigned long pfn;
> > > + unsigned long nr_pages;
> > > +};
> > > +
> > > +static bool mapping_wrprotect_page_one(struct folio *folio,
> > > + struct vm_area_struct *vma, unsigned long address, void *arg)
> > > +{
> > > + struct wrprotect_file_state *state = (struct wrprotect_file_state *)arg;
> > > + struct page_vma_mapped_walk pvmw = {
> > > + .pfn = state->pfn,
> > > + .nr_pages = state->nr_pages,
> > > + .pgoff = state->pgoff,
> > > + .vma = vma,
> > > + .address = address,
> > > + .flags = PVMW_SYNC,
> > > + };
> > > +
> > > + state->cleaned += page_vma_mkclean_one(&pvmw);
> > > +
> > > + return true;
> > > +}
> > > +
> > > +static void __rmap_walk_file(struct folio *folio, struct address_space *mapping,
> > > + pgoff_t pgoff_start, unsigned long nr_pages,
> > > + struct rmap_walk_control *rwc, bool locked);
> > > +
> > > +/**
> > > + * mapping_wrprotect_page() - Write protect all mappings of this page.
> > > + *
> > > + * @mapping: The mapping whose reverse mapping should be traversed.
> > > + * @pgoff: The page offset at which @page is mapped within @mapping.
> > > + * @nr_pages: The number of physically contiguous base pages spanned.
> > > + * @page: The page mapped in @mapping at @pgoff.
> > > + *
> > > + * Traverses the reverse mapping, finding all VMAs which contain a shared
> > > + * mapping of the single @page in @mapping at offset @pgoff and write-protecting
> > > + * the mappings.
> > > + *
> > > + * The page does not have to be a folio, but rather can be a kernel allocation
> > > + * that is mapped into userland. We therefore do not require that the page maps
> > > + * to a folio with a valid mapping or index field, rather these are specified in
> > > + * @mapping and @pgoff.
> > > + *
> > > + * Return: the number of write-protected PTEs, or an error.
> > > + */
> > > +int mapping_wrprotect_page(struct address_space *mapping, pgoff_t pgoff,
> > > + unsigned long nr_pages, struct page *page)
> > > +{
> > > + struct wrprotect_file_state state = {
> > > + .cleaned = 0,
> > > + .pgoff = pgoff,
> > > + .pfn = page_to_pfn(page),
> >
> > Could we go one step further and entirely drop the struct page? Similar to
> > unmap_mapping_range for VM_SPECIAL mappings, except it only updates the
> > write protection. The reason is that ideally we'd like fbdev defio to
> > entirely get rid of any struct page usage, because with some dma_alloc()
> > memory regions there's simply no struct page for them (it's a carveout).
> > See e.g. 5a498d4d06d6 ("drm/fbdev-dma: Only install deferred I/O if
> > necessary") for some of the pain this has caused.
> >
> > So an entirely struct-page-less way to write-protect a pfn would be best. And
> > it doesn't look like you need the page here at all?
>
> In the original version [1] we did indeed take a PFN, so this shouldn't be
> a problem to change.
>
> Since we make it possible here to explicitly reference the address_space
> object mapping the range, and from that can find all the VMAs that map the
> page range [pgoff, pgoff + nr_pages), I don't think we do need to think
> about a struct page here at all.
>
> The defio code does seem to have some questionable assumptions in place, or
> at least ones I couldn't explain away re: attempting to folio-lock (the
> non-folios...), so there'd need to be changes on that side, which I suggest
> would probably be best for a follow-up series given this one's urgency.
Yeah there's a bunch more things we need to do to get there. It was the
lack of a pfn-based core mm function that stopped us from doing that thus
far, plus also fbdev defio being very low priority. But it would
definitely avoid a bunch of corner cases and duplication in fbdev
emulation code in drivers/gpu/drm.
> But I'm more than happy to make this interface work with that by doing
> another revision where we export PFN only, I think something like:
>
> int mapping_wrprotect_range(struct address_space *mapping, pgoff_t pgoff,
> unsigned long pfn, unsigned long nr_pages);
>
> Should work?
>
> [1]:https://lore.kernel.org/all/cover.1736352361.git.lorenzo.stoakes@oracle.com/
Yup that looks like the thing we'll need to wean defio off all that
questionable folio/page wrangling. But like you say, should be easy to
add/update when we get there.
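Just to spell out what I have in mind (completely untested, and
mapping_wrprotect_range() is only the name you proposed above): it is
essentially your mapping_wrprotect_page() from this patch, with the
page_to_pfn() lookup pushed out to the caller so no struct page is
needed at all:

/*
 * Untested sketch: same rmap walk as mapping_wrprotect_page() above,
 * except the caller hands in the pfn directly, so it also works for
 * memory that has no struct page (e.g. dma_alloc() carveouts).
 */
int mapping_wrprotect_range(struct address_space *mapping, pgoff_t pgoff,
                unsigned long pfn, unsigned long nr_pages)
{
        struct wrprotect_file_state state = {
                .cleaned = 0,
                .pgoff = pgoff,
                .pfn = pfn,
                .nr_pages = nr_pages,
        };
        struct rmap_walk_control rwc = {
                .arg = (void *)&state,
                .rmap_one = mapping_wrprotect_page_one,
                .invalid_vma = invalid_mkclean_vma,
        };

        if (!mapping)
                return 0;

        __rmap_walk_file(/* folio = */NULL, mapping, pgoff, nr_pages, &rwc,
                         /* locked = */false);

        return state.cleaned;
}

The rmap walk only needs @mapping and @pgoff to find the VMAs anyway,
so the struct page was really only there for the page_to_pfn() lookup.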
Thanks, Sima
>
> >
> > Cheers, Sima
>
> Thanks!
>
> >
> >
> > > + .nr_pages = nr_pages,
> > > + };
> > > + struct rmap_walk_control rwc = {
> > > + .arg = (void *)&state,
> > > + .rmap_one = mapping_wrprotect_page_one,
> > > + .invalid_vma = invalid_mkclean_vma,
> > > + };
> > > +
> > > + if (!mapping)
> > > + return 0;
> > > +
> > > + __rmap_walk_file(/* folio = */NULL, mapping, pgoff, nr_pages, &rwc,
> > > + /* locked = */false);
> > > +
> > > + return state.cleaned;
> > > +}
> > > +EXPORT_SYMBOL_GPL(mapping_wrprotect_page);
> > > +
> > > /**
> > > * pfn_mkclean_range - Cleans the PTEs (including PMDs) mapped with range of
> > > * [@pfn, @pfn + @nr_pages) at the specific offset (@pgoff)
> > > --
> > > 2.48.1
> > >
> >
> > --
> > Simona Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch