From: Jason Gunthorpe <jgg@nvidia.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: akpm@linux-foundation.org, Jan Kara <jack@suse.cz>,
Christoph Hellwig <hch@lst.de>,
"Darrick J. Wong" <djwong@kernel.org>,
John Hubbard <jhubbard@nvidia.com>,
Matthew Wilcox <willy@infradead.org>,
linux-mm@kvack.org, nvdimm@lists.linux.dev,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 00/13] Fix the DAX-gup mistake
Date: Tue, 6 Sep 2022 10:05:29 -0300 [thread overview]
Message-ID: <YxdFmXi/Zdr8Zi3q@nvidia.com> (raw)
In-Reply-To: <166225775968.2351842.11156458342486082012.stgit@dwillia2-xfh.jf.intel.com>
On Sat, Sep 03, 2022 at 07:16:00PM -0700, Dan Williams wrote:
> tl;dr: Move the pin of 'struct dev_pagemap' instances from gup-time to
> map time, move the unpin of 'struct dev_pagemap' to truncate_inode_pages()
> for fsdax and devdax inodes, and use page_maybe_dma_pinned() to
> determine when filesystems can safely truncate DAX mappings vs DMA.
>
> The longer story is that DAX has caused friction with folio development
> and other device-memory use cases due to its hack of using a
> page-reference count of 1 to indicate that the page is DMA idle. That
> situation arose from the mistake of not managing DAX page reference
> counts at map time. The lack of page reference counting at map time grew
> organically from the original DAX experiment of attempting to manage DAX
> mappings without page structures. The page lock, dirty tracking and
> other entry management was supported sans pages. However, the page
> support was then bolted on incrementally so solve problems with gup,
> memory-failure, and all the other kernel services that are missing when
> a pfn does not have an associated page structure.
>
> Since then John has led an effort to account for when a page is pinned
> for DMA vs other sources that elevate the reference count. The
> page_maybe_dma_pinned() helper slots in seamlessly to replace the need
> to track transitions to page->_refount == 1.
>
> The larger change in this set comes from Jason's observation that
> inserting DAX mappings without any reference taken is a bug. So
> dax_insert_entry(), that fsdax uses, is updated to take 'struct
> dev_pagemap' references, and devdax is updated to reuse the same.
It wasn't pagemap references that were the problem, it was struct page
references.
pagemap is just something that should be ref'd in the background, as
long as a struct page has a positive reference the pagemap should be
considered referenced, IMHO free_zone_device_page() should be dealing
with this - put the pagemap after calling page_free().
Pagemap is protecting page->pgmap from UAF so we must ensure we hold
it when we do pgmap->ops
That should be the only put, and it should pair with the only get
which happens when the driver takes a 0 refcount page out of its free
list and makes it have a refcount of 1.
> page mapping helpers. One of the immediate hurdles is the usage of
> pmd_devmap() to distinguish large page mappings that are not transparent
> huge pages.
And this is because the struct page refcounting is not right :|
I had thought the progression would be to make fsdax use compound
folios, install compound folios in the PMD, remove all the special
case refcounting for DAX from the pagetable code, then address the
pgmap issue from the basis of working page->refcount, eg by putting a
pgmap put in right after the op->page_free call.
Can we continue to have the weird page->refcount behavior and still
change the other things?
Jason
next prev parent reply other threads:[~2022-09-06 13:05 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-04 2:16 [PATCH 00/13] Fix the DAX-gup mistake Dan Williams
2022-09-04 2:16 ` [PATCH 01/13] fsdax: Rename "busy page" to "pinned page" Dan Williams
2022-09-04 2:16 ` [PATCH 02/13] fsdax: Use page_maybe_dma_pinned() for DAX vs DMA collisions Dan Williams
2022-09-06 12:07 ` Jason Gunthorpe
2022-09-04 2:16 ` [PATCH 03/13] fsdax: Delete put_devmap_managed_page_refs() Dan Williams
2022-09-04 2:16 ` [PATCH 04/13] fsdax: Update dax_insert_entry() calling convention to return an error Dan Williams
2022-09-04 2:16 ` [PATCH 05/13] fsdax: Cleanup dax_associate_entry() Dan Williams
2022-09-04 2:16 ` [PATCH 06/13] fsdax: Rework dax_insert_entry() calling convention Dan Williams
2022-09-04 2:16 ` [PATCH 07/13] fsdax: Manage pgmap references at entry insertion and deletion Dan Williams
2022-09-06 12:30 ` Jason Gunthorpe
2022-09-04 2:16 ` [PATCH 08/13] devdax: Minor warning fixups Dan Williams
2022-09-04 2:16 ` [PATCH 09/13] devdax: Move address_space helpers to the DAX core Dan Williams
2022-09-04 2:16 ` [PATCH 10/13] dax: Prep dax_{associate, disassociate}_entry() for compound pages Dan Williams
2022-09-04 2:17 ` [PATCH 11/13] devdax: add PUD support to the DAX mapping infrastructure Dan Williams
2022-09-04 2:17 ` [PATCH 12/13] devdax: Use dax_insert_entry() + dax_delete_mapping_entry() Dan Williams
2022-09-04 2:17 ` [PATCH 13/13] mm/gup: Drop DAX pgmap accounting Dan Williams
2022-09-06 13:05 ` Jason Gunthorpe [this message]
2022-09-06 17:23 ` [PATCH 00/13] Fix the DAX-gup mistake Dan Williams
2022-09-06 17:29 ` Jason Gunthorpe
2022-09-06 18:37 ` Dan Williams
2022-09-06 18:49 ` Jason Gunthorpe
2022-09-06 19:41 ` Dan Williams
2022-09-07 0:54 ` Dan Williams
2022-09-07 12:58 ` Jason Gunthorpe
2022-09-07 17:10 ` Dan Williams
2022-09-07 18:43 ` Dan Williams
2022-09-07 19:30 ` Jason Gunthorpe
2022-09-07 20:45 ` Dan Williams
2022-09-08 18:49 ` Jason Gunthorpe
2022-09-08 19:27 ` Dan Williams
2022-09-09 11:53 ` Jason Gunthorpe
2022-09-09 17:52 ` Dan Williams
2022-09-09 18:11 ` Matthew Wilcox
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YxdFmXi/Zdr8Zi3q@nvidia.com \
--to=jgg@nvidia.com \
--cc=akpm@linux-foundation.org \
--cc=dan.j.williams@intel.com \
--cc=djwong@kernel.org \
--cc=hch@lst.de \
--cc=jack@suse.cz \
--cc=jhubbard@nvidia.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nvdimm@lists.linux.dev \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).