From: Dan Williams <dan.j.williams@intel.com>
To: Jason Gunthorpe <jgg@nvidia.com>,
Dan Williams <dan.j.williams@intel.com>
Cc: Alistair Popple <apopple@nvidia.com>, <linux-mm@kvack.org>,
Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
"Darrick J. Wong" <djwong@kernel.org>,
Christoph Hellwig <hch@lst.de>,
John Hubbard <jhubbard@nvidia.com>, <david@fromorbit.com>,
<nvdimm@lists.linux.dev>, <akpm@linux-foundation.org>,
<linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH v3 10/25] fsdax: Introduce pgmap_request_folios()
Date: Mon, 17 Oct 2022 13:51:18 -0700 [thread overview]
Message-ID: <634dc046bada2_4da329484@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <Y0229P6z0E9Niw+9@nvidia.com>
Jason Gunthorpe wrote:
> On Mon, Oct 17, 2022 at 01:06:25PM -0700, Dan Williams wrote:
> > Alistair Popple wrote:
> > >
> > > Dan Williams <dan.j.williams@intel.com> writes:
> > >
> > > [...]
> > >
> > > > +/**
> > > > + * pgmap_request_folios - activate an contiguous span of folios in @pgmap
> > > > + * @pgmap: host page map for the folio array
> > > > + * @folio: start of the folio list, all subsequent folios have same folio_size()
> > > > + *
> > > > + * Caller is responsible for @pgmap remaining live for the duration of
> > > > + * this call. Caller is also responsible for not racing requests for the
> > > > + * same folios.
> > > > + */
> > > > +bool pgmap_request_folios(struct dev_pagemap *pgmap, struct folio *folio,
> > > > + int nr_folios)
> > >
> > > All callers of pgmap_request_folios() I could see call this with
> > > nr_folios == 1 and pgmap == folio_pgmap(folio). Could we remove the
> > > redundant arguments and simplify it to
> > > pgmap_request_folios(struct folio *folio)?
> >
> > The rationale for the @pgmap argument being separate is to make clear
> > that the caller must assert that pgmap is alive before passing it to
> > pgmap_request_folios(), and that @pgmap is constant for the span of
> > folios requested.
>
> The api is kind of weird to take in a folio - it should take in the
> offset in bytes from the start of the pgmap and "create" usable
> non-free folios.
Plumbing that is non-trivial. Recall that in the DAX case the filesystem
can be sitting on top of a stack of device-mapper-dax devices routing to
more than one pgmap, and the pgmap can be made up of discontiguous
ranges. The filesystem knows the offset into the dax-device and then
asks that device to translate that to a pfn. From pfn the optional (see
CONFIG_FS_DAX_LIMITED) pgmap can be retrieved. Once DAX knows what pfn
it wants, while holding a lock that prevents the pgmap from entering its
dying state, it can request to pin that pfn.
I arrived at the interface being folio based because the order of the
pin(s) also needs to be conveyed and the order should be consistent.
> A good design is that nothing has a struct folio * unless it can also
> prove it has a non-zero refcount on it, and that is simply impossible
> for any caller at this point.
I agree that's a good design, and I think it is a bug to make this
request on anything but a zero-refcount folio, but plumbing pgmap
offsets to replace pfn_to_page() would require dax_direct_access() to
shift from a pfn lookup to a pgmap + offset lookup.
CONFIG_FS_DAX_LIMITED is something that needs to be deleted, but for now
it supports DAX without a pgmap, where "limited" is just meant for XIP
(eXecute-In-Place), no GUP support. Once that goes then
dax_direct_access() could evolve into an interface that assumes a pgmap
is present. Until then I think pgmap_request_folios() at least puts out
the immediate fire of making people contend with oddly refcounted DAX
pages.
next prev parent reply other threads:[~2022-10-17 20:51 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-14 23:56 [PATCH v3 00/25] Fix the DAX-gup mistake Dan Williams
2022-10-14 23:57 ` [PATCH v3 01/25] fsdax: Wait on @page not @page->_refcount Dan Williams
2022-10-14 23:57 ` [PATCH v3 02/25] fsdax: Use dax_page_idle() to document DAX busy page checking Dan Williams
2022-10-14 23:57 ` [PATCH v3 03/25] fsdax: Include unmapped inodes for page-idle detection Dan Williams
2022-10-14 23:57 ` [PATCH v3 04/25] fsdax: Introduce dax_zap_mappings() Dan Williams
2022-11-02 13:04 ` Aneesh Kumar K.V
2022-10-14 23:57 ` [PATCH v3 05/25] fsdax: Wait for pinned pages during truncate_inode_pages_final() Dan Williams
2022-10-14 23:57 ` [PATCH v3 06/25] fsdax: Validate DAX layouts broken before truncate Dan Williams
2022-10-14 23:57 ` [PATCH v3 07/25] fsdax: Hold dax lock over mapping insertion Dan Williams
2022-10-17 19:31 ` Jason Gunthorpe
2022-10-17 20:17 ` Dan Williams
2022-10-18 5:26 ` Christoph Hellwig
2022-10-18 17:30 ` Dan Williams
2022-10-14 23:57 ` [PATCH v3 08/25] fsdax: Update dax_insert_entry() calling convention to return an error Dan Williams
2022-10-14 23:57 ` [PATCH v3 09/25] fsdax: Rework for_each_mapped_pfn() to dax_for_each_folio() Dan Williams
2022-10-14 23:57 ` [PATCH v3 10/25] fsdax: Introduce pgmap_request_folios() Dan Williams
2022-10-17 6:31 ` Alistair Popple
2022-10-17 20:06 ` Dan Williams
2022-10-17 20:11 ` Jason Gunthorpe
2022-10-17 20:51 ` Dan Williams [this message]
2022-10-17 23:57 ` Jason Gunthorpe
2022-10-18 0:19 ` Dan Williams
2022-10-17 19:41 ` Jason Gunthorpe
2022-10-14 23:58 ` [PATCH v3 11/25] fsdax: Rework dax_insert_entry() calling convention Dan Williams
2022-10-14 23:58 ` [PATCH v3 12/25] fsdax: Cleanup dax_associate_entry() Dan Williams
2022-10-14 23:58 ` [PATCH v3 13/25] devdax: Minor warning fixups Dan Williams
2022-10-14 23:58 ` [PATCH v3 14/25] devdax: Fix sparse lock imbalance warning Dan Williams
2022-10-14 23:58 ` [PATCH v3 15/25] libnvdimm/pmem: Support pmem block devices without dax Dan Williams
2022-10-14 23:58 ` [PATCH v3 16/25] devdax: Move address_space helpers to the DAX core Dan Williams
2022-10-14 23:58 ` [PATCH v3 17/25] devdax: Sparse fixes for xarray locking Dan Williams
2022-10-14 23:58 ` [PATCH v3 18/25] devdax: Sparse fixes for vmfault_t / dax-entry conversions Dan Williams
2022-10-14 23:58 ` [PATCH v3 19/25] devdax: Sparse fixes for vm_fault_t in tracepoints Dan Williams
2022-10-14 23:58 ` [PATCH v3 20/25] devdax: add PUD support to the DAX mapping infrastructure Dan Williams
2022-10-14 23:59 ` [PATCH v3 21/25] devdax: Use dax_insert_entry() + dax_delete_mapping_entry() Dan Williams
2022-10-14 23:59 ` [PATCH v3 22/25] mm/memremap_pages: Replace zone_device_page_init() with pgmap_request_folios() Dan Williams
2022-10-17 19:17 ` Lyude Paul
2022-10-14 23:59 ` [PATCH v3 23/25] mm/memremap_pages: Initialize all ZONE_DEVICE pages to start at refcount 0 Dan Williams
2022-10-17 7:04 ` Alistair Popple
2022-10-17 19:48 ` Jason Gunthorpe
2022-10-14 23:59 ` [PATCH v3 24/25] mm/meremap_pages: Delete put_devmap_managed_page_refs() Dan Williams
2022-10-17 7:08 ` Alistair Popple
2022-10-14 23:59 ` [PATCH v3 25/25] mm/gup: Drop DAX pgmap accounting Dan Williams
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=634dc046bada2_4da329484@dwillia2-xfh.jf.intel.com.notmuch \
--to=dan.j.williams@intel.com \
--cc=akpm@linux-foundation.org \
--cc=apopple@nvidia.com \
--cc=david@fromorbit.com \
--cc=djwong@kernel.org \
--cc=hch@lst.de \
--cc=jack@suse.cz \
--cc=jgg@nvidia.com \
--cc=jhubbard@nvidia.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nvdimm@lists.linux.dev \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).