From: Dan Williams <dan.j.williams@intel.com>
To: Alistair Popple <apopple@nvidia.com>, <dan.j.williams@intel.com>,
<linux-mm@kvack.org>
Cc: Alistair Popple <apopple@nvidia.com>, <vishal.l.verma@intel.com>,
<dave.jiang@intel.com>, <logang@deltatee.com>,
<bhelgaas@google.com>, <jack@suse.cz>, <jgg@ziepe.ca>,
<catalin.marinas@arm.com>, <will@kernel.org>,
<mpe@ellerman.id.au>, <npiggin@gmail.com>,
<dave.hansen@linux.intel.com>, <ira.weiny@intel.com>,
<willy@infradead.org>, <djwong@kernel.org>, <tytso@mit.edu>,
<linmiaohe@huawei.com>, <david@redhat.com>, <peterx@redhat.com>,
<linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<linux-arm-kernel@lists.infradead.org>,
<linuxppc-dev@lists.ozlabs.org>, <nvdimm@lists.linux.dev>,
<linux-cxl@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>,
<linux-ext4@vger.kernel.org>, <linux-xfs@vger.kernel.org>,
<jhubbard@nvidia.com>, <hch@lst.de>, <david@fromorbit.com>,
Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: [PATCH 04/12] mm: Allow compound zone device pages
Date: Sun, 22 Sep 2024 03:01:18 +0200
Message-ID: <66ef6c5ebd068_109ae294a3@dwillia2-mobl3.amr.corp.intel.com.notmuch>
In-Reply-To: <c7026449473790e2844bb82012216c57047c7639.1725941415.git-series.apopple@nvidia.com>
Alistair Popple wrote:
> Zone device pages are used to represent various type of device memory
> managed by device drivers. Currently compound zone device pages are
> not supported. This is because MEMORY_DEVICE_FS_DAX pages are the only
> user of higher order zone device pages and have their own page
> reference counting.
>
> A future change will unify FS DAX reference counting with normal page
> reference counting rules and remove the special FS DAX reference
> counting. Supporting that requires compound zone device pages.
>
> Supporting compound zone device pages requires compound_head() to
> distinguish between head and tail pages whilst still preserving the
> special struct page fields that are specific to zone device pages.
>
> A tail page is distinguished by having bit zero being set in
> page->compound_head, with the remaining bits pointing to the head
> page. For zone device pages page->compound_head is shared with
> page->pgmap.
>
> The page->pgmap field is common to all pages within a memory section.
> Therefore pgmap is the same for both head and tail pages and can be
> moved into the folio and we can use the standard scheme to find
> compound_head from a tail page.
>
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>
> ---
>
> Changes since v1:
>
> - Move pgmap to the folio as suggested by Matthew Wilcox
> ---
> drivers/gpu/drm/nouveau/nouveau_dmem.c | 3 ++-
> drivers/pci/p2pdma.c | 6 +++---
> include/linux/memremap.h | 6 +++---
> include/linux/migrate.h | 4 ++--
> include/linux/mm_types.h | 9 +++++++--
> include/linux/mmzone.h | 8 +++++++-
> lib/test_hmm.c | 3 ++-
> mm/hmm.c | 2 +-
> mm/memory.c | 4 +++-
> mm/memremap.c | 14 +++++++-------
> mm/migrate_device.c | 7 +++++--
> mm/mm_init.c | 2 +-
> 12 files changed, 43 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> index 6fb65b0..58d308c 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> @@ -88,7 +88,8 @@ struct nouveau_dmem {
>
> static struct nouveau_dmem_chunk *nouveau_page_to_chunk(struct page *page)
> {
> - return container_of(page->pgmap, struct nouveau_dmem_chunk, pagemap);
> + return container_of(page_dev_pagemap(page), struct nouveau_dmem_chunk,

page_dev_pagemap() feels like a mouthful. I would be ok with
page_pgmap() since "pgmap" is the most common identifier for
struct dev_pagemap instances.

> + pagemap);
> }
>
> static struct nouveau_drm *page_to_drm(struct page *page)
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 210b9f4..a58f2c1 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -199,7 +199,7 @@ static const struct attribute_group p2pmem_group = {
>
> static void p2pdma_page_free(struct page *page)
> {
> - struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page->pgmap);
> + struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page_dev_pagemap(page));
> /* safe to dereference while a reference is held to the percpu ref */
> struct pci_p2pdma *p2pdma =
> rcu_dereference_protected(pgmap->provider->p2pdma, 1);
> @@ -1022,8 +1022,8 @@ enum pci_p2pdma_map_type
> pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev,
> struct scatterlist *sg)
> {
> - if (state->pgmap != sg_page(sg)->pgmap) {
> - state->pgmap = sg_page(sg)->pgmap;
> + if (state->pgmap != page_dev_pagemap(sg_page(sg))) {
> + state->pgmap = page_dev_pagemap(sg_page(sg));
> state->map = pci_p2pdma_map_type(state->pgmap, dev);
> state->bus_off = to_p2p_pgmap(state->pgmap)->bus_offset;
> }
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index 3f7143a..14273e6 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -161,7 +161,7 @@ static inline bool is_device_private_page(const struct page *page)
> {
> return IS_ENABLED(CONFIG_DEVICE_PRIVATE) &&
> is_zone_device_page(page) &&
> - page->pgmap->type == MEMORY_DEVICE_PRIVATE;
> + page_dev_pagemap(page)->type == MEMORY_DEVICE_PRIVATE;
> }
>
> static inline bool folio_is_device_private(const struct folio *folio)
> @@ -173,13 +173,13 @@ static inline bool is_pci_p2pdma_page(const struct page *page)
> {
> return IS_ENABLED(CONFIG_PCI_P2PDMA) &&
> is_zone_device_page(page) &&
> - page->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA;
> + page_dev_pagemap(page)->type == MEMORY_DEVICE_PCI_P2PDMA;
> }
>
> static inline bool is_device_coherent_page(const struct page *page)
> {
> return is_zone_device_page(page) &&
> - page->pgmap->type == MEMORY_DEVICE_COHERENT;
> + page_dev_pagemap(page)->type == MEMORY_DEVICE_COHERENT;
> }
>
> static inline bool folio_is_device_coherent(const struct folio *folio)
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 002e49b..9a85a82 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -207,8 +207,8 @@ struct migrate_vma {
> unsigned long end;
>
> /*
> - * Set to the owner value also stored in page->pgmap->owner for
> - * migrating out of device private memory. The flags also need to
> + * Set to the owner value also stored in page_dev_pagemap(page)->owner
> + * for migrating out of device private memory. The flags also need to
> * be set to MIGRATE_VMA_SELECT_DEVICE_PRIVATE.
> * The caller should always set this field when using mmu notifier
> * callbacks to avoid device MMU invalidations for device private
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6e3bdf8..c2f1d53 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -129,8 +129,11 @@ struct page {
> unsigned long compound_head; /* Bit zero is set */
> };
> struct { /* ZONE_DEVICE pages */
> - /** @pgmap: Points to the hosting device page map. */
> - struct dev_pagemap *pgmap;
> + /*
> + * The first word is used for compound_head or folio
> + * pgmap
> + */
> + void *_unused;

I would feel better with "_unused_pgmap_compound_head", similar to how
_unused_slab_obj_exts in 'struct folio' indicates the placeholder
contents.

> void *zone_device_data;
> /*
> * ZONE_DEVICE private pages are counted as being
> @@ -299,6 +302,7 @@ typedef struct {
> * @_refcount: Do not access this member directly. Use folio_ref_count()
> * to find how many references there are to this folio.
> * @memcg_data: Memory Control Group data.
> + * @pgmap: Metadata for ZONE_DEVICE mappings
> * @virtual: Virtual address in the kernel direct map.
> * @_last_cpupid: IDs of last CPU and last process that accessed the folio.
> * @_entire_mapcount: Do not use directly, call folio_entire_mapcount().
> @@ -337,6 +341,7 @@ struct folio {
> /* private: */
> };
> /* public: */
> + struct dev_pagemap *pgmap;
> };
> struct address_space *mapping;
> pgoff_t index;
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 17506e4..e191434 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1134,6 +1134,12 @@ static inline bool is_zone_device_page(const struct page *page)
> return page_zonenum(page) == ZONE_DEVICE;
> }
>
> +static inline struct dev_pagemap *page_dev_pagemap(const struct page *page)
> +{
> + WARN_ON(!is_zone_device_page(page));

VM_WARN_ON()?

With the above fixups:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>