From: Alistair Popple <apopple@nvidia.com>
To: <akpm@linux-foundation.org>, <Felix.Kuehling@amd.com>,
<linux-mm@kvack.org>, <rcampbell@nvidia.com>,
<linux-ext4@vger.kernel.org>, <linux-xfs@vger.kernel.org>,
Alex Sierra <alex.sierra@amd.com>
Cc: <amd-gfx@lists.freedesktop.org>,
<dri-devel@lists.freedesktop.org>, <hch@lst.de>, <jgg@nvidia.com>,
<jglisse@redhat.com>, <willy@infradead.org>
Subject: Re: [PATCH v3 01/10] mm: add zone device coherent type memory support
Date: Thu, 20 Jan 2022 15:08:56 +1100 [thread overview]
Message-ID: <29335901.2ltahdNQJ0@nvdebian> (raw)
In-Reply-To: <20220110223201.31024-2-alex.sierra@amd.com>
On Tuesday, 11 January 2022 9:31:52 AM AEDT Alex Sierra wrote:
> Device memory that is cache coherent from device and CPU point of view.
> This is used on platforms that have an advanced system bus (like CAPI
> or CXL). Any page of a process can be migrated to such memory. However,
> no one should be allowed to pin such memory so that it can always be
> evicted.
>
> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
> ---
> include/linux/memremap.h | 8 ++++++++
> include/linux/mm.h | 16 ++++++++++++++++
> mm/memcontrol.c | 6 +++---
> mm/memory-failure.c | 8 ++++++--
> mm/memremap.c | 5 ++++-
> mm/migrate.c | 21 +++++++++++++--------
> 6 files changed, 50 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index c0e9d35889e8..ff4d398edf35 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -39,6 +39,13 @@ struct vmem_altmap {
> * A more complete discussion of unaddressable memory may be found in
> * include/linux/hmm.h and Documentation/vm/hmm.rst.
> *
> + * MEMORY_DEVICE_COHERENT:
> + * Device memory that is cache coherent from device and CPU point of view. This
> + * is used on platforms that have an advanced system bus (like CAPI or CXL). A
> + * driver can hotplug the device memory using ZONE_DEVICE and with that memory
> + * type. Any page of a process can be migrated to such memory. However no one
> + * should be allowed to pin such memory so that it can always be evicted.
> + *
> * MEMORY_DEVICE_FS_DAX:
> * Host memory that has similar access semantics as System RAM i.e. DMA
> * coherent and supports page pinning. In support of coordinating page
> @@ -59,6 +66,7 @@ struct vmem_altmap {
> enum memory_type {
> /* 0 is reserved to catch uninitialized type fields */
> MEMORY_DEVICE_PRIVATE = 1,
> + MEMORY_DEVICE_COHERENT,
> MEMORY_DEVICE_FS_DAX,
> MEMORY_DEVICE_GENERIC,
> MEMORY_DEVICE_PCI_P2PDMA,
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 73a52aba448f..fcf96c0fc918 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1162,6 +1162,7 @@ static inline bool page_is_devmap_managed(struct page *page)
> return false;
> switch (page->pgmap->type) {
> case MEMORY_DEVICE_PRIVATE:
> + case MEMORY_DEVICE_COHERENT:
> case MEMORY_DEVICE_FS_DAX:
> return true;
> default:
> @@ -1191,6 +1192,21 @@ static inline bool is_device_private_page(const struct page *page)
> page->pgmap->type == MEMORY_DEVICE_PRIVATE;
> }
>
> +static inline bool is_device_coherent_page(const struct page *page)
> +{
> + return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
> + is_zone_device_page(page) &&
> + page->pgmap->type == MEMORY_DEVICE_COHERENT;
> +}
> +
> +static inline bool is_device_page(const struct page *page)
I wish we could think of a better name for this - it's too similar to
is_zone_device_page() so I can never remember if it includes FS_DAX pages or
not. Unfortunately I don't have any better suggestions though.
> +{
> + return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
> + is_zone_device_page(page) &&
> + (page->pgmap->type == MEMORY_DEVICE_PRIVATE ||
> + page->pgmap->type == MEMORY_DEVICE_COHERENT);
> +}
> +
> static inline bool is_pci_p2pdma_page(const struct page *page)
> {
> return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6da5020a8656..d0bab0747c73 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5695,8 +5695,8 @@ static int mem_cgroup_move_account(struct page *page,
> * 2(MC_TARGET_SWAP): if the swap entry corresponding to this pte is a
> * target for charge migration. if @target is not NULL, the entry is stored
> * in target->ent.
> - * 3(MC_TARGET_DEVICE): like MC_TARGET_PAGE but page is MEMORY_DEVICE_PRIVATE
> - * (so ZONE_DEVICE page and thus not on the lru).
> + * 3(MC_TARGET_DEVICE): like MC_TARGET_PAGE but page is device memory and
> + * thus not on the lru.
> * For now we such page is charge like a regular page would be as for all
> * intent and purposes it is just special memory taking the place of a
> * regular page.
> @@ -5730,7 +5730,7 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
> */
> if (page_memcg(page) == mc.from) {
> ret = MC_TARGET_PAGE;
> - if (is_device_private_page(page))
> + if (is_device_page(page))
> ret = MC_TARGET_DEVICE;
> if (target)
> target->page = page;
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 3e6449f2102a..4cf212e5f432 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1554,12 +1554,16 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
> goto unlock;
> }
>
> - if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
> + switch (pgmap->type) {
> + case MEMORY_DEVICE_PRIVATE:
> + case MEMORY_DEVICE_COHERENT:
> /*
> - * TODO: Handle HMM pages which may need coordination
> + * TODO: Handle device pages which may need coordination
> * with device-side memory.
> */
> goto unlock;
> + default:
> + break;
> }
>
> /*
> diff --git a/mm/memremap.c b/mm/memremap.c
> index ed593bf87109..94d6a1e01d42 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -44,6 +44,7 @@ EXPORT_SYMBOL(devmap_managed_key);
> static void devmap_managed_enable_put(struct dev_pagemap *pgmap)
> {
> if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
> + pgmap->type == MEMORY_DEVICE_COHERENT ||
> pgmap->type == MEMORY_DEVICE_FS_DAX)
> static_branch_dec(&devmap_managed_key);
> }
> @@ -51,6 +52,7 @@ static void devmap_managed_enable_put(struct dev_pagemap *pgmap)
> static void devmap_managed_enable_get(struct dev_pagemap *pgmap)
> {
> if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
> + pgmap->type == MEMORY_DEVICE_COHERENT ||
> pgmap->type == MEMORY_DEVICE_FS_DAX)
> static_branch_inc(&devmap_managed_key);
> }
> @@ -328,6 +330,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
>
> switch (pgmap->type) {
> case MEMORY_DEVICE_PRIVATE:
> + case MEMORY_DEVICE_COHERENT:
> if (!IS_ENABLED(CONFIG_DEVICE_PRIVATE)) {
This will fail if device private support isn't enabled. Does device coherent
need to depend on that?
> WARN(1, "Device private memory not supported\n");
> return ERR_PTR(-EINVAL);
Also there is this check further down:
if (!pgmap->ops || !pgmap->ops->migrate_to_ram) {
Device coherent pages don't use the migrate_to_ram() callback, so this should
check for migrate_to_ram == NULL in that case.
WARN(1, "Missing migrate_to_ram method\n");
return ERR_PTR(-EINVAL);
}
> @@ -498,7 +501,7 @@ EXPORT_SYMBOL_GPL(get_dev_pagemap);
> void free_devmap_managed_page(struct page *page)
> {
> /* notify page idle for dax */
> - if (!is_device_private_page(page)) {
> + if (!is_device_page(page)) {
> wake_up_var(&page->_refcount);
> return;
> }
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 1852d787e6ab..91018880dc7f 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -362,7 +362,7 @@ static int expected_page_refs(struct address_space *mapping, struct page *page)
> * Device private pages have an extra refcount as they are
> * ZONE_DEVICE pages.
> */
> - expected_count += is_device_private_page(page);
> + expected_count += is_device_page(page);
> if (mapping)
> expected_count += thp_nr_pages(page) + page_has_private(page);
>
> @@ -2503,7 +2503,7 @@ static bool migrate_vma_check_page(struct page *page)
> * FIXME proper solution is to rework migration_entry_wait() so
> * it does not need to take a reference on page.
> */
> - return is_device_private_page(page);
> + return is_device_page(page);
> }
> /* For file back page */
> @@ -2791,7 +2791,7 @@ EXPORT_SYMBOL(migrate_vma_setup);
> * handle_pte_fault()
> * do_anonymous_page()
> * to map in an anonymous zero page but the struct page will be a ZONE_DEVICE
> - * private page.
> + * private or coherent page.
> */
> static void migrate_vma_insert_page(struct migrate_vma *migrate,
> unsigned long addr,
> @@ -2867,10 +2867,15 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
> swp_entry = make_readable_device_private_entry(
> page_to_pfn(page));
> entry = swp_entry_to_pte(swp_entry);
> + } else if (is_device_coherent_page(page)) {
> + entry = pte_mkold(mk_pte(page,
> + READ_ONCE(vma->vm_page_prot)));
> + if (vma->vm_flags & VM_WRITE)
> + entry = pte_mkwrite(pte_mkdirty(entry));
As I understand things device coherent pages use normal PTEs, so it would be
good if you could consolidate this to use the same code path as for normal
pages.
> } else {
> /*
> - * For now we only support migrating to un-addressable
> - * device memory.
> + * We support migrating to private and coherent types
> + * for device zone memory.
> */
> pr_warn_once("Unsupported ZONE_DEVICE page type.\n");
> goto abort;
> @@ -2976,10 +2981,10 @@ void migrate_vma_pages(struct migrate_vma *migrate)
> mapping = page_mapping(page);
>
> if (is_zone_device_page(newpage)) {
> - if (is_device_private_page(newpage)) {
> + if (is_device_page(newpage)) {
> /*
> - * For now only support private anonymous when
> - * migrating to un-addressable device memory.
> + * For now only support private and coherent
> + * anonymous when migrating to device memory.
> */
> if (mapping) {
> migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
>
next prev parent reply other threads:[~2022-01-20 4:09 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-10 22:31 [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping Alex Sierra
2022-01-10 22:31 ` [PATCH v3 01/10] mm: add zone device coherent type memory support Alex Sierra
2022-01-20 4:08 ` Alistair Popple [this message]
2022-01-10 22:31 ` [PATCH v3 02/10] mm: add device coherent vma selection for memory migration Alex Sierra
2022-01-10 22:31 ` [PATCH v3 03/10] mm/gup: fail get_user_pages for LONGTERM dev coherent type Alex Sierra
2022-01-20 12:36 ` Joao Martins
2022-01-20 13:18 ` Alistair Popple
2022-01-10 22:31 ` [PATCH v3 04/10] drm/amdkfd: add SPM support for SVM Alex Sierra
2022-01-10 22:31 ` [PATCH v3 05/10] drm/amdkfd: coherent type as sys mem on migration to ram Alex Sierra
2022-01-10 22:31 ` [PATCH v3 06/10] lib: test_hmm add ioctl to get zone device type Alex Sierra
2022-01-20 5:01 ` Alistair Popple
2022-01-10 22:31 ` [PATCH v3 07/10] lib: test_hmm add module param for " Alex Sierra
2022-01-20 5:23 ` Alistair Popple
2022-01-10 22:31 ` [PATCH v3 08/10] lib: add support for device coherent type in test_hmm Alex Sierra
2022-01-20 6:00 ` Alistair Popple
2022-01-10 22:32 ` [PATCH v3 09/10] tools: update hmm-test to support device coherent type Alex Sierra
2022-01-20 6:14 ` Alistair Popple
2022-01-27 3:22 ` Sierra Guiza, Alejandro (Alex)
2022-01-10 22:32 ` [PATCH v3 10/10] tools: update test_hmm script to support SP config Alex Sierra
2022-01-20 6:17 ` Alistair Popple
2022-01-12 11:06 ` [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping Alistair Popple
2022-01-20 6:33 ` Alistair Popple
2022-01-12 11:16 ` David Hildenbrand
2022-01-12 16:08 ` Felix Kuehling
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=29335901.2ltahdNQJ0@nvdebian \
--to=apopple@nvidia.com \
--cc=Felix.Kuehling@amd.com \
--cc=akpm@linux-foundation.org \
--cc=alex.sierra@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=hch@lst.de \
--cc=jgg@nvidia.com \
--cc=jglisse@redhat.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-xfs@vger.kernel.org \
--cc=rcampbell@nvidia.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).