From: Jerome Glisse <jglisse@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>, John Hubbard <jhubbard@nvidia.com>,
David Nellans <dnellans@nvidia.com>,
Balbir Singh <bsingharora@gmail.com>,
Ross Zwisler <ross.zwisler@linux.intel.com>
Subject: Re: [HMM-v25 07/19] mm/ZONE_DEVICE: new type of ZONE_DEVICE for unaddressable memory v5
Date: Thu, 20 Dec 2018 11:15:38 -0500 [thread overview]
Message-ID: <20181220161538.GA3963@redhat.com> (raw)
In-Reply-To: <CAA9_cmeag7n4yeiP6pYZSz80KyxqfwbsXJCWvyNE4PSaxCKA3A@mail.gmail.com>
On Thu, Dec 20, 2018 at 12:33:47AM -0800, Dan Williams wrote:
> On Wed, Aug 16, 2017 at 5:06 PM J�r�me Glisse <jglisse@redhat.com> wrote:
> >
> > HMM (heterogeneous memory management) need struct page to support migration
> > from system main memory to device memory. Reasons for HMM and migration to
> > device memory is explained with HMM core patch.
> >
> > This patch deals with device memory that is un-addressable memory (ie CPU
> > can not access it). Hence we do not want those struct page to be manage
> > like regular memory. That is why we extend ZONE_DEVICE to support different
> > types of memory.
> >
> > A persistent memory type is define for existing user of ZONE_DEVICE and a
> > new device un-addressable type is added for the un-addressable memory type.
> > There is a clear separation between what is expected from each memory type
> > and existing user of ZONE_DEVICE are un-affected by new requirement and new
> > use of the un-addressable type. All specific code path are protect with
> > test against the memory type.
> >
> > Because memory is un-addressable we use a new special swap type for when
> > a page is migrated to device memory (this reduces the number of maximum
> > swap file).
> >
> > The main two additions beside memory type to ZONE_DEVICE is two callbacks.
> > First one, page_free() is call whenever page refcount reach 1 (which means
> > the page is free as ZONE_DEVICE page never reach a refcount of 0). This
> > allow device driver to manage its memory and associated struct page.
> >
> > The second callback page_fault() happens when there is a CPU access to
> > an address that is back by a device page (which are un-addressable by the
> > CPU). This callback is responsible to migrate the page back to system
> > main memory. Device driver can not block migration back to system memory,
> > HMM make sure that such page can not be pin into device memory.
> >
> > If device is in some error condition and can not migrate memory back then
> > a CPU page fault to device memory should end with SIGBUS.
> >
> > Changed since v4:
> > - s/DEVICE_PUBLIC/DEVICE_HOST (to free DEVICE_PUBLIC for HMM-CDM)
> > Changed since v3:
> > - fix comments that was still using UNADDRESSABLE as keyword
> > - kernel configuration simplification
> > Changed since v2:
> > - s/DEVICE_UNADDRESSABLE/DEVICE_PRIVATE
> > Changed since v1:
> > - rename to device private memory (from device unaddressable)
> >
> > Signed-off-by: J�r�me Glisse <jglisse@redhat.com>
> > Acked-by: Dan Williams <dan.j.williams@intel.com>
> > Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> [..]
> > fs/proc/task_mmu.c | 7 +++++
> > include/linux/ioport.h | 1 +
> > include/linux/memremap.h | 73 ++++++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/mm.h | 12 ++++++++
> > include/linux/swap.h | 24 ++++++++++++++--
> > include/linux/swapops.h | 68 ++++++++++++++++++++++++++++++++++++++++++++
> > kernel/memremap.c | 34 ++++++++++++++++++++++
> > mm/Kconfig | 11 +++++++-
> > mm/memory.c | 61 ++++++++++++++++++++++++++++++++++++++++
> > mm/memory_hotplug.c | 10 +++++--
> > mm/mprotect.c | 14 ++++++++++
> > 11 files changed, 309 insertions(+), 6 deletions(-)
> >
> [..]
> > diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> > index 93416196ba64..8e164ec9eed0 100644
> > --- a/include/linux/memremap.h
> > +++ b/include/linux/memremap.h
> > @@ -4,6 +4,8 @@
> > #include <linux/ioport.h>
> > #include <linux/percpu-refcount.h>
> >
> > +#include <asm/pgtable.h>
> > +
>
> So it turns out, over a year later, that this include was a mistake
> and makes the build fragile.
>
> > struct resource;
> > struct device;
> >
> [..]
> > +typedef int (*dev_page_fault_t)(struct vm_area_struct *vma,
> > + unsigned long addr,
> > + const struct page *page,
> > + unsigned int flags,
> > + pmd_t *pmdp);
>
> I recently included this file somewhere that did not have a pile of
> other mm headers included and 0day reports:
>
> In file included from arch/m68k/include/asm/pgtable_mm.h:148:0,
> from arch/m68k/include/asm/pgtable.h:5,
> from include/linux/memremap.h:7,
> from drivers//dax/bus.c:3:
> arch/m68k/include/asm/motorola_pgtable.h: In function 'pgd_offset':
> >> arch/m68k/include/asm/motorola_pgtable.h:199:11: error: dereferencing pointer to incomplete type 'const struct mm_struct'
> return mm->pgd + pgd_index(address);
> ^~
> I assume this pulls in the entirety of pgtable.h just to get the pmd_t
> definition?
>
> > +typedef void (*dev_page_free_t)(struct page *page, void *data);
> > +
> > /**
> > * struct dev_pagemap - metadata for ZONE_DEVICE mappings
> > + * @page_fault: callback when CPU fault on an unaddressable device page
> > + * @page_free: free page callback when page refcount reaches 1
> > * @altmap: pre-allocated/reserved memory for vmemmap allocations
> > * @res: physical address range covered by @ref
> > * @ref: reference count that pins the devm_memremap_pages() mapping
> > * @dev: host device of the mapping for debug
> > + * @data: private data pointer for page_free()
> > + * @type: memory type: see MEMORY_* in memory_hotplug.h
> > */
> > struct dev_pagemap {
> > + dev_page_fault_t page_fault;
>
> Rather than try to figure out how to forward declare pmd_t, how about
> just move dev_page_fault_t out of the generic dev_pagemap and into the
> HMM specific container structure? This should be straightfoward on top
> of the recent refactor.
Fine with me.
Cheers,
J�r�me
WARNING: multiple messages have this Message-ID (diff)
From: Jerome Glisse <jglisse@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>, John Hubbard <jhubbard@nvidia.com>,
David Nellans <dnellans@nvidia.com>,
Balbir Singh <bsingharora@gmail.com>,
Ross Zwisler <ross.zwisler@linux.intel.com>
Subject: Re: [HMM-v25 07/19] mm/ZONE_DEVICE: new type of ZONE_DEVICE for unaddressable memory v5
Date: Thu, 20 Dec 2018 11:15:38 -0500 [thread overview]
Message-ID: <20181220161538.GA3963@redhat.com> (raw)
Message-ID: <20181220161538.y5tG6ubKJLx7Zz9r1Ophz1_gkCx0CkjNjcLAlkyTY9I@z> (raw)
In-Reply-To: <CAA9_cmeag7n4yeiP6pYZSz80KyxqfwbsXJCWvyNE4PSaxCKA3A@mail.gmail.com>
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="UTF-8", Size: 5762 bytes --]
On Thu, Dec 20, 2018 at 12:33:47AM -0800, Dan Williams wrote:
> On Wed, Aug 16, 2017 at 5:06 PM Jérôme Glisse <jglisse@redhat.com> wrote:
> >
> > HMM (heterogeneous memory management) need struct page to support migration
> > from system main memory to device memory. Reasons for HMM and migration to
> > device memory is explained with HMM core patch.
> >
> > This patch deals with device memory that is un-addressable memory (ie CPU
> > can not access it). Hence we do not want those struct page to be manage
> > like regular memory. That is why we extend ZONE_DEVICE to support different
> > types of memory.
> >
> > A persistent memory type is define for existing user of ZONE_DEVICE and a
> > new device un-addressable type is added for the un-addressable memory type.
> > There is a clear separation between what is expected from each memory type
> > and existing user of ZONE_DEVICE are un-affected by new requirement and new
> > use of the un-addressable type. All specific code path are protect with
> > test against the memory type.
> >
> > Because memory is un-addressable we use a new special swap type for when
> > a page is migrated to device memory (this reduces the number of maximum
> > swap file).
> >
> > The main two additions beside memory type to ZONE_DEVICE is two callbacks.
> > First one, page_free() is call whenever page refcount reach 1 (which means
> > the page is free as ZONE_DEVICE page never reach a refcount of 0). This
> > allow device driver to manage its memory and associated struct page.
> >
> > The second callback page_fault() happens when there is a CPU access to
> > an address that is back by a device page (which are un-addressable by the
> > CPU). This callback is responsible to migrate the page back to system
> > main memory. Device driver can not block migration back to system memory,
> > HMM make sure that such page can not be pin into device memory.
> >
> > If device is in some error condition and can not migrate memory back then
> > a CPU page fault to device memory should end with SIGBUS.
> >
> > Changed since v4:
> > - s/DEVICE_PUBLIC/DEVICE_HOST (to free DEVICE_PUBLIC for HMM-CDM)
> > Changed since v3:
> > - fix comments that was still using UNADDRESSABLE as keyword
> > - kernel configuration simplification
> > Changed since v2:
> > - s/DEVICE_UNADDRESSABLE/DEVICE_PRIVATE
> > Changed since v1:
> > - rename to device private memory (from device unaddressable)
> >
> > Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> > Acked-by: Dan Williams <dan.j.williams@intel.com>
> > Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> [..]
> > fs/proc/task_mmu.c | 7 +++++
> > include/linux/ioport.h | 1 +
> > include/linux/memremap.h | 73 ++++++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/mm.h | 12 ++++++++
> > include/linux/swap.h | 24 ++++++++++++++--
> > include/linux/swapops.h | 68 ++++++++++++++++++++++++++++++++++++++++++++
> > kernel/memremap.c | 34 ++++++++++++++++++++++
> > mm/Kconfig | 11 +++++++-
> > mm/memory.c | 61 ++++++++++++++++++++++++++++++++++++++++
> > mm/memory_hotplug.c | 10 +++++--
> > mm/mprotect.c | 14 ++++++++++
> > 11 files changed, 309 insertions(+), 6 deletions(-)
> >
> [..]
> > diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> > index 93416196ba64..8e164ec9eed0 100644
> > --- a/include/linux/memremap.h
> > +++ b/include/linux/memremap.h
> > @@ -4,6 +4,8 @@
> > #include <linux/ioport.h>
> > #include <linux/percpu-refcount.h>
> >
> > +#include <asm/pgtable.h>
> > +
>
> So it turns out, over a year later, that this include was a mistake
> and makes the build fragile.
>
> > struct resource;
> > struct device;
> >
> [..]
> > +typedef int (*dev_page_fault_t)(struct vm_area_struct *vma,
> > + unsigned long addr,
> > + const struct page *page,
> > + unsigned int flags,
> > + pmd_t *pmdp);
>
> I recently included this file somewhere that did not have a pile of
> other mm headers included and 0day reports:
>
> In file included from arch/m68k/include/asm/pgtable_mm.h:148:0,
> from arch/m68k/include/asm/pgtable.h:5,
> from include/linux/memremap.h:7,
> from drivers//dax/bus.c:3:
> arch/m68k/include/asm/motorola_pgtable.h: In function 'pgd_offset':
> >> arch/m68k/include/asm/motorola_pgtable.h:199:11: error: dereferencing pointer to incomplete type 'const struct mm_struct'
> return mm->pgd + pgd_index(address);
> ^~
> I assume this pulls in the entirety of pgtable.h just to get the pmd_t
> definition?
>
> > +typedef void (*dev_page_free_t)(struct page *page, void *data);
> > +
> > /**
> > * struct dev_pagemap - metadata for ZONE_DEVICE mappings
> > + * @page_fault: callback when CPU fault on an unaddressable device page
> > + * @page_free: free page callback when page refcount reaches 1
> > * @altmap: pre-allocated/reserved memory for vmemmap allocations
> > * @res: physical address range covered by @ref
> > * @ref: reference count that pins the devm_memremap_pages() mapping
> > * @dev: host device of the mapping for debug
> > + * @data: private data pointer for page_free()
> > + * @type: memory type: see MEMORY_* in memory_hotplug.h
> > */
> > struct dev_pagemap {
> > + dev_page_fault_t page_fault;
>
> Rather than try to figure out how to forward declare pmd_t, how about
> just move dev_page_fault_t out of the generic dev_pagemap and into the
> HMM specific container structure? This should be straightfoward on top
> of the recent refactor.
Fine with me.
Cheers,
Jérôme
next prev parent reply other threads:[~2018-12-20 16:15 UTC|newest]
Thread overview: 118+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-08-17 0:05 [HMM-v25 00/19] HMM (Heterogeneous Memory Management) v25 Jérôme Glisse
2017-08-17 0:05 ` Jérôme Glisse
2017-08-17 0:05 ` [HMM-v25 01/19] hmm: heterogeneous memory management documentation v3 Jérôme Glisse
2017-08-17 0:05 ` Jérôme Glisse
2017-08-17 0:05 ` [HMM-v25 02/19] mm/hmm: heterogeneous memory management (HMM for short) v5 Jérôme Glisse
2017-08-17 0:05 ` Jérôme Glisse
2017-08-17 0:05 ` [HMM-v25 03/19] mm/hmm/mirror: mirror process address space on device with HMM helpers v3 Jérôme Glisse
2017-08-17 0:05 ` Jérôme Glisse
2017-08-17 0:05 ` [HMM-v25 04/19] mm/hmm/mirror: helper to snapshot CPU page table v4 Jérôme Glisse
2017-08-17 0:05 ` Jérôme Glisse
2017-08-17 0:05 ` [HMM-v25 05/19] mm/hmm/mirror: device page fault handler Jérôme Glisse
2017-08-17 0:05 ` Jérôme Glisse
2017-08-17 0:05 ` [HMM-v25 06/19] mm/memory_hotplug: introduce add_pages Jérôme Glisse
2017-08-17 0:05 ` Jérôme Glisse
2017-08-17 0:05 ` [HMM-v25 07/19] mm/ZONE_DEVICE: new type of ZONE_DEVICE for unaddressable memory v5 Jérôme Glisse
2017-08-17 0:05 ` Jérôme Glisse
2018-12-20 8:33 ` Dan Williams
2018-12-20 16:15 ` Jerome Glisse [this message]
2018-12-20 16:15 ` Jerome Glisse
2018-12-20 16:47 ` Dan Williams
2018-12-20 16:57 ` Jerome Glisse
2018-12-20 16:57 ` Jerome Glisse
2017-08-17 0:05 ` [HMM-v25 08/19] mm/ZONE_DEVICE: special case put_page() for device private pages v4 Jérôme Glisse
2017-08-17 0:05 ` Jérôme Glisse
2017-08-17 0:05 ` [HMM-v25 09/19] mm/memcontrol: allow to uncharge page without using page->lru field Jérôme Glisse
2017-08-17 0:05 ` Jérôme Glisse
2017-08-17 0:05 ` Jérôme Glisse
2017-08-17 0:05 ` [HMM-v25 10/19] mm/memcontrol: support MEMORY_DEVICE_PRIVATE v4 Jérôme Glisse
2017-08-17 0:05 ` Jérôme Glisse
[not found] ` <20170817000548.32038-11-jglisse-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-09-05 17:13 ` Laurent Dufour
2017-09-05 17:13 ` Laurent Dufour
2017-09-05 17:13 ` Laurent Dufour
2017-09-05 17:21 ` Jerome Glisse
2017-09-05 17:21 ` Jerome Glisse
2017-08-17 0:05 ` [HMM-v25 11/19] mm/hmm/devmem: device memory hotplug using ZONE_DEVICE v7 Jérôme Glisse
2017-08-17 0:05 ` Jérôme Glisse
2017-08-17 0:05 ` [HMM-v25 12/19] mm/hmm/devmem: dummy HMM device for ZONE_DEVICE memory v3 Jérôme Glisse
2017-08-17 0:05 ` Jérôme Glisse
2017-08-17 0:05 ` [HMM-v25 13/19] mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY Jérôme Glisse
2017-08-17 0:05 ` Jérôme Glisse
2017-08-17 21:12 ` Andrew Morton
2017-08-17 21:12 ` Andrew Morton
2017-08-17 21:44 ` Jerome Glisse
2017-08-17 21:44 ` Jerome Glisse
2017-08-17 0:05 ` [HMM-v25 14/19] mm/migrate: new memory migration helper for use with device memory v5 Jérôme Glisse
2017-08-17 0:05 ` Jérôme Glisse
2017-08-17 0:05 ` [HMM-v25 15/19] mm/migrate: migrate_vma() unmap page from vma while collecting pages Jérôme Glisse
2017-08-17 0:05 ` Jérôme Glisse
2017-08-17 0:05 ` [HMM-v25 16/19] mm/migrate: support un-addressable ZONE_DEVICE page in migration v3 Jérôme Glisse
2017-08-17 0:05 ` Jérôme Glisse
2017-08-17 0:05 ` [HMM-v25 17/19] mm/migrate: allow migrate_vma() to alloc new page on empty entry v4 Jérôme Glisse
2017-08-17 0:05 ` Jérôme Glisse
2017-08-17 0:05 ` [HMM-v25 18/19] mm/device-public-memory: device memory cache coherent with CPU v5 Jérôme Glisse
2017-08-17 0:05 ` Jérôme Glisse
2017-08-17 0:05 ` [HMM-v25 19/19] mm/hmm: add new helper to hotplug CDM memory region v3 Jérôme Glisse
2017-08-17 0:05 ` Jérôme Glisse
2017-09-04 3:09 ` Bob Liu
2017-09-04 3:09 ` Bob Liu
2017-09-04 15:51 ` Jerome Glisse
2017-09-04 15:51 ` Jerome Glisse
2017-09-05 1:13 ` Bob Liu
2017-09-05 1:13 ` Bob Liu
2017-09-05 2:38 ` Jerome Glisse
2017-09-05 2:38 ` Jerome Glisse
2017-09-05 3:50 ` Bob Liu
2017-09-05 3:50 ` Bob Liu
2017-09-05 13:50 ` Jerome Glisse
2017-09-05 13:50 ` Jerome Glisse
2017-09-05 16:18 ` Dan Williams
2017-09-05 16:18 ` Dan Williams
2017-09-05 19:00 ` Ross Zwisler
2017-09-05 19:00 ` Ross Zwisler
2017-09-05 19:20 ` Jerome Glisse
2017-09-05 19:20 ` Jerome Glisse
2017-09-08 19:43 ` Ross Zwisler
2017-09-08 19:43 ` Ross Zwisler
2017-09-08 20:29 ` Jerome Glisse
2017-09-08 20:29 ` Jerome Glisse
2017-09-05 18:54 ` Ross Zwisler
2017-09-05 18:54 ` Ross Zwisler
2017-09-06 1:25 ` Bob Liu
2017-09-06 1:25 ` Bob Liu
2017-09-06 2:12 ` Jerome Glisse
2017-09-06 2:12 ` Jerome Glisse
2017-09-07 2:06 ` Bob Liu
2017-09-07 2:06 ` Bob Liu
2017-09-07 17:00 ` Jerome Glisse
2017-09-07 17:00 ` Jerome Glisse
2017-09-07 17:27 ` Jerome Glisse
2017-09-07 17:27 ` Jerome Glisse
2017-09-08 1:59 ` Bob Liu
2017-09-08 1:59 ` Bob Liu
2017-09-08 20:43 ` Dan Williams
2017-09-08 20:43 ` Dan Williams
2017-11-17 3:47 ` chetan L
2017-11-17 3:47 ` chetan L
2017-09-05 3:36 ` Balbir Singh
2017-09-05 3:36 ` Balbir Singh
2017-08-17 21:39 ` [HMM-v25 00/19] HMM (Heterogeneous Memory Management) v25 Andrew Morton
2017-08-17 21:39 ` Andrew Morton
2017-08-17 21:55 ` Jerome Glisse
2017-08-17 21:55 ` Jerome Glisse
2017-08-17 21:59 ` Dan Williams
2017-08-17 21:59 ` Dan Williams
2017-08-17 22:02 ` Jerome Glisse
2017-08-17 22:02 ` Jerome Glisse
2017-08-17 22:06 ` Dan Williams
2017-08-17 22:06 ` Dan Williams
2017-08-17 22:16 ` Andrew Morton
2017-08-17 22:16 ` Andrew Morton
2017-12-13 12:10 ` Figo.zhang
2017-12-13 16:12 ` Jerome Glisse
2017-12-14 2:48 ` Figo.zhang
2017-12-14 3:16 ` Jerome Glisse
2017-12-14 3:53 ` Figo.zhang
2017-12-14 4:16 ` Jerome Glisse
2017-12-14 7:05 ` Figo.zhang
2017-12-14 15:28 ` Jerome Glisse
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181220161538.GA3963@redhat.com \
--to=jglisse@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=bsingharora@gmail.com \
--cc=dan.j.williams@intel.com \
--cc=dnellans@nvidia.com \
--cc=jhubbard@nvidia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ross.zwisler@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.