From: Jerome Glisse <j.glisse@gmail.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Boaz Harrosh <boaz@plexistor.com>, Rik van Riel <riel@redhat.com>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	david <david@fromorbit.com>, Ingo Molnar <mingo@kernel.org>,
	Linux MM <linux-mm@kvack.org>, Ingo Molnar <mingo@redhat.com>,
	Mel Gorman <mgorman@suse.de>, "H. Peter Anvin" <hpa@zytor.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	"torvalds@linux-foundation.org" <torvalds@linux-foundation.org>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [RFC PATCH 1/7] x86, mm: ZONE_DEVICE for "device memory"
Date: Mon, 17 Aug 2015 17:45:56 -0400
Message-ID: <20150817214554.GA5976@gmail.com>
In-Reply-To: <CAPcyv4i-5RWTLK8FQFCBuFKwY0_HShbW7PVTHudSk4sF35xosA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2340 bytes --]

On Fri, Aug 14, 2015 at 07:11:27PM -0700, Dan Williams wrote:
> On Fri, Aug 14, 2015 at 3:33 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> > On Fri, Aug 14, 2015 at 3:06 PM, Jerome Glisse <j.glisse@gmail.com> wrote:
> >> On Fri, Aug 14, 2015 at 02:52:15PM -0700, Dan Williams wrote:
> >>> On Fri, Aug 14, 2015 at 2:37 PM, Jerome Glisse <j.glisse@gmail.com> wrote:
> >>> > On Wed, Aug 12, 2015 at 11:50:05PM -0400, Dan Williams wrote:
> > [..]
> >>> > What is the rationale for not updating max_pfn, max_low_pfn, ... ?
> >>> >
> >>>
> >>> The idea is that this memory is not meant to be available to the page
> >>> allocator and should not count as new memory capacity.  We're only
> >>> hotplugging it to get struct page coverage.
> >>
> >> But relying on max_pfn staying smaller than first_dev_pfn sounds bogus
> >> to me.  For instance, you might plug in a device that registers dev
> >> memory, and then some regular memory might be hotplugged, effectively
> >> updating max_pfn to a value bigger than first_dev_pfn.
> >>
> >
> > True.
> >
> >> Also I do not think that the buddy allocator uses max_pfn or max_low_pfn
> >> to decide whether to consider a page/zone for allocation.
> >
> > Yes, I took it out with no effects.  I'll investigate further whether
> > we should be touching those variables or not for this new usage.
> 
> Although it does not offer perfect protection if device memory is at a
> physically lower address than RAM, skipping the update of these
> variables does seem to be what we want.  For example /dev/mem would
> fail to allow write access to persistent memory if it fails a
> valid_phys_addr_range() check.  Since /dev/mem does not know how to
> write to PMEM in a reliably persistent way, it should not treat a
> PMEM-pfn like RAM.

So attached is a patch that should keep ZONE_DEVICE out of consideration
for the buddy allocator. You might also want to keep the pages reserved and
not free them into the zone; you could replace generic_online_page() using
set_online_page_callback() while hotplugging device memory.
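
The callback idea can be pictured with a small userspace model. This is only
a sketch, not kernel code: struct page here is a toy with two flags,
dev_online_page() and online_pages_model() are hypothetical names, and the
set/restore helpers only loosely mirror the kernel rule that the generic
callback is the only one that may be replaced.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: a toy page carrying just the state of interest. */
struct page {
	bool reserved;	/* stands in for PG_reserved */
	bool freed;	/* true once handed to the (modeled) free lists */
};

typedef void (*online_page_callback_t)(struct page *page);

/* Default behavior: clear "reserved" and release the page for allocation. */
void generic_online_page(struct page *page)
{
	page->reserved = false;
	page->freed = true;
}

/* Hypothetical device callback: keep the page reserved, never free it. */
void dev_online_page(struct page *page)
{
	page->reserved = true;
	page->freed = false;
}

static online_page_callback_t online_page_callback = generic_online_page;

/* Only the generic callback may be replaced; returns 0 on success. */
int set_online_page_callback(online_page_callback_t callback)
{
	if (online_page_callback != generic_online_page)
		return -1;
	online_page_callback = callback;
	return 0;
}

/* Put the generic callback back if 'callback' is the one installed. */
int restore_online_page_callback(online_page_callback_t callback)
{
	if (online_page_callback != callback)
		return -1;
	online_page_callback = generic_online_page;
	return 0;
}

/* Online a range of pages through whichever callback is installed. */
void online_pages_model(struct page *pages, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		online_page_callback(&pages[i]);
}
```

In this model a caller would install dev_online_page() before onlining the
device range and restore the generic callback afterwards, so that later
hotplug of regular memory still frees its pages to the allocator.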

Regarding /dev/mem, I would not worry about highmem, as /dev/mem is already
broken with respect to memory holes that might exist (at least that is my
understanding). Alternatively, if you really care about /dev/mem, you could
add an arch valid_phys_addr_range() that checks for a valid zone.
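
As a rough illustration of such a check, here is a userspace sketch. The
memory map, struct phys_range, and valid_phys_addr_range_model() are all
made up for the example; a real arch hook would consult the actual zones
rather than a static table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Userspace model only: one entry per contiguous physical range. */
struct phys_range {
	phys_addr_t start;
	phys_addr_t end;	/* exclusive */
	bool is_device;		/* true if the range backs a device zone */
};

/* Hypothetical layout: RAM, then a hole, then a device (PMEM) range. */
static const struct phys_range memory_map[] = {
	{ 0x00000000, 0x80000000, false },
	{ 0xc0000000, 0x100000000ULL, true },
};

/*
 * Return 1 only if [addr, addr + count) lies entirely inside a single
 * non-device range; holes, device ranges, and straddling accesses fail.
 */
int valid_phys_addr_range_model(phys_addr_t addr, size_t count)
{
	phys_addr_t end = addr + count;
	size_t nr = sizeof(memory_map) / sizeof(memory_map[0]);

	if (end < addr)		/* address arithmetic wrapped around */
		return 0;

	for (size_t i = 0; i < nr; i++) {
		const struct phys_range *r = &memory_map[i];

		if (addr >= r->start && end <= r->end)
			return !r->is_device;
	}
	return 0;
}
```

With this table, an access inside RAM passes, while an access to the device
range, to the hole, or across the end of RAM is rejected.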

Cheers,
Jérôme

[-- Attachment #2: 0001-mm-ZONE_DEVICE-Keep-ZONE_DEVICE-out-of-allocation-zo.patch --]
[-- Type: text/plain, Size: 1260 bytes --]

From 45976e1186eee45ecb277fe5293a7cfa7466d740 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com>
Date: Mon, 17 Aug 2015 17:31:27 -0400
Subject: [PATCH] mm/ZONE_DEVICE: Keep ZONE_DEVICE out of allocation zonelist.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Memory inside a ZONE_DEVICE should never be considered by the buddy
allocator, and thus any such zone should never be added to any of
the zonelists. This patch does just that.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
---
 mm/page_alloc.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ef19f22..f3e26de 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3834,6 +3834,13 @@ static int build_zonelists_node(pg_data_t *pgdat, struct zonelist *zonelist,
 	do {
 		zone_type--;
 		zone = pgdat->node_zones + zone_type;
+		/*
+		 * Device zone is special memory and should never be considered
+		 * for regular allocation. It is expected that pages in the device
+		 * zone will be allocated by other means.
+		 */
+		if (is_dev_zone(zone))
+			continue;
 		if (populated_zone(zone)) {
 			zoneref_set_zone(zone,
 				&zonelist->_zonerefs[nr_zones++]);
-- 
1.8.3.1
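
The effect of the hunk above can be tried outside the kernel with a small
model. This is a sketch under stated assumptions: the types below only mimic
the kernel's, the three-zone enum is invented for the example, and
build_zonelist_model() is a hypothetical stand-in for build_zonelists_node().

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: these types mimic, but are not, the kernel's. */
enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_DEVICE, MAX_NR_ZONES };

struct zone {
	enum zone_type type;
	unsigned long present_pages;	/* 0 means the zone is unpopulated */
};

struct zonelist {
	struct zone *zones[MAX_NR_ZONES];	/* stand-in for _zonerefs[] */
};

bool is_dev_zone(const struct zone *zone)
{
	return zone->type == ZONE_DEVICE;
}

bool populated_zone(const struct zone *zone)
{
	return zone->present_pages != 0;
}

/*
 * Walk the node's zones from highest to lowest, skipping device zones
 * before the populated check, and return how many zones were added.
 */
int build_zonelist_model(struct zone *node_zones, int nr_node_zones,
			 struct zonelist *zonelist)
{
	int nr_zones = 0;
	int zone_type = nr_node_zones;

	while (zone_type > 0) {
		struct zone *zone;

		zone_type--;
		zone = &node_zones[zone_type];
		/* Device memory is never offered to regular allocations. */
		if (is_dev_zone(zone))
			continue;
		if (populated_zone(zone))
			zonelist->zones[nr_zones++] = zone;
	}
	return nr_zones;
}
```

For a node with populated DMA, NORMAL, and DEVICE zones, the model's
zonelist ends up holding NORMAL then DMA, and the device zone is left out
even though it is populated.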



