From: Dave Hansen <dave.hansen@linux.intel.com>
To: Dan Williams <dan.j.williams@intel.com>, Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Boaz Harrosh <boaz@plexistor.com>, Jan Kara <jack@suse.cz>,
	Mike Snitzer <snitzer@redhat.com>, Neil Brown <neilb@suse.de>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Chris Mason <clm@fb.com>, Paul Mackerras <paulus@samba.org>,
	"H. Peter Anvin" <hpa@zytor.com>, Christoph Hellwig <hch@lst.de>,
	Alasdair Kergon <agk@redhat.com>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	Mel Gorman <mgorman@suse.de>,
	Matthew Wilcox <willy@linux.intel.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Rik van Riel <riel@redhat.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Jens Axboe <axboe@kernel.dk>, Theodore Ts'o <tytso@mit.edu>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Julia Lawall <Julia.Lawall@lip6.fr>, Tejun Heo <tj@kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t
Date: Thu, 07 May 2015 10:56:24 -0700
Message-ID: <554BA748.9030804@linux.intel.com>
In-Reply-To: <CAPcyv4hHwW4U8x1_VLGj2Q4a3HxWgK4F4n9qXg8009_n7sxkmg@mail.gmail.com>

On 05/07/2015 10:42 AM, Dan Williams wrote:
> On Thu, May 7, 2015 at 10:36 AM, Ingo Molnar <mingo@kernel.org> wrote:
>> * Dan Williams <dan.j.williams@intel.com> wrote:
>> So is there anything fundamentally wrong about creating struct page
>> backing at mmap() time (and making sure aliased mmaps share struct
>> page arrays)?
> 
> Something like "get_user_pages() triggers memory hotplug for
> persistent memory", so they are actual real struct pages?  Can we do
> memory hotplug at that granularity?

We've traditionally limited them to SECTION_SIZE granularity, which is
128MB IIRC.  There are also assumptions in places that you can do page++
within a MAX_ORDER block if !CONFIG_HOLES_IN_ZONE.

But, in all practicality, a lot of those places are in code like the
buddy allocator.  If your PTEs all have _PAGE_SPECIAL set and we're not
ever expecting these fake 'struct page's to hit these code paths, it
probably doesn't matter.

You can probably get away with just allocating PAGE_SIZE worth of
'struct page' (which is 64 of them) and mapping it into vmemmap[].  The
worst case is that you'll eat 1 page of space for each outstanding page
of I/O.  That's a lot better than the 2MB of temporary 'struct page'
space per page of I/O that it would take with a traditional hotplug
operation.
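
For anyone checking the numbers above, here is a rough sketch of the
arithmetic.  It assumes 4KB pages, a 64-byte 'struct page' and a 128MB
section, which is what I'd expect on x86-64 but is not taken from any
particular config.  The EX_* constants and the helper names are made up
for illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed x86-64 values; illustrative only, not kernel symbols. */
#define EX_PAGE_SIZE        4096UL         /* 4KB base page */
#define EX_STRUCT_PAGE_SIZE 64UL           /* assumed sizeof(struct page) */
#define EX_SECTION_SIZE     (128UL << 20)  /* 128MB sparsemem section */

/* How many 'struct page' entries fit in one page of vmemmap[]. */
static unsigned long struct_pages_per_vmemmap_page(void)
{
	return EX_PAGE_SIZE / EX_STRUCT_PAGE_SIZE;
}

/* Bytes of 'struct page' needed to describe one whole section. */
static unsigned long memmap_bytes_per_section(void)
{
	return (EX_SECTION_SIZE / EX_PAGE_SIZE) * EX_STRUCT_PAGE_SIZE;
}

/* Print the per-page and per-section metadata costs side by side. */
static void print_overheads(void)
{
	unsigned long per_page = EX_PAGE_SIZE;
	unsigned long per_section = memmap_bytes_per_section();

	/* The section-sized memmap must be a whole number of pages. */
	assert(per_section % EX_PAGE_SIZE == 0);

	printf("struct pages per vmemmap page: %lu\n",
	       struct_pages_per_vmemmap_page());
	printf("memmap per vmemmap page:       %lu bytes\n", per_page);
	printf("memmap per hotplugged section: %lu bytes\n", per_section);
	printf("ratio:                         %lux\n",
	       per_section / per_page);
}
```

Calling print_overheads() from a small main() prints both costs.  Under
these assumptions, a section-granular hotplug costs 512 times as much
'struct page' space as mapping a single page of vmemmap[].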
