From: Matthew Wilcox <matthew@wil.cx>
To: Mel Gorman <mgorman@suse.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
Chris Mason <clm@fb.com>, Dave Chinner <david@fromorbit.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"lsf-pc@lists.linux-foundation.org"
<lsf-pc@lists.linux-foundation.org>,
"rwheeler@redhat.com" <rwheeler@redhat.com>
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] really large storage sectors - going beyond 4096 bytes
Date: Wed, 29 Jan 2014 21:52:46 -0700
Message-ID: <20140130045245.GH20939@parisc-linux.org>
In-Reply-To: <20140124105748.GQ4963@suse.de>
On Fri, Jan 24, 2014 at 10:57:48AM +0000, Mel Gorman wrote:
> So far on the table is
>
> 1. major filesystem overhaul
> 2. major vm overhaul
> 3. use compound pages as they are today and hope it does not go
> completely to hell, reboot when it does
Is the below paragraph an exposition of option 2, or is it an option 4,
change the VM unit of allocation? Other than the names you're using,
this is basically what I said to Kirill in an earlier thread; either
scrap the difference between PAGE_SIZE and PAGE_CACHE_SIZE, or start
making use of it.
The fact that EVERYBODY in this thread has been using PAGE_SIZE when they
should have been using PAGE_CACHE_SIZE makes me wonder if part of the
problem is that the split in naming went the wrong way. That is, use
PTE_SIZE for 'the amount of memory pointed to by a pte_t' and use
PAGE_SIZE for 'the amount of memory described by a struct page'.

(We would need to remove the current users of PTE_SIZE, which are sparc32
and powerpc32, but that's just a detail.)

And we need to fix all the places that are currently getting the
distinction wrong. SMOP ... ;-) What would help is correct typing of
variables, possibly with sparse support to help us out. Big Job.
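
As a rough illustration of what "correct typing of variables" could look
like, here is a minimal user-space sketch. Everything in it is
hypothetical: the DEMO_* sizes, the pte_count_t and page_count_t wrapper
types, and the helper names are made up for the example and are not
existing kernel definitions. The idea is only that wrapping each unit in
its own struct makes the compiler reject code that mixes "memory mapped
by a pte_t" with "memory described by a struct page".

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes for illustration only: 4 KiB per pte_t, 16 KiB per struct page. */
#define DEMO_PTE_SIZE   4096u
#define DEMO_PAGE_SIZE  16384u

static_assert(DEMO_PAGE_SIZE % DEMO_PTE_SIZE == 0,
              "page size must be a multiple of the pte size");

/* Distinct wrapper types: passing one where the other is expected fails to compile. */
typedef struct { size_t n; } pte_count_t;   /* counts DEMO_PTE_SIZE units  */
typedef struct { size_t n; } page_count_t;  /* counts DEMO_PAGE_SIZE units */

/* Number of struct-page-sized units needed to hold 'bytes', rounded up. */
static inline page_count_t bytes_to_pages(size_t bytes)
{
    return (page_count_t){ (bytes + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE };
}

/* Explicit, named conversion between the two units. */
static inline pte_count_t pages_to_ptes(page_count_t p)
{
    return (pte_count_t){ p.n * (DEMO_PAGE_SIZE / DEMO_PTE_SIZE) };
}

/* Bytes covered by a given number of pte-sized units. */
static inline size_t ptes_to_bytes(pte_count_t c)
{
    return c.n * DEMO_PTE_SIZE;
}
```

With these types, calling ptes_to_bytes() on a page_count_t is a compile
error rather than a silent unit bug. In the kernel, a sparse annotation
on plain integer types might give a similar check with less churn;
whether that is practical here is an open question, not something this
sketch demonstrates.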
> That's why I suggested that it may be necessary to change the basic unit of
> allocation the kernel uses to be larger than the MMU page size and restrict
> how the sub pages are used. The requirement is to preserve the property that
> "with the exception of slab reclaim that any reclaim action will result
> in K-sized allocation succeeding" where K is the largest blocksize used by
> any underlying storage device. From an FS perspective then certain things
> would look similar to what they do today. Block data would be on physically
> contiguous pages, buffer_heads would still manage the case where block_size
> <= PAGEALLOC_PAGE_SIZE (as opposed to MMU_PAGE_SIZE), particularly for
> dirty tracking and so on. The VM perspective is different because now it
> has to handle MMU_PAGE_SIZE in a very different way, page reclaim of a page
> becomes multiple unmap events and so on. There would also be anomalies such
> as mlock of a range smaller than PAGEALLOC_PAGE_SIZE becomes difficult if
> not impossible to sensibly manage because mlock of a 4K page effectively
> pins the rest and it's not obvious how we would deal with the VMAs in that
> case. It would get more than just the storage gains though. Some of the
> scalability problems that deal with massive amount of struct pages may
> magically go away if the base unit of allocation and management changes.
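
To put numbers on the trade-off in the quoted paragraph, here is a small
user-space sketch. MMU_PAGE_SIZE and PAGEALLOC_PAGE_SIZE are the names
used above, but the values (4 KiB and 64 KiB) and both helper functions
are assumptions chosen for illustration, not anything that exists in the
kernel. It shows the upper bound on unmap events when one allocation unit
is reclaimed, and how much memory ends up pinned when a small range is
mlocked.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration: 4 KiB MMU pages, K = 64 KiB allocation unit. */
#define MMU_PAGE_SIZE        4096u
#define PAGEALLOC_PAGE_SIZE  65536u

static_assert(PAGEALLOC_PAGE_SIZE % MMU_PAGE_SIZE == 0,
              "allocation unit must be a multiple of the MMU page size");

/* Upper bound on sub pages to unmap (per mapping) when one unit is reclaimed. */
static inline size_t max_unmaps_per_reclaim(void)
{
    return PAGEALLOC_PAGE_SIZE / MMU_PAGE_SIZE;
}

/*
 * Bytes effectively pinned by mlocking [start, start + len): every
 * allocation unit touched by the range is pinned as a whole.
 */
static inline size_t mlock_pinned_bytes(size_t start, size_t len)
{
    if (len == 0)
        return 0;

    size_t first = start / PAGEALLOC_PAGE_SIZE;
    size_t last = (start + len - 1) / PAGEALLOC_PAGE_SIZE;

    return (last - first + 1) * PAGEALLOC_PAGE_SIZE;
}
```

Under these assumed sizes, mlocking a single 4 KiB page pins the whole
64 KiB unit, and an 8 KiB range that straddles a unit boundary pins two
units (128 KiB).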
--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."