From: Dave Chinner <david@fromorbit.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Ric Wheeler <rwheeler@redhat.com>,
linux-scsi@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-ide@vger.kernel.org,
linux-fsdevel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
lsf-pc@lists.linux-foundation.org
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] really large storage sectors - going beyond 4096 bytes
Date: Thu, 23 Jan 2014 19:21:21 +1100 [thread overview]
Message-ID: <20140123082121.GN13997@dastard> (raw)
In-Reply-To: <20140122143452.GW4963@suse.de>
On Wed, Jan 22, 2014 at 02:34:52PM +0000, Mel Gorman wrote:
> On Wed, Jan 22, 2014 at 09:10:48AM -0500, Ric Wheeler wrote:
> > On 01/22/2014 04:34 AM, Mel Gorman wrote:
> > >On Tue, Jan 21, 2014 at 10:04:29PM -0500, Ric Wheeler wrote:
> > >>One topic that has been lurking forever at the edges is the current
> > >>4k limitation for file system block sizes. Some devices in
> > >>production today and others coming soon have larger sectors and it
> > >>would be interesting to see if it is time to poke at this topic
> > >>again.
> > >>
> > >Large block support was proposed years ago by Christoph Lameter
> > >(http://lwn.net/Articles/232757/). I think I was just getting started
> > >in the community at the time so I do not recall any of the details. I do
> > >believe it motivated an alternative by Nick Piggin called fsblock though
> > >(http://lwn.net/Articles/321390/). At the very least it would be nice to
> > >know why neither were never merged for those of us that were not around
> > >at the time and who may not have the chance to dive through mailing list
> > >archives between now and March.
> > >
> > >FWIW, I would expect that a show-stopper for any proposal is requiring
> > >high-order allocations to succeed for the system to behave correctly.
> > >
> >
> > I have a somewhat hazy memory of Andrew warning us that touching
> > this code takes us into dark and scary places.
> >
>
> That is a light summary. As Andrew tends to reject patches with poor
> documentation in case we forget the details in 6 months, I'm going to guess
> that he does not remember the details of a discussion from 7ish years ago.
> This is where Andrew swoops in with a dazzling display of his eidetic
> memory just to prove me wrong.
>
> Ric, are there any storage vendor that is pushing for this right now?
> Is someone working on this right now or planning to? If they are, have they
> looked into the history of fsblock (Nick) and large block support (Christoph)
> to see if they are candidates for forward porting or reimplementation?
> I ask because without that person there is a risk that the discussion
> will go as follows
>
> Topic leader: Does anyone have an objection to supporting larger block
> sizes than the page size?
> Room: Send patches and we'll talk.
So, from someone who was done in the trenches of the large
filesystem block size code wars, the main objection to Christoph
lameter's patchset was that it used high order compound pages in the
page cache so that nothing at filesystem level needed to be changed
to support large block sizes.
The patch to enable XFS to use 64k block sizes with Christoph's
patches was simply removing 5 lines of code that limited the block
size to PAGE_SIZE. And everything just worked.
Given that compound pages are used all over the place now and we
also have page migration, compaction and other MM support that
greatly improves high order memory allocation, perhaps we should
revisit this approach.
As to Nick's fsblock rewrite, he basically rewrote all the
bufferhead head code to handle filesystem blocks larger than a page
whilst leaving the page cache untouched. i.e. the complete opposite
approach. The problem with this approach is that every filesystem
needs to be re-written to use fsblocks rather than bufferheads. For
some filesystems that isn't hard (e.g. ext2) but for filesystems
that use bufferheads in the core of their journalling subsystems
that's a completely different story.
And for filesystems like XFS, it doesn't solve any of the problem
with using bufferheads that we have now, so it simply introduces a
huge amount of IO path rework and validation without providing any
advantage from a feature or performance point of view. i.e. extent
based filesystems mostly negate the impact of filesystem block size
on IO performance...
Realistically, if I'm going to do something in XFS to add block size
> page size support, I'm going to do it wiht somethign XFS can track
through it's own journal so I can add data=journal functionality
with the same filesystem block/extent header structures used to
track the pages in blocks larger than PAGE_SIZE. And given that we
already have such infrastructure in XFS to support directory
blocks larger than filesystem block size....
FWIW, as to the original "large sector size" support question, XFS
already supports sector sizes up to 32k in size. The limitation is
actually a limitation of the journal format, so going larger than
that would take some work...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2014-01-23 8:21 UTC|newest]
Thread overview: 59+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-12-20 9:30 LSF/MM 2014 Call For Proposals Mel Gorman
2014-01-06 22:20 ` [LSF/MM TOPIC] [ATTEND] persistent memory progress, management of storage & file systems Ric Wheeler
2014-01-06 22:32 ` faibish, sorin
2014-01-07 19:44 ` Joel Becker
2014-01-21 7:00 ` LSF/MM 2014 Call For Proposals Michel Lespinasse
2014-01-22 3:04 ` [LSF/MM TOPIC] really large storage sectors - going beyond 4096 bytes Ric Wheeler
2014-01-22 5:20 ` Joel Becker
2014-01-22 7:14 ` Hannes Reinecke
2014-01-22 9:34 ` [Lsf-pc] " Mel Gorman
2014-01-22 14:10 ` Ric Wheeler
2014-01-22 14:34 ` Mel Gorman
2014-01-22 14:58 ` Ric Wheeler
2014-01-22 15:19 ` Mel Gorman
2014-01-22 17:02 ` Chris Mason
2014-01-22 17:21 ` James Bottomley
2014-01-22 18:02 ` Chris Mason
2014-01-22 18:13 ` James Bottomley
2014-01-22 18:17 ` Ric Wheeler
2014-01-22 18:35 ` James Bottomley
2014-01-22 18:39 ` Ric Wheeler
2014-01-22 19:30 ` James Bottomley
2014-01-22 19:50 ` Andrew Morton
2014-01-22 20:13 ` Chris Mason
2014-01-23 2:46 ` David Lang
2014-01-23 5:21 ` Theodore Ts'o
2014-01-23 8:35 ` Dave Chinner
2014-01-23 12:55 ` Theodore Ts'o
2014-01-23 19:49 ` Dave Chinner
2014-01-23 21:21 ` Joel Becker
2014-01-22 20:57 ` Martin K. Petersen
2014-01-22 18:37 ` Chris Mason
2014-01-22 18:40 ` Ric Wheeler
2014-01-22 18:47 ` James Bottomley
2014-01-23 21:27 ` Joel Becker
2014-01-23 21:34 ` Chris Mason
2014-01-23 8:27 ` Dave Chinner
2014-01-23 15:47 ` James Bottomley
2014-01-23 16:44 ` Mel Gorman
2014-01-23 19:55 ` James Bottomley
2014-01-24 10:57 ` Mel Gorman
2014-01-30 4:52 ` Matthew Wilcox
2014-01-30 6:01 ` Dave Chinner
2014-01-30 10:50 ` Mel Gorman
2014-01-23 20:34 ` Dave Chinner
2014-01-23 20:54 ` Christoph Lameter
2014-01-23 8:24 ` Dave Chinner
2014-01-23 20:48 ` Christoph Lameter
2014-01-22 20:47 ` Martin K. Petersen
2014-01-23 8:21 ` Dave Chinner [this message]
2014-01-22 15:14 ` Chris Mason
2014-01-22 16:03 ` James Bottomley
2014-01-22 16:45 ` Ric Wheeler
2014-01-22 17:00 ` James Bottomley
2014-01-22 21:05 ` Jan Kara
2014-01-23 20:47 ` Christoph Lameter
2014-01-24 11:09 ` Mel Gorman
2014-01-24 15:44 ` Christoph Lameter
2014-01-22 15:54 ` James Bottomley
2014-03-14 9:02 ` Update on LSF/MM [was Re: LSF/MM 2014 Call For Proposals] James Bottomley
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140123082121.GN13997@dastard \
--to=david@fromorbit.com \
--cc=akpm@linux-foundation.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-ide@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-scsi@vger.kernel.org \
--cc=lsf-pc@lists.linux-foundation.org \
--cc=mgorman@suse.de \
--cc=rwheeler@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).