public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: David Chinner <dgc@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	clameter@sgi.com, linux-kernel@vger.kernel.org,
	Mel Gorman <mel@skynet.ie>,
	William Lee Irwin III <wli@holomorphy.com>,
	Jens Axboe <jens.axboe@oracle.com>,
	Badari Pulavarty <pbadari@gmail.com>,
	Maxim Levitsky <maximlevitsky@gmail.com>
Subject: Re: [00/17] Large Blocksize Support V3
Date: Fri, 04 May 2007 07:31:37 -0600	[thread overview]
Message-ID: <m1ejlwu346.fsf@ebiederm.dsl.xmission.com> (raw)
In-Reply-To: <20070427080321.GG32602149@melbourne.sgi.com> (David Chinner's message of "Fri, 27 Apr 2007 18:03:21 +1000")

David Chinner <dgc@sgi.com> writes:

> On Fri, Apr 27, 2007 at 12:04:03AM -0700, Andrew Morton wrote:
>
> I've looked at all this but I'm trying to work out if anyone
> else has looked at the impact of doing this. I have direct experience
> with this form of block aggregation - this is pretty much what is
> done in irix - and it's full of nasty, ugly corner cases.
>
> I've got several year-old Irix bugs  assigned that are hit every so
> often where one page in the aggregated set has the wrong state, and
> it's simply not possible to either reproduce the problem or work out
> how it happened. The code has grown too complex and convoluted, and
> by the time the problem is noticed (either by hang, panic or bug
> check) the cause of it is long gone.
>
> I don't want to go back to having to deal with this sort of problem
> - I'd much prefer to have a design that does not make the same
> mistakes that lead to these sorts of problem.

So the practical question is.  Was it a high level design problem or
was it simply a choice of implementation issue.

Until we code review and implementation that does page aggregation for
linux we can't say how nasty it would be.

Of course what gets confusing is when you mention you refer to the
previous implementation as a buffer cache, because that isn't at all
what Linux had for a buffer cache.  The linux buffer cache was the
same as the current page cache except it was index by block number and
not by offset into a file.

>> 
>> You're addressing Christoph's straw man here.
>
> No, I'm speaking from years of experience working on a
> page/buffer/chunk cache capable of using both large pages and
> aggregating multiple pages. It has, at times, almost driven me
> insane and I don't want to go back there.

The suggestion seems to be to always aggregate pages (to handle
PAGE_SIZE < block size), and not to even worry about the fact
that it happens that the pages you are aggregating are physically
contiguous.  The memory allocator and the block layer can worry
about that.  It isn't something the page cache or filesystems
need to pay attention to.

I suspect the implementation in linux would be sufficiently different
that it would not be prone to the same problems.  Among other things
we are already do most things on a range of page addresses, so we
would seem to have most of the infrastructure already.

It looks like if we extend the current batching a little more
so it covers all of the interesting cases. (read)

Ensure the dirty bit on all pages in the group when we set it on
one page.

Add re-read when we dirty the group if we don't have it all present.

Round the range we operate on up so we cleanly hit the beginning
and end of the group size.

Only issue the mapping operations on the first page in the group.

Is about what we would have to do to handle multiple pages in one
block in the page cache.  There are clearly more details but as
a first approximation I don't see this being fundamentally more
complex then what we are currently doing.  Just taking into account a
few more details.



The whole physical continuity thing seems to come cleanly out of a
speculative page allocator, and that would seem to work and provide
improvements on smaller block sizes filesystems so it looks like
a larger general improvement.

Likewise Jens increase the linux scatter gather list size seems like 
a more general independent improvement.

So if we can also handle groups of pages that make up a single block
as a independent change we have all of the benefits of large block
sizes with most of them applying to small sector size filesystems
as well.

Given that small block sizes give us better storage efficiency,
which means less disk bandwidth used, which means less time
to get the data off of a slow disk (especially if you can
put multiple files you want simultaneously in that same space).
I'm not convinced that large block sizes are a clear disk performance
advantage, so we should not neglect the small file sizes.

Eric

  parent reply	other threads:[~2007-05-04 13:32 UTC|newest]

Thread overview: 235+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-04-24 22:21 [00/17] Large Blocksize Support V3 clameter
2007-04-24 22:21 ` [01/17] Remove open coded implementation of memclear_highpage flush clameter
2007-04-24 22:21 ` [02/17] Fix page allocation flags in grow_dev_page() clameter
2007-04-24 22:21 ` [03/17] Fix: find_or_create_page does not spread memory clameter
2007-04-24 22:21 ` [04/17] Free up page->private for compound pages clameter
2007-04-24 22:21 ` [05/17] More compound page features clameter
2007-04-24 22:21 ` [06/17] Fix up handling of Compound head pages clameter
2007-04-24 22:21 ` [07/17] vmstat.c: Support accounting for compound pages clameter
2007-04-24 22:21 ` [08/17] Define functions for page cache handling clameter
2007-04-24 23:00   ` Eric Dumazet
2007-04-25  6:27     ` Christoph Lameter
2007-04-24 22:21 ` [09/17] Convert PAGE_CACHE_xxx -> page_cache_xxx function calls clameter
2007-04-24 22:21 ` [10/17] Variable Order Page Cache: Add clearing and flushing function clameter
2007-04-26  7:02   ` Christoph Lameter
2007-04-26  8:14     ` David Chinner
2007-04-24 22:21 ` [11/17] Readahead support for the variable order page cache clameter
2007-04-24 22:21 ` [12/17] Variable Page Cache Size: Fix up reclaim counters clameter
2007-04-24 22:21 ` [13/17] set_blocksize: Allow to set a larger block size than PAGE_SIZE clameter
2007-04-24 22:21 ` [14/17] Add VM_BUG_ONs to check for correct page order clameter
2007-04-24 22:21 ` [15/17] ramfs: Variable order page cache support clameter
2007-04-24 22:21 ` [16/17] ext2: " clameter
2007-04-24 22:21 ` [17/17] xfs: " clameter
2007-04-25  0:46 ` [00/17] Large Blocksize Support V3 Jörn Engel
2007-04-25  0:47 ` H. Peter Anvin
2007-04-25  3:11 ` William Lee Irwin III
2007-04-25 11:35 ` Jens Axboe
2007-04-25 15:36   ` Christoph Lameter
2007-04-25 17:53     ` Jens Axboe
2007-04-25 18:03       ` Christoph Lameter
2007-04-25 18:05         ` Jens Axboe
2007-04-25 18:14           ` Christoph Lameter
2007-04-25 18:16             ` Jens Axboe
2007-04-25 13:28 ` Mel Gorman
2007-04-25 15:23   ` Christoph Lameter
2007-04-25 22:46 ` Badari Pulavarty
2007-04-26  1:14   ` David Chinner
2007-04-26  1:17     ` David Chinner
2007-04-26  4:51 ` Eric W. Biederman
2007-04-26  5:05   ` Christoph Lameter
2007-04-26  5:44     ` Eric W. Biederman
2007-04-26  6:37       ` Christoph Lameter
2007-04-26  9:16         ` Mel Gorman
2007-04-26  6:38       ` Nick Piggin
2007-04-26  6:46         ` Christoph Lameter
2007-04-26  6:57           ` Nick Piggin
2007-04-26  7:10             ` Christoph Lameter
2007-04-26  7:22               ` Nick Piggin
2007-04-26  7:34                 ` Christoph Lameter
2007-04-26  7:48                   ` Nick Piggin
2007-04-26  9:20                     ` David Chinner
2007-04-26 13:53                       ` Avi Kivity
2007-04-26 14:33                         ` David Chinner
2007-04-26 14:56                           ` Avi Kivity
2007-04-26 15:20                       ` Nick Piggin
2007-04-26 17:42                         ` Jens Axboe
2007-04-26 18:59                           ` Eric W. Biederman
2007-04-26 16:07                     ` Christoph Hellwig
2007-04-27 10:05                       ` Nick Piggin
2007-04-27 13:06                         ` Mel Gorman
2007-04-26 13:50                   ` William Lee Irwin III
2007-04-26 18:09                     ` Eric W. Biederman
2007-04-26 23:34                       ` William Lee Irwin III
2007-04-26  7:48                 ` Questions on printk and console_drivers gshan
2007-04-26 10:06           ` [00/17] Large Blocksize Support V3 Mel Gorman
2007-04-26 14:47             ` Nick Piggin
2007-04-26 15:58         ` Christoph Hellwig
2007-04-26 16:05           ` Jens Axboe
2007-04-26 16:16             ` Christoph Hellwig
2007-04-26 13:28       ` Alan Cox
2007-04-26 13:30         ` Jens Axboe
2007-04-29 14:12         ` Matt Mackall
2007-04-28 10:55       ` Pierre Ossman
2007-04-28 15:39         ` Eric W. Biederman
2007-04-26  5:37   ` Nick Piggin
2007-04-26  6:38     ` David Chinner
2007-04-26  6:50       ` Nick Piggin
2007-04-26  8:40         ` Mel Gorman
2007-04-26  8:55           ` Nick Piggin
2007-04-26 10:30             ` Mel Gorman
2007-04-26 10:54               ` Eric W. Biederman
2007-04-26 12:23                 ` Mel Gorman
2007-04-26 17:58                 ` Christoph Lameter
2007-04-26 18:02                   ` Jens Axboe
2007-04-26 16:11         ` Christoph Hellwig
2007-04-26 17:49           ` Eric W. Biederman
2007-04-26 18:03             ` Christoph Lameter
2007-04-26 18:03               ` Jens Axboe
2007-04-26 18:09                 ` Christoph Hellwig
2007-04-26 18:12                   ` Jens Axboe
2007-04-26 18:24                     ` Christoph Hellwig
2007-04-26 18:24                       ` Jens Axboe
2007-04-26 18:28                     ` Christoph Lameter
2007-04-26 18:29                       ` Jens Axboe
2007-04-26 18:35                         ` Christoph Lameter
2007-04-26 18:39                           ` Jens Axboe
2007-04-26 19:35                             ` Eric W. Biederman
2007-04-26 19:42                               ` Jens Axboe
2007-04-27  4:05                                 ` Eric W. Biederman
2007-04-27 10:26                                   ` Nick Piggin
2007-04-27 13:51                                     ` Eric W. Biederman
2007-04-26 20:22                             ` Mel Gorman
2007-04-27  0:21                               ` William Lee Irwin III
2007-04-27  5:16                               ` Jens Axboe
2007-04-27 10:38           ` Nick Piggin
2007-04-26 10:10       ` Eric W. Biederman
2007-04-26 13:50         ` David Chinner
2007-04-26 14:40           ` William Lee Irwin III
2007-04-26 15:38           ` Nick Piggin
2007-04-26 15:58             ` William Lee Irwin III
2007-04-27  9:46               ` Nick Piggin
2007-04-27  0:19           ` Jeremy Higdon
2007-04-26 18:07         ` Christoph Lameter
2007-04-26 18:45           ` Eric W. Biederman
2007-04-26 18:59             ` Christoph Lameter
2007-04-26 19:21               ` Eric W. Biederman
2007-04-26  6:40     ` Christoph Lameter
2007-04-26  6:53       ` Nick Piggin
2007-04-26  7:04         ` David Chinner
2007-04-26  7:07           ` Nick Piggin
2007-04-26  7:11             ` Christoph Lameter
2007-04-26  7:17               ` Nick Piggin
2007-04-26  7:28                 ` Christoph Lameter
2007-04-26  7:45                   ` Nick Piggin
2007-04-26 18:10                     ` Christoph Lameter
2007-04-27 10:08                       ` Nick Piggin
2007-04-26  7:07         ` Christoph Lameter
2007-04-26  7:15           ` Nick Piggin
2007-04-26  7:22             ` Christoph Lameter
2007-04-26  7:42               ` Nick Piggin
2007-04-26 10:48                 ` Mel Gorman
2007-04-26 12:37                 ` Andy Whitcroft
2007-04-26 14:18                   ` David Chinner
2007-04-26 15:08                   ` Nick Piggin
2007-04-26 15:19                     ` William Lee Irwin III
2007-04-26 15:28                     ` David Chinner
2007-04-26 14:53                 ` William Lee Irwin III
2007-04-26 18:16                   ` Christoph Lameter
2007-04-26 18:21                   ` Eric W. Biederman
2007-04-27  0:32                     ` William Lee Irwin III
2007-04-27 10:22                       ` Nick Piggin
2007-04-27 12:58                         ` William Lee Irwin III
2007-04-27 13:06                           ` Nick Piggin
2007-04-27 14:49                             ` William Lee Irwin III
2007-04-26 18:13                 ` Christoph Lameter
2007-04-27 10:15                   ` Nick Piggin
2007-04-26 14:49               ` William Lee Irwin III
2007-04-26 18:50 ` Maxim Levitsky
2007-04-27  2:04 ` Andrew Morton
2007-04-27  2:27   ` David Chinner
2007-04-27  2:53     ` Andrew Morton
2007-04-27  3:47       ` [00/17] Large Blocksize Support V3 (mmap conceptual discussion) Christoph Lameter
2007-04-27  4:20       ` [00/17] Large Blocksize Support V3 David Chinner
2007-04-27  5:15         ` Andrew Morton
2007-04-27  5:49           ` Christoph Lameter
2007-04-27  6:55             ` Andrew Morton
2007-04-27  7:19               ` Christoph Lameter
2007-04-27  7:26                 ` Andrew Morton
2007-04-27  8:37                   ` David Chinner
2007-04-27 12:01                   ` Christoph Lameter
2007-04-27 16:36                   ` David Chinner
2007-04-27 17:34                     ` David Chinner
2007-04-27 19:11                       ` Andrew Morton
2007-04-28  1:43                         ` Nick Piggin
2007-04-28  8:04                           ` Peter Zijlstra
2007-04-28  8:22                             ` Andrew Morton
2007-04-28  8:32                               ` Peter Zijlstra
2007-04-28  8:55                                 ` Andrew Morton
2007-04-28  9:36                                   ` Peter Zijlstra
2007-04-28 14:09                               ` William Lee Irwin III
2007-04-28 18:26                                 ` Andrew Morton
2007-04-28 19:19                                   ` William Lee Irwin III
2007-04-28 21:28                                     ` Andrew Morton
2007-04-28  3:17                         ` David Chinner
2007-04-28  3:49                           ` Christoph Lameter
2007-04-28  4:56                           ` Andrew Morton
2007-04-28  5:08                             ` Christoph Lameter
2007-04-28  5:36                               ` Andrew Morton
2007-04-28  6:24                                 ` Christoph Lameter
2007-04-28  6:52                                   ` Andrew Morton
2007-04-30  5:30                                     ` Christoph Lameter
2007-04-28  9:43                             ` Alan Cox
2007-04-28  9:58                               ` Andrew Morton
2007-04-28 10:21                                 ` Alan Cox
2007-04-28 10:25                                   ` Andrew Morton
2007-04-28 11:29                                     ` Alan Cox
2007-04-28 14:37                                       ` William Lee Irwin III
2007-04-27  7:22               ` Christoph Lameter
2007-04-27  7:29                 ` Andrew Morton
2007-04-27  7:35                   ` Christoph Lameter
2007-04-27  7:43                     ` Andrew Morton
2007-04-27 11:05               ` Paul Mackerras
2007-04-27 11:41                 ` Nick Piggin
2007-04-27 12:12                   ` Christoph Lameter
2007-04-27 12:25                     ` Nick Piggin
2007-04-27 13:39                       ` Christoph Hellwig
2007-04-28  2:27                         ` Nick Piggin
2007-04-28  2:39                           ` William Lee Irwin III
2007-04-28  2:50                             ` Nick Piggin
2007-04-28  3:16                               ` William Lee Irwin III
2007-04-28  8:16                           ` Christoph Hellwig
2007-04-27 16:48                       ` Christoph Lameter
2007-04-27 13:37                     ` Christoph Hellwig
2007-04-27 12:14                   ` Paul Mackerras
2007-04-27 12:36                     ` Nick Piggin
2007-04-27 13:42                     ` Christoph Hellwig
2007-04-27 11:58                 ` Christoph Lameter
2007-04-27 13:44               ` William Lee Irwin III
2007-04-27 19:15                 ` Andrew Morton
2007-04-28  2:21                   ` William Lee Irwin III
2007-04-27  6:09           ` David Chinner
2007-04-27  7:04             ` Andrew Morton
2007-04-27  8:03               ` David Chinner
2007-04-27  8:48                 ` Andrew Morton
2007-04-27 16:45                   ` Theodore Tso
2007-05-04 13:33                     ` Eric W. Biederman
2007-05-07  4:29                       ` David Chinner
2007-05-07  4:48                         ` Eric W. Biederman
2007-05-07  5:27                           ` David Chinner
2007-05-07  6:43                             ` Eric W. Biederman
2007-05-07  6:49                               ` William Lee Irwin III
2007-05-07  7:06                                 ` William Lee Irwin III
2007-05-08  8:49                                   ` William Lee Irwin III
2007-05-07 16:06                               ` Christoph Lameter
2007-05-07 17:29                                 ` William Lee Irwin III
2007-05-04 12:57                   ` Eric W. Biederman
2007-05-04 13:31                 ` Eric W. Biederman [this message]
2007-05-04 16:11                   ` Christoph Lameter
2007-05-07  4:58                   ` David Chinner
2007-05-07  6:56                     ` Eric W. Biederman
2007-05-07 15:17                       ` Weigert, Daniel
2007-04-27 16:55           ` Theodore Tso
2007-04-27 17:32             ` Nicholas Miell
2007-04-27 18:12               ` William Lee Irwin III
2007-04-28 16:39 ` Maxim Levitsky
2007-04-30  5:23   ` Christoph Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m1ejlwu346.fsf@ebiederm.dsl.xmission.com \
    --to=ebiederm@xmission.com \
    --cc=akpm@linux-foundation.org \
    --cc=clameter@sgi.com \
    --cc=dgc@sgi.com \
    --cc=jens.axboe@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maximlevitsky@gmail.com \
    --cc=mel@skynet.ie \
    --cc=pbadari@gmail.com \
    --cc=wli@holomorphy.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox