public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jesse Pollard <pollard@tomcat.admin.navo.hpc.mil>
To: jaharkes@cs.cmu.edu, LA Walsh <law@sgi.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: 64-bit block sizes on 32-bit systems
Date: Tue, 27 Mar 2001 13:57:42 -0600 (CST)	[thread overview]
Message-ID: <200103271957.NAA13547@tomcat.admin.navo.hpc.mil> (raw)

---------  Received message begins Here  ---------

> 
> On Tue, Mar 27, 2001 at 09:15:08AM -0800, LA Walsh wrote:
> > 	Now lets look at the sites want to process terabytes of
> > data -- perhaps files systems up into the Pentabyte range.  Often I
> > can see these being large multi-node (think 16-1024 clusters as 
> > are in use today for large super-clusters).  If I was to characterize
> > the performance of them, I'd likely see the CPU pegged at 100% 
> > with 99% usage in user space.  Let's assume that increasing the
> > block size decreases disk accesses by as much as 10% (you'll have
> > to admit -- using a 64bit quantity vs. 32bit quantity isn't going
> > to even come close to increasing disk access times by 1 millisecond,
> > really, so it really is going to be a much smaller fraction when
> > compared to the actual disk latency).  
> [snip]
> > 	Is there some logical flaw in the above reasoning?
> 
> But those changes will affect even the fastpath, i.e. data that is
> already in the page/buffer caches. In which case we don't have to wait
> for disk access latency. Why would anyone who is working with a
> pentabyte of data even consider not relying on essentially always
> hitting data that is available the read-ahead cache.

It depends entirely on the application. Where the cache can contain
20% of the data, most accesses should already be in memory. If the
data is significantly larger, there is a high chance that the data
will not be there.

> 
> Using similar numbers as presented. If we are working our way through
> every single block in a Pentabyte filesystem, and the blocksize is 512
> bytes. Then the 1us in extra CPU cycles because of 64-bit operations
> would add, according to by back of the envelope calculation, 2199023
> seconds of CPU time a bit more than 25 days.

Ummm... I don't think it adds that much. You seem to be leaving out the
overlap disk/IO and computation for read-ahead. This should eliminate the
majority of the delay effect.

> Seriously, there is a lot more that needs to be done than introducing a
> 64-bit blocknumber. Effectively 512 byte blocks are far too small for
> that kind of data, and going to pagesize blocks (and increasing pagesize
> to 64KB or 2MB at the same time) is a solution that is far more likely
> to give good results since it reduces both the total the number of
> 'blocks' on the device as well as reducing the total amount of calls
> throughout kernel space instead of increasing the cost per call.

Talk about adding overhead... How long do you think it takes to read a
2MB block (not to mention the time to update that page..) The additional
contention on the fiberchannel I/O alone might kill it if the filesystem
is busy.

Granted, 512 bytes could be considered too small for some things, but
once you pass 32K you start adding a lot of rotational delay problems.
I've used file systems with 256K blocks - they are slow when compaired
to the throughput using 32K. I wasn't the one running the benchmarks,
but with a MaxStrat 400GB raid with 256K sized data transfer was much
slower (around 3 times slower) than 32K. (The target application was
a GIS server using Oracle).

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: pollard@navo.hpc.mil

Any opinions expressed are solely my own.

             reply	other threads:[~2001-03-27 19:58 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2001-03-27 19:57 Jesse Pollard [this message]
2001-03-27 20:20 ` 64-bit block sizes on 32-bit systems Jan Harkes
2001-03-27 21:55   ` LA Walsh
  -- strict thread matches above, loose matches on Subject: below --
2001-03-27 22:23 Jesse Pollard
2001-03-27 23:56 ` Steve Lord
2001-03-28  8:09   ` Brad Boyer
2001-03-28 14:53     ` Dave Kleikamp
2001-03-27 19:30 Jesse Pollard
     [not found] <Pine.LNX.4.30.0103270022500.21075-100000@age.cs.columbia.edu>
     [not found] ` <3AC0CA9C.3D804361@sgi.com>
2001-03-27 19:00   ` Jan Harkes
2001-03-27 17:22 LA Walsh
2001-03-26 21:27 Jesse Pollard
2001-03-26 22:07 ` Jonathan Morton
2001-03-27  4:14   ` Jesse Pollard
2001-03-26 19:26 Jesse Pollard
2001-03-26 18:01 Manfred Spraul
2001-03-26 18:07 ` Matthew Wilcox
2001-03-26 19:40 ` LA Walsh
2001-03-26 21:53   ` Manfred Spraul
2001-03-26 22:07     ` LA Walsh
2001-03-26 17:35 LA Walsh
2001-03-26 16:39 LA Walsh
2001-03-26 17:18 ` Matthew Wilcox
2001-03-26 17:47   ` Andreas Dilger
2001-03-26 18:09     ` Matthew Wilcox
2001-03-26 18:37       ` Eric W. Biederman
2001-03-26 19:36         ` Martin Dalecki
2001-03-26 23:03         ` AJ Lewis
2001-03-26 19:05       ` Scott Laird
2001-03-26 19:09       ` Andreas Dilger
2001-03-26 20:31         ` Dan Hollis
2001-03-26 19:20       ` Rik van Riel
2001-03-26 20:14       ` Jes Sorensen
2001-03-26 17:58 ` Eric W. Biederman
2001-03-28  8:06 ` Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200103271957.NAA13547@tomcat.admin.navo.hpc.mil \
    --to=pollard@tomcat.admin.navo.hpc.mil \
    --cc=jaharkes@cs.cmu.edu \
    --cc=law@sgi.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox