Re: 64-bit block sizes on 32-bit systems

From: LA Walsh <law@sgi.com>
To: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Jesse Pollard <pollard@tomcat.admin.navo.hpc.mil>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: 64-bit block sizes on 32-bit systems
Date: Tue, 27 Mar 2001 13:55:48 -0800
Message-ID: <3AC10C64.108BCFD3@sgi.com>
In-Reply-To: <200103271957.NAA13547@tomcat.admin.navo.hpc.mil> <20010327152011.A1354@cs.cmu.edu>

Jan Harkes wrote:
> 
> On Tue, Mar 27, 2001 at 01:57:42PM -0600, Jesse Pollard wrote:
> > > Using similar numbers as presented: if we are working our way through
> > > every single block in a petabyte filesystem, and the blocksize is 512
> > > bytes, then the 1us in extra CPU cycles because of 64-bit operations
> > > would add, according to my back-of-the-envelope calculation, 2199023
> > > seconds of CPU time, a bit more than 25 days.
> >
> > Ummm... I don't think it adds that much. You seem to be leaving out the
> > overlap of disk I/O and computation for read-ahead. This should eliminate
> > the majority of the delay effect.
> 
> 1024 TB should be around 2*10^12 512-byte blocks; at 1us of "assumed"
> overhead per block operation, dividing by 10^6 gives 2*10^6 seconds. No, I
> believe I'm pretty close there. I am considering everything being
> "available in the cache", i.e. no waiting for disk access.
---
	If everything being used is served only from the cache, then
the application probably doesn't need 64-bit block support.

	I submit that your argument may be flawed in assuming that,
if an application needs multi-terabyte files and devices, most
of the data will be in the in-memory cache.
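
	For reference, the all-in-cache figure quoted above can be
reproduced with a short sketch. The function name and the flat 1us of
extra CPU per block are illustrative assumptions taken from the quoted
estimate, not measurements, and the sketch says nothing about disk wait
time, which that estimate deliberately leaves out.

```python
# Back-of-the-envelope check of the quoted figures.
# Assumption: a flat 1 microsecond of extra CPU per block, as in the thread.

def extra_cpu_seconds(fs_bytes, block_bytes, overhead_us=1.0):
    """Total extra CPU time, in seconds, to touch every block once."""
    blocks = fs_bytes // block_bytes
    return blocks * overhead_us / 1_000_000

PIB = 2 ** 50  # 1024 TB, taken here in binary units

secs = extra_cpu_seconds(PIB, 512)
print(f"{secs:.0f} s, about {secs / 86400:.1f} days")
# prints: 2199023 s, about 25.5 days
```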
 

> The time to update the pagetables is identical to the time to update a
> 4KB page when the OS is using a 2MB pagesize. Of course it will take more
> time to load the data into the page; however, it should be a consecutive
> stretch of data on disk, which should give a more efficient transfer
> than small blocks scattered around the disk.
---
	Not if you were doing a lot of random reads where you only
need 1-2K of data.  The read time of the extra 2M-1K would seem
to eat into any performance boost gained by the large pagesize.
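
	A rough sketch of that cost, assuming each random read pulls
in one whole page and nothing else in that page is reused (the helper
name is hypothetical, and the model ignores seek and rotational time):

```python
# Read amplification for small random reads at different page sizes.
# Assumption: every random read transfers exactly one full page.

KIB = 1024
MIB = 1024 * KIB

def read_amplification(page_bytes, wanted_bytes):
    """Bytes transferred per byte the application actually wanted."""
    return page_bytes / wanted_bytes

for page in (4 * KIB, 2 * MIB):
    for wanted in (1 * KIB, 2 * KIB):
        ratio = read_amplification(page, wanted)
        print(f"page={page // KIB}K wanted={wanted // KIB}K -> {ratio:.0f}x")
```

Under those assumptions a 1K read from a 2M page moves 2048 times the
data the application asked for, against 4 times for a 4K page.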

> 
> > Granted, 512 bytes could be considered too small for some things, but
> > once you pass 32K you start adding a lot of rotational delay problems.
> > I've used file systems with 256K blocks - they are slow when compared
> > to the throughput using 32K. I wasn't the one running the benchmarks,
> > but on a MaxStrat 400GB RAID, 256K-sized data transfers were much
> > slower (around 3 times slower) than 32K. (The target application was
> > a GIS server using Oracle).
> 
> But your subsystem (the disk) was probably still using 512 byte blocks,
> possibly scattered. And the OS was still using 4KB pages; it takes more
> time to reclaim and gather 64 pages per IO operation than one. That's
> why I'm saying that the pagesize needs to scale along with the blocksize.
> 
> The application might have been assuming a small block size as well, and
> the OS was told to do several read/modify/write cycles, perhaps even 512
> times as much as necessary.
> 
> I'm not saying that the current system will perform well when working
> with large blocks, but compared to increasing the size of block_t, a
> larger blocksize has more potential to give improvements in the long
> term without adding an unrecoverable performance hit.
---
	That's totally application dependent.  Database applications
might tend to skip around in the data and do short reads/writes over
a very large file.  Large block sizes will degrade their performance.

	This was the idea of making it a *configurable* option.  If
you need it, configure it.  Same with block size -- that should
likely have a wider range for configuration as well.  But
configuration (and ideally auto-configuration where possible)
seems the ultimate win-win situation.
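
	To put numbers on the two knobs, here is a minimal sketch of
how far a block number of a given width can reach at a given block
size. It assumes unsigned block numbers and ignores every other limit
in the kernel; the function name is made up for illustration.

```python
# Largest addressable device for a given block-number width and block size.
# Assumption: block numbers are unsigned and nothing else limits capacity.

KIB = 1024
MIB = 1024 * KIB
TIB = 2 ** 40

def max_device_bytes(block_number_bits, block_bytes):
    """Bytes reachable with block numbers of the given bit width."""
    return (2 ** block_number_bits) * block_bytes

for bits in (32, 64):
    for block in (512, 4 * KIB, 2 * MIB):
        tib = max_device_bytes(bits, block) // TIB
        print(f"{bits}-bit block numbers, {block}-byte blocks: {tib} TiB")
```

With 32-bit block numbers, 512-byte blocks top out at 2 TiB and 4K
blocks at 16 TiB, so either a wider block number or a larger block
size extends the reach; which one costs less depends on the workload.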

-l
-- 
The above thoughts are my own and do not necessarily represent those
		of my employer.
L A Walsh                        | Trust Technology, Core Linux, SGI
law@sgi.com                      | Voice: (650) 933-5338
