From: Chris Mason <mason@suse.com>
To: Hans Reiser <reiser@namesys.com>
Cc: Chris Haynes <chris@harvington.org.uk>,
	ReiserFS <reiserfs-list@namesys.com>
Subject: Re: what do you do that stresses your filesystem?
Date: 06 Jan 2003 10:14:30 -0500
Message-ID: <1041866070.16279.47.camel@tiny.suse.com>
In-Reply-To: <3E082A86.8050501@namesys.com>

On Tue, 2002-12-24 at 04:36, Hans Reiser wrote:

> Chris Haynes wrote:
> >
> >I'm about to deploy an on-line transaction-based service.. All
> >service-specific software is in 1000+ Java classes. The OS is SuSE
> >7.3, and Java version 1.4

You'll get better performance from the SuSE update kernels (or anything
with the data logging patches).

> >
> >The heaviest uses of the file store are:
> >
> >During development:
> >+ javadoc
> >
> >During service deployment / update:
> >+ rsync
> >+ When using a tripwire-like Java program which checks the SHA digest
> >of all deployable files against zipped JARs
> >
> >During production operation:
> >+ atomic, synchronized writes to multiple files (typically 3 - 4 files
> >in different directories, the first is the creation of a new 4 kb
> >file, others are usually updates of existing files - growing typically
> >1kb per update). This file system is mounted on a RAID-1 pair.

Is this the only type of write being done?  If so, mounting with
data=journal will give you a pretty big boost.  Regardless of which data
mode you use, the io pattern you want to try for looks like this:

write(file1)
write(file2)
write(fileX) ...
fsync(fileX)
fsync(fileX-1) ...
fsync(file1)

This allows for the biggest transaction before the fsync comes in and
forces a commit.  By doing an fsync on the newest file first, you
greatly increase the chances that all the transactions for all the old
files will be committed by the fsync(fileX).  When this happens, the
rest of the fsyncs just trigger writes on the data blocks.

If you are using data=journal, the rest of the fsyncs become noops,
since the fsync(fileX) will also commit all the data blocks of all the
previous writes.
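
As an illustrative sketch only (not code from the original message), the
write-everything-then-fsync-newest-first ordering described above might
look like this in Python. The function name and the (path, data) list are
assumptions made for the example; os.open, os.write and os.fsync are the
standard wrappers around the corresponding system calls. Whether the later
fsyncs really become cheap depends on the filesystem and data mode, as
explained above.

```python
import os


def write_then_fsync_newest_first(updates):
    """Write all files first, then fsync them in reverse write order.

    `updates` is a list of (path, data_bytes) pairs. Both the name and the
    argument shape are hypothetical, chosen only for this sketch.
    Returns the paths in the order they were fsynced.
    """
    opened = []  # (path, fd) in the order the files were written
    synced = []
    try:
        # Phase 1: issue every write before any fsync, so the filesystem
        # can batch as much as possible into one transaction.
        for path, data in updates:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
            opened.append((path, fd))
            view = memoryview(data)
            while view:
                # os.write may write fewer bytes than requested.
                written = os.write(fd, view)
                view = view[written:]

        # Phase 2: fsync the most recently written file first, then work
        # back towards the first one.
        for path, fd in reversed(opened):
            os.fsync(fd)
            synced.append(path)
    finally:
        for _, fd in opened:
            os.close(fd)
    return synced
```

A caller would pass the files of one application-level transaction in the
order they should be written, and treat the transaction as committed only
once the function returns without raising.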

If there are other types of writes going on (large non-synchronous
writes), data=journal will hurt performance because it involves writing
all the data blocks twice.  In this case your best bet will be a
dedicated logging device.

If the synchronous writes for a single transaction are the only writes
to the FS, and you are doing writes to many different files, you can
also just do all the writes/creates, and then run sync().

This will get all the data block writes scheduled at once, and then
write to the log.
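
A minimal sketch of the sync() variant, again in Python and again only an
illustration under assumptions: the function name and the (path, data)
list are made up for the example. os.sync() is the standard wrapper for
sync(2). Note that sync() flushes everything dirty on the system, not just
these files, which is why it only makes sense when these writes are the
only ones in flight; POSIX also does not promise that sync() waits for
completion, although Linux does wait.

```python
import os


def write_all_then_sync(updates):
    """Do every write/create first, then issue a single sync().

    `updates` is a list of (path, data_bytes) pairs (hypothetical shape).
    Returns the number of files written.
    """
    count = 0
    for path, data in updates:
        # Append mode creates the file if it does not exist yet. Leaving
        # the with-block flushes Python's buffer to the kernel and closes
        # the file, but does not force anything to disk.
        with open(path, "ab") as f:
            f.write(data)
        count += 1

    # One call schedules all outstanding dirty data and metadata at once.
    os.sync()
    return count
```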

> >+ rsync
> >+ Successive reading of all files in a directory sub-tree (up to 10M
> >files)
> >in filestore-defined order (i.e. the program makes no demands or
> >assumptions about the order - it uses the order supplied by Java's
> >File.files()).
> >
> >
> >The greatest performance concern I have is with the file writes. As
> >these are atomic transactions, I use a separate thread for each file's
> >write (to give the kernel's escalator a chance to work), and require
> >that the write operations be individually hardware-synchronized  using
> >Java's FileDescriptor.sync() method. I then use a counter to detect
> >when all threads have reported that their files have been written -
> >this indicating successful commitment of the transaction.
> >
> >I handle read-and write-locking in the application.
> >
> >Usually, there are no lock conflicts, so there can be many concurrent
> >transaction commitments. I use a thread pool of  50 threads to handle
> >the individual file writes (the 50 being a guess at the likely point
> >of diminishing returns).
> >
> >My expectation/hope is that, so long as there are enough threads
> >available in this pool, all transactions will be completed within one
> >disk rotation period (regardless of the number of concurrent
> >transactions or number of files per transaction and the fact that I'm
> >using software RAID-1). I've not yet been able to validate this
> >(theoretically or practically).

It won't happen.  You've got a chance in data=journal mode, but otherwise
there will be seeks to and from the log area as you write and wait on
the data blocks and the log blocks.

> >
> >I would *really* like  to be able to group all the file writes for a
> >transaction  into a single logical  API call and have the kernel/file
> >system report successful completion of all data and metadata aspects
> >of the transaction using a single application thread.
> >

The kernel doesn't have this right now, but the aio code in 2.5.x is
close.  

-chris


