From mboxrd@z Thu Jan  1 00:00:00 1970
From: Chris Mason
Subject: Re: what do you do that stresses your filesystem?
Date: 06 Jan 2003 10:14:30 -0500
Message-ID: <1041866070.16279.47.camel@tiny.suse.com>
References: <3E06F360.7000708@namesys.com> <002001c2aaa1$08650550$0200000a@ringo> <3E082A86.8050501@namesys.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Errors-To: flx@namesys.com
In-Reply-To: <3E082A86.8050501@namesys.com>
Content-Type: text/plain; charset="us-ascii"
To: Hans Reiser
Cc: Chris Haynes, ReiserFS

On Tue, 2002-12-24 at 04:36, Hans Reiser wrote:
> Chris Haynes wrote:
> >
> >I'm about to deploy an on-line transaction-based service.. All
> >service-specific software is in 1000+ Java classes. The OS is SuSE
> >7.3, and Java version 1.4

You'll get better performance from the SuSE update kernels (or anything
with the data logging patches).

> >
> >The heavist uses of the file store are:
> >
> >During development:
> >+ javadoc
> >
> >During service deployment / update:
> >+ rsync
> >+ When using a tripwire-like Java program which checks the SHA digest
> >of all deployable files against zipped JARs
> >
> >During production operation:
> >+ atomic, synchronized writes to multiple files (typically 3 - 4 files
> >in different directories, the first is the creation of a new 4 kb
> >file, others are usually updates of existing files - growing typically
> >1kb per update). This files system is mounted on a RAID-1 pair.

Is this the only type of write being done?  If so, mounting with
data=journal will give you a pretty big boost.

Regardless of which data mode you use, the I/O pattern you want to try
for looks like this:

write(file1)
write(file2)
write(fileX)
...
fsync(fileX)
fsync(fileX-1)
...
fsync(file1)

This allows for the biggest transaction before the fsync comes in and
forces a commit.
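
For a Java application like the one in the quoted message, the ordering
above might look roughly like the sketch below.  It is only an
illustration and has not been measured on reiserfs.  The class and
method names are hypothetical, and it assumes that
FileDescriptor.sync() ends up as an fsync() on the underlying file,
which is the usual behaviour on Linux JVMs but is not something the
Java API promises.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper: the name and shape are illustrative only.
public class OrderedSync {

    /**
     * Appends payloads[i] to files[i] for every i, and only then syncs
     * the files, newest first (last written, first synced).
     */
    public static void writeAllThenSyncNewestFirst(File[] files, byte[][] payloads)
            throws IOException {
        if (files.length != payloads.length) {
            throw new IllegalArgumentException("files and payloads differ in length");
        }
        FileOutputStream[] outs = new FileOutputStream[files.length];
        try {
            // Phase 1: issue every write before asking for any sync, so the
            // filesystem can gather the changes into one large transaction.
            for (int i = 0; i < files.length; i++) {
                outs[i] = new FileOutputStream(files[i], true); // append; creates the file if missing
                outs[i].write(payloads[i]);
            }
            // Phase 2: sync in reverse order, i.e. fsync(fileX) ... fsync(file1).
            for (int i = files.length - 1; i >= 0; i--) {
                outs[i].getFD().sync();
            }
        } finally {
            // Close whatever was opened; close errors are ignored here
            // because durability was already requested by sync().
            for (FileOutputStream out : outs) {
                if (out != null) {
                    try {
                        out.close();
                    } catch (IOException ignored) {
                    }
                }
            }
        }
    }
}
```

A caller would pass the files of one transaction in the order they
should be written, for example
OrderedSync.writeAllThenSyncNewestFirst(files, payloads), from a single
thread.  If the method returns without throwing, every sync() call
completed.
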
By doing an fsync on the newest file first, you greatly increase the
chances that all the transactions for all the old files will be
committed by the fsync(fileX).  When this happens, the rest of the
fsyncs just trigger writes on the data blocks.

If you are using data=journal, the rest of the fsyncs become noops,
since the fsync(fileX) will also commit all the data blocks of all the
previous writes.

If there are other types of writes going on (large non-synchronous
writes), data=journal will hurt performance because it involves writing
all the data blocks twice.  In this case your best bet will be a
dedicated logging device.

If the synchronous writes for a single transaction are the only writes
to the FS, and you are doing writes to many different files, you can
also just do all the writes/creates, and then run sync().  This will get
all the data block writes scheduled at once, and then write to the log.

> >+ rsync
> >+ Successive reading of all files in a directory sub-tree (up to 10M
> >files)
> >in filestore-defined order (i.e. the program makes no demands or
> >assumptions about the order - it uses the order supplied by Java's
> >File.files()).
> >
> >
> >The greatest performance concern I have is with the file writes. As
> >these are atomic transactions, I use a separate thread for each file's
> >write (to give the kernel's escalator a chance to work), and require
> >that the write operations be individually hardware-synchronized using
> >Java's FileDescriptor.sync() method. I then use a counter to detect
> >when all threads have reported that their files have been written -
> >this indicating successful commitment of the transaction.
> >
> >I handle read-and write-locking in the application.
> >
> >Usually, there are no lock conflicts, so there can be many concurrent
> >transaction commitments. I use a thread pool of 50 threads to handle
> >the individual file writes (the 50 being a guess at the likely point
> >of diminishing returns).
> >
> >My expectation/hope is that, so long as there are enough threads
> >available in this pool, all transactions will be completed within one
> >disk rotation period(regardless of the number of concurrent
> >transactions or number of files per transaction and the fact that I'm
> >using software RAID-1). I've not yet been able to validate this
> >(theoretically or practically).

It won't happen.  You've got a chance in data=journal mode, but
otherwise there will be seeks to and from the log area as you write and
wait on the data blocks and the log blocks.

> >
> >I would *really* like to be able to group all the file writes for a
> >transaction into a single logical API call and have the kernel/file
> >system report successful completion of all data and metadata aspects
> >of the transaction using a single application thread.
> >

The kernel doesn't have this right now, but the aio code in 2.5.x is
close.

-chris