From: Chris Mason
Subject: Re: External journals and NVRAM devices
Date: 06 Nov 2002 15:42:10 -0500
Message-ID: <1036615330.14551.719.camel@tiny>
References: <20021101213703.D142A50D503@server5.fastmail.fm> <1036415792.14291.12.camel@tiny> <3DC836D2.5060100@namesys.com> <20021106201832.GR588@clusterfs.com>
In-Reply-To: <20021106201832.GR588@clusterfs.com>
To: Andreas Dilger
Cc: reiser, JP Howard, Edward Shishkin, ReiserFS List, Oleg Drokin

On Wed, 2002-11-06 at 15:18, Andreas Dilger wrote:
> On Nov 05, 2002  13:23 -0800, reiser wrote:
> > Chris Mason wrote:
> > >With ext3, a 128M or bigger log can really improve performance because
> > >so much of the writeback is done through bdflush/kupdate.
> >
> > Please explain the because clause of the sentence above in more detail.
>
> Nobody has answered this yet AFAIK, so I will.
>
> The reason that having a large log can help performance is because
> having bdflush drive the dirty buffer writeout allows for more chances
> of write merging by the elevator and such, and also avoids stalls in
> user-space code as it waits for a full journal to commit transactions.
>
> There is a fine line here (for ext3 at least), because if you have a
> large journal but it fills up before the transactions have been flushed
> to the filesystem, then user apps stall while the journal is flushed
> (can be several seconds).

Sorry for the delay.  This is why, with a large ext3 log, tuning bdflush
to trigger writeback quickly can help.  It lowers the chance that
userspace will have to wait for the log to be flushed, by decreasing the
time dirty buffers are allowed to hang around.
(Andreas knows this better than I do; I'm just trying to explain my last
message ;-)

The major difference with reiserfs (patched or not) is that the log is
flushed per transaction instead of trying to reclaim the whole thing.  In
the stock kernels, this really hurts reiserfs with small transactions,
because it only flushes one transaction at a time.  This means I write 3
or 4 blocks, wait, then write 3 or 4 more, wait, etc.

The data logging patches have code to send more than one transaction at
once, so I reclaim the log in chunks of about 200 blocks.  The end result
is that wrapping the log around is a less expensive operation with the
patches applied, and you usually won't need as large a log to make
data=journal work well.

The downside to my current code is that reiserfs can pin more RAM (up to
the size of the log) than ext3, and for a longer period of time.  If
you're going to disk, large logs are easy to come by.  NVRAM is different
though, so it matters more there.

-chris