public inbox for linux-kernel@vger.kernel.org
From: Chris Mason <chris.mason@oracle.com>
To: Jan Kara <jack@suse.cz>
Cc: Al Boldi <a1426z@gawab.com>, Andreas Dilger <adilger@sun.com>,
	Chris Snook <csnook@redhat.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC] ext3: per-process soft-syncing data=ordered mode
Date: Thu, 31 Jan 2008 12:14:54 -0500
Message-ID: <200801311214.55287.chris.mason@oracle.com>
In-Reply-To: <20080131171040.GL1461@duck.suse.cz>

On Thursday 31 January 2008, Jan Kara wrote:
> On Thu 31-01-08 11:56:01, Chris Mason wrote:
> > On Thursday 31 January 2008, Al Boldi wrote:
> > > Andreas Dilger wrote:
> > > > On Wednesday 30 January 2008, Al Boldi wrote:
> > > > > And, a quick test of successive 1sec delayed syncs shows no hangs
> > > > > until about 1 minute (~180mb) of db-writeout activity, when the
> > > > > sync abruptly hangs for minutes on end, and io-wait shows almost
> > > > > 100%.
> > > >
> > > > How large is the journal in this filesystem?  You can check via
> > > > "debugfs -R 'stat <8>' /dev/XXX".
> > >
> > > 32mb.
> > >
> > > > Is this affected by increasing
> > > > the journal size?  You can set the journal size via "mke2fs -J
> > > > size=400" at format time, or on an unmounted filesystem by running
> > > > "tune2fs -O ^has_journal /dev/XXX" then "tune2fs -J size=400
> > > > /dev/XXX".
> > >
> > > Setting size=400 doesn't help, nor does size=4.
> > >
> > > > I suspect that the stall is caused by the journal filling up, and
> > > > then waiting while the entire journal is checkpointed back to the
> > > > filesystem before the next transaction can start.
> > > >
> > > > It is possible to improve this behaviour in JBD by reducing the
> > > > amount of space that is cleared if the journal becomes "full", and
> > > > also doing journal checkpointing before it becomes full.  While that
> > > > may reduce performance a small amount, it would help avoid such huge
> > > > > latency problems. I believe we have such a patch in one of the Lustre
> > > > > branches already, and while I'm not sure what kernel it is for, the
> > > > > JBD code rarely changes much.
> > >
> > > The big difference between ordered and writeback is that once the
> > > slowdown starts, ordered goes into ~100% iowait, whereas writeback
> > > continues 100% user.
> >
> > Does data=ordered write buffers in the order they were dirtied?  This
> > might explain the extreme problems in transactional workloads.
>
>   Well, it does, but we submit them to the block layer all at once, so the
> elevator should sort the requests for us...

nr_requests is fairly small, so a long stream of random requests should still 
end up being random IO.
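
A rough way to see why a small request queue limits what the elevator can do is
a toy simulation. This is only a sketch under stated assumptions, not kernel
code: the queue is modelled as sorting fixed batches of requests, the window of
128 is an assumed queue depth standing in for nr_requests, and the disk size,
request count, and helper names (total_seek, windowed_sort) are made up for
illustration.

```python
import random


def total_seek(sectors):
    """Sum of absolute head movements when servicing sectors in the given order."""
    pos, dist = 0, 0
    for s in sectors:
        dist += abs(s - pos)
        pos = s
    return dist


def windowed_sort(sectors, window):
    """Sort requests only within consecutive batches of `window` requests.

    A crude stand-in for an elevator that can hold at most `window`
    requests at a time (hypothetical model, not the real I/O scheduler).
    """
    out = []
    for i in range(0, len(sectors), window):
        out.extend(sorted(sectors[i:i + window]))
    return out


random.seed(0)
DISK_SECTORS = 10_000_000   # assumed disk size, illustration only
QUEUE_DEPTH = 128           # assumed value standing in for nr_requests

# A long stream of random writes, in the order they were dirtied.
reqs = [random.randrange(DISK_SECTORS) for _ in range(20_000)]

unsorted_cost = total_seek(reqs)
small_queue_cost = total_seek(windowed_sort(reqs, QUEUE_DEPTH))
full_sort_cost = total_seek(sorted(reqs))

print("no sorting:        ", unsorted_cost)
print("sorted per window: ", small_queue_cost)
print("sorted as a whole: ", full_sort_cost)
```

In this model each window still sweeps across most of the disk, so the total
head movement stays far above what one globally sorted pass would need.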

Al, could you please compare the write throughput from vmstat for the 
data=ordered vs data=writeback runs?  I would guess the data=ordered one has 
a lower overall write throughput.
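
One possible way to make that comparison is to save the output of "vmstat 1"
during each run and average the "bo" (blocks written out per second) and "wa"
(iowait) columns. The sketch below assumes the usual procps vmstat layout with
a column-name header line; the function name average_columns and the sample
numbers are made up for illustration and are not measurements from this thread.

```python
def average_columns(vmstat_text, columns=("bo", "wa"), skip_first=True):
    """Average the named columns over the samples in saved `vmstat` output.

    The first sample reports averages since boot, so it is skipped by default.
    """
    header = None
    rows = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "bo" in fields:            # the column-name line (r b swpd ... wa)
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue                  # banner line or malformed row
        if not all(f.isdigit() for f in fields):
            continue
        rows.append(dict(zip(header, map(int, fields))))
    if skip_first:
        rows = rows[1:]
    if not rows:
        raise ValueError("no vmstat samples found")
    return {c: sum(r[c] for r in rows) / len(rows) for c in columns}


# Made-up sample data, for illustration only.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 100000  20000 300000    0    0    10    50  100  200  5  1 90  4
 0  1      0  99000  20000 301000    0    0     0  2000  300  400  2  3  5 90
 0  1      0  98000  20000 302000    0    0     0  4000  300  400  2  3  5 90
"""

result = average_columns(SAMPLE)
print(result)
```

Running it once on the data=ordered capture and once on the data=writeback
capture gives two averages that can be compared directly.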

-chris

